[This was originaly published on #AltDevBlogADay.]
I have heard this sentence a lot of times. And I even said it myself more than once. It is pretty common that programmers want to have a clean and nice code base. They want to be able to understand what is happening at the first glance and they want to have the feeling that the code meets their quality expectations.
I also do.
But there is a serious problem which is often overlooked when we talk about ‘throwing away’ and starting from scratch.
Where is this messy code coming from?
Code does not get ‘messy’ and war-torn by itself. It is also – normally – not the fault of some stupid programmer who has no clue what he is doing. I admit, this might happen from time to time, but I have never worked with anybody who could be attributed like that.
This are the two main reasons for code to become ‘messy’.
Bugfixes and the handling of corner-cases
There are lots of issues that are found and fixed during the lifetime of a code base. All the small and big fixes sum up to code, that is not really what one would call ‘clean’. But this does not mean, that the code is bad. I would even say it is exactly the opposite of bad. The functionality is tested, proven and able to handle real-world data thrown at it. This ‘messy’ code is your safe haven and you can rely on it doing what you would expect.
The code was written under different base assumption, which are not valid any more. Design or focus changes forced a strong shift and required the code to adapt in ways that were not really fitting its original design. This leads pretty fast to code, which is hard to understand as a whole and therefore hard to maintain. The additional complexity introduced by this can even spread into the toolchain, which makes the also the life of the user of the system miserable.
What to do with the mess?
The most important thing, is to realize, why the code is in the shape it is. It is crucial, that this is approached with the right mindset. You always should assume, that the implementation, the bugfixes and the extension of the code was done by someone, who had a clear picture of what is going on and a clear understanding of what needs to be done. It might sound obvious, but always assume the best knowledge and the best intent. Then you are able to judge the code objectively.
When you know in which state the code really is and when you understand all the interdependencies, you can make your decision on how to refactor it.
If there is no serious flaw in the design and it was not developed with different base assumptions and a different goal then what it has grown into, you should really think twice if you want to change it at all. Is there really a pressing reason to change it? Your decision should be not based on how much you like the code and how you judge the ‘elegance’ of the solution. The sole reason for the existence of the code is to deliver a specific functionality. And if this functionality is not suffering from the ‘ugliness’, don’t put it on stake. Accept the fact, that it might be not perfect, but it does the job. In the end, we are not writing code for the sake of writing code. We are building software. If the software functions properly, we did our job well. No one cares if there is code somewhere that does not adhere to the personal standards of a programmer, right?
Should you realize, that the code was developed with different requirements and was afterwards altered to somehow mirror the changes that happened to this requirements, the situation might be a bit more complicated. But even then, you need to keep in mind, that even this code is not necessarily shitty.
Whatever you think the right action is … throwing away the code is mostly the wrong one. We are always tempted to start from scratch, because we love to implement things and it is the most fun if you have a clear start. It is also by far easier to write new code, than to read old one.
But no matter how hard you try, you will be doomed to fix all the small bugs, issues and corner-cases later on again. All the things, that has been fixed for the existing code already need to be found again by the QA and fixed by you. There is no way you can fix all these issues on the fly, while re-implementing the functionality. Because of that, even a crappy implementation that was around for some time, has proven its right to exist and should therefore be refactored, rather than thrown away. You want to keep as much of the juice, that made the code do it’s job, as possible. And usually there is enough of it worth saving.
The last motorcycle I had, was over 15 years old. It had a lot of small quirks, but I knew every single one of those. I knew how she behaved in every situation. I knew how to handle her when riding in different weather conditions. I could do the service while being drunk … with closed eyes.
The same is valid for old, ‘messy’ code. It is not beautiful, and has its scratches and it’s quirks. But you know them, and you know how to use the code to get your job done. Everything you need to be able to do, has been done already. You can rely on it, to do what you expect.
Do not throw away this intimate relationship just because of aesthetic reasons. The new one will also have it’s issues and problems, but you first need to find all of them and learn how to handle them.
[This was originaly published on #AltDevBlogADay. Go there if you want to read a lot of awesome stuff from awesome dudes … ]
Although it is less and less common nowadays, there are still “Thread-Safe Memory Allocators” in use. What do I mean with this? A standard, single-core based allocator that uses a simple locking mechanism on top to avoid race-conditions.
I am usually a big fan of “The simplest solution”(tm), but this one unfortunately leads to two big problems on multi-core architectures and therefore doesn’t really qualify as a ‘solution’ at all.
I think it is pretty obvious that thread contention is bound to happen. When one thread is accessing the allocator ( allocating or releasing memory ) all other threads that are trying to do the same are blocked. It does not matter how fast the allocator is, as it will never be fast enough to not introduce contention and block other threads. This issue has an impact on performance especially in standard high-level gameplay code. As high-level gameplay code tend to use the allocator a lot ( creating/destroying objects, growing/shrinking dynamic arrays, etc. ) this is a recipe for just throwing away clock-cycles. For no gain at all. I am not talking about a few nano-seconds here as depending on the amount of runtime allocations, this can sum up faster than one might expect.
That is the more serious issue, and not that obvious to see. Two threads are working on data in a memory-area that is mapped to the same cache-line. This is not a theoretical problem, but a situation that is not that unlikely to happen. The probability of running into that increases with the amount of allocator contention. There is a good chance that a non-thread-aware allocator returns consecutive memory areas for consecutive allocations. If these allocation requests are coming from different threads, false cache-sharing is waiting to happen.
Thread_A resides on CPU0
Thread_B on CPU1.
Both threads are doing totally unrelated calculations and both of them are allocating some memory.
Let’s assume both get a chunk of memory from the same cache-line.
This situation is called ‘false sharing’ or – what is even more fitting – ‘cache line ping-pong’. We have now created the biggest nightmare ( at least performance-wise ) for the cache-coherency protocol.
– Thread_A writes to his memory.
– This invalidates Thread_B‘s cache-line.
– The cache of Thread_A must be written back to memory …
– … and read back again to the cache of Thread_B.
The same applies if Thread_B is modifying its memory area.
If you are interested in more details and also some performance impact measurements, check out ‘Analysis of False Cache Line Sharing Effects on Multicore CPUs’.
As I was asked what I would propose as a solution on the comments section of ADBAD, here is my answer:
My preferred solution would be to disallow dynamic allocations at runtime completely, but that might be a bit drastic 🙂
So I rather go with this answer:
Instead of using a ‘thread-safe allocator’ which introduces the mentioned problems, the usage of a ( I like to call it ) ‘Thread-Aware Allocator’ should be the way to go.
Each thread gets his own big blob of memory and the management is done on a per-thread basis. This reduces the thread-contention to the situations where a new memory chunk is needed.
As every thread is allocating from his own memory-blob, the chances of false sharing due to the described reason are minimized.
One well documented example is the Intel TBB Scalable Allocator (TBB Scalable Allocator). ( It starts a few pages down … search for ‘SCALABLE MEMORY ALLOCATION’ ).
 Analysis of False Cache Line Sharing Effects on Multicore CPUs
 Concurrency Hazards: False Sharing
 For more details on caches, read this excellent post by Luke Hutchinson.
[This was originaly published on #AltDevBlogADay. Go there if you want to read a lot of awesome stuff from awesome dudes …
Check out the comments on this post on ADBAD, especially these from Christina Ann Coffin ( http://altdevblogaday.com/2011/08/24/who-killed-anti-portals/ ) ]
Yesterday, I had a small chat with a former coworker that threw me back in time. It was about Anti-Portals.
Yeah, I know, you heard that term back in the days, but as it is so long ago ….
… a small reminder what the hell an Anti-Portal is
An Anti-Portal is just a plane placed in the world, which shows you that everything behind this plane is not visible. To make use of it, you need to generate a plane that goes through the player’s point of view for every edge of the portal. You end up with a frustum that allows you to easily check if an object or scene-partitioning node is occluded or not. Normal Portals are working exactly the same, but they define the visible area instead of the occluded. They were used for doorways and such.
What the hell has happened to Anti-Portals?
After that chat yesterday, I realized for the first time, that this technique disappeared silently and is dead by now. I haven’t heard of anyone using anti-portals since around 2004 or something. Why? Although we have other ways to handle occlusion nowadays, I cannot see what a few well placed anti-portals could harm. But I can see that they could be perfectly used to reject a bunch of objects with almost no cost. This does of course only make sense for large occluders, but hey, why not? You will almost always be able to find some of these.
But now, let’s ask the most important question … who’s fault was it?
Who killed Portals / Anti-Portals? 🙂
If you have any clues that might help to find the murderer, please share them with us. Maybe we can catch that bastard before occlusion queries silently disappear.
Why the hell am I writing this?
First and foremost, to commemorate the fallen ones and to find that reckless criminal. And second, because this topic remembered me instantly of a long forgotten ( at least by me ) internet site. Flipcode!. Although the site is down since 2005, they still have the archives there and it is still a lot of fun to read that stuff again.
[This was originaly published on #AltDevBlogADay. Go there if you want to read a lot of awesome stuff from awesome dudes …]
This post is not about the performance advantages of data oriented design, as this has been covered already pretty extensively by much smarter guys. ( see links below )
What I want to talk about, are the prejudices that I always here when people start to defend their holy objects.
Everyone and his mother is constantly reiterating the advantages of OOP – productivity increase, maintainability and code-reuseability. Should we really sacrifice all this good things just in favor of faster execution times?
What are the advantages of OOP everyone is so keen to protect?
I think this list should cover most of the claimed benefits:
Let’s go through this list and take a look what we really need to sacrifice.
Hiding of implementation specific data and functionality. This is something you should aim for in any programming paradigm, no matter if you call it OOP, DOD or ADHS. This is definitely nothing you should give up. And you don’t need to. To claim that encapsulation is only working with OOP is just plainly wrong, as this has been done in C since the dawn of time. So, apparently this still applies to DOD and is nothing you need to give up.
The ability to inherit a class’ data and functionality to extend/alter it’s behavior..
Inheriting data is really not fitting well into the DOD concept. It obfuscates the way your data is organized and forces you to group the data by object, not by usage pattern. This is definitely a thing you would need to give up, at least to some extent.
That is a really nice concept of OOP. It releases you from thinking about what happens, when you call a method on any given object. If you want to update your 10000 entities, you just iterate over a list and call update() each time. Everything is handled for you. Yeah, it is the most inefficient way you could handle this … but it works. It is convenient and it safes you a lot of headaches. Unfortunately, it is highly overused.
You do not really want that much of polymorphism if you design your code around your data. This does not mean there is no space left for it, but you have to decide whether is makes sense or not instead of using it as the default way of programming.
You put all data and functionality into one parent entity … a class for example. The ‘class’ in this case is an arbitrary defined scope for a ‘module’. A module could as well be a library, a subfolder in your source-tree, a matching pair of header and implementation-files or even a block of code marked by some fancy comment ( maybe even including ASCII-Art ). So, why is modularity attributed to OOP. You do not have to give up any modularity if going away from an OOP model. In the end, a module is defined by a description of the data and the transforms that can be applied to it. Modularity is also perfectly possible and encouraged in data oriented programming.
This should never be anything that drives your implementation decision. But apart from this, code-reuse is achieved by calling a function, right? Is it ‘better’ code-reuse if a class provides you the ’reusable’ functions to call? “But you can reuse entire objects, you $%*§$!”, I hear you saying. Can you? How often – in a real world application – have you reused an object to do something you haven’t had already in mind when writing this class originally. I think there aren’t that much occurrences, apart from the obvious cases. There is nothing that stops you from reusing your non-OOP code. Just because it is not modeled after an object does not mean that it cannot be used elsewhere. You can reuse as much code as it makes sense, so nothing to give up here.
What the hell is this supposed to mean? What is elegance in code? Is it achieved by modelling your code base after small chunks of code Erich G. & Friends have taught you?. Is it elegant, if you are layering one abstraction over another? The definition of elegant code is a bit subjective, so I find it hard to use this to support any programming model.
My personal definition: “Elegant code does the job it is supposed to do ( and nothing more ) in an efficient way and can be understood by you or some other coder 6 month later without jumping through 37 files.”
So, to reverse this argument – and to clarify how subjective ‘elegance’ is – object-oriented design encourages programmers to write non-elegant code according to my definition ( which for sure is the only correct one 😉 ).
I prefer to see extensibility of code pretty tight to my definition of elegance. If you are able to modify any given code and can understand it and the possible side effects, without the need to understand the 33 abstraction-layers underneath you, you have pretty extensible code.
The requirement is not to have a system in place that gives you the possibility to derive some classes and re-implement some functions. The requirement is, that you are able – in a short timeframe – to extend the code by the needed functionality without causing hazard some thousand lines away.
If you look back – at my highly biased post – there is not really a lot you are giving up in favor of faster execution times. But the most important advantage of data oriented design is the fact, that you are discouraged to over-engineer. You are not writing code for the sake of creating a code-temple for your ego, but you are writing code to perform an operation on your data-set. So, by starting being awesome and concentrating on your data, you are elevating your code to new levels of readability, maintainability and extensibility … and you get faster execution time as a nice side-effect. Do not forget the street cred you get from doing the right thing …
[This was originaly published on #AltDevBlogADay. Go there if you want to read a lot of awesome stuff from awesome dudes …]
In the last years I have given up the usage of debug builds completely. The performance was usually so bad, that it induced physical pain to play the game. Also the build and especially link-times for a debug build are just annoying on large projects. And not ignore the fact, that QA was testing the optimized builds, so remote-debugging or debugging of crash-dumps had to be done in this build either way.
But it is not that bad as some people might think. In the beginning it takes some time to get used to it, but after a few sessions, this works as good as a debug build.
This article is mostly aimed at programmers not that familiar with the lower level concepts and should help them to get the most information without the need of reading assembly.
Problems of optimized builds
1. The source code does not represent exactly the instructions that are executed
2. You will have to search for most of the variables yourself, as the resolution that is done by the debugger is mostly wrong
3. You can find everything in memory you might possibly need, you just need to find it
I will describe some techniques to get as much information about the current state as possible, without the need of reading assembly code.
First thing you need to realize is, that no local variables, parameters and return values can be watched and interpreted directly from source-code. If you hover over some variable or type them into the watch-window, you will get random information. There are of course some cases, where the value is correct, but this is nothing you should ever rely on.
The only trustworthy types of variables are global variables and static class members. These are always correct. If they contain garbage, than it is most probably, because they are screwed for real, were overwritten or not initialized at all.
The debugger can determine the “real” type of an object by resolving the vtable-entries, so use this to your advantage.
If you know, that there must be some kind of object at address 0xB00B5000, you can just
cast this address to any polymorphic type ( it doesn’t matter which one, it should just have a vtable). If you expand this object in the watch window, the first entry will hold the resolved vpointer and will contain a human readable name of the runtime type of this instance.
Here is an example. The address points to an instance of the class ‘UWorld’ and the debugger can determine this, no matter into which type you cast the pointer.
The PPC ABI defines a specified register usage. This allows you to get a lot of information just by looking at the registers. Note, that these are callstack dependent.
This means, a function-call overwrites some of the registers and restores them after returning. Therefore, you cannot rely on every register if you are not at the top of the callstack. But the debugger aids you here also. Every register that was invalidated by a function-call above in the callstack is displayed without a value in the register window.
In this picture you can see, that
r3 - r12 were overwritten by another function-call. All registers that are containing values can be considered as valid.
The registers are used for clearly specified data.
r1 This is always the pointer to the current stack-frame. r3 - r10 first 8 input arguments r3 - r4 return values r14 - r31 non-volatile registers
There are more register-types ( FPU registers, VMX registers ) but you should get away most of the time just with
r0 - r31.
Let’s assume you are in some method-call and would like to inspect the current state of the this-pointer and it’s members.
r3, which usually contains the this-pointer. As this is the first parameter register, this makes kind of sense, right? If you have no valid
r3, the first thing to do, is to search the r14 – r31 for sane object addresses.
What a sane address is, is completely platform and implementation dependent. The Xbox360 for example maps 64kb memory pages to the address-range 0x40000000 – 0x7fffffff. When you know the platform and the implementation internals of your memory allocator, you can easily find out which addressrange contains which data.
So, for the sake of an example, just assume you are debugging on a Xbox360 and your general purpose allocator uses 64kb memory pages internally.
Heap-Allocations will therefore almost always reside in a 0x4xxxxxxx address range. They could also go to 0x5xxxxxxx addresses, but only if you are using more than 256MB for your general purpose heap.
As the stack is also allocated from 64kb pages and grows downwards, you will find the stack in the 0x7xxxxxxx area.
Last but not least, the PE loader uploads code to the 0x80000000 – 0xA0000000 area.
So, now you already have a pretty clear picture of what is going on by just looking at the addresses.
0x4xxxxxxx - 0x5xxxxxxx heap objects 0x7xxxxxxx stack 0x8xxxxxxx - 0xAxxxxxxx code
Normally, your allocator aligns the heap allocations to 8 or 16 byte boundaries. So, another criteria if you are looking for objects on the heap: Ignore unaligned addresses.
So, with this information in mind, let’s take a look on the register window from the last page.
You can clearly see, that
r23 are most probably candidates for heap-allocated objects, while
r13 points to the area where the code resides.
If you can expect that the heap-objects you are looking for are having a virtual function table, just cast the address from
r23 to any polymorphic type. That’s what the debugger would show you:
Now, you can use these objects to find out further information about their state at the moment of the crash.
The same works for the stack-frame of course. You can open up a memory window and display the memory at
r1. This gives you the data, that is stored on the stack.
If you work with the Memory-Window, make sure you change the view to “4-byte integer”
and “hexadecimal” display. Than you can just apply your knowledge of sane addresses, and look
there for helpful objects.
As you can see, there are some candidates in this stack-frame. Of course not every address that fits this pattern will contain a valid object, but most of the time you will find something that brings you a step further to the reason of your crash.
It is not really hard to get some decent information without a debug build at hand. These are just a collection of simple tricks to get some data without the need to read assembly. If you do not have problems with this, you will have a lot of easier and more reliable ways to get the information you need.
If I would have received a beer for every time I hear (or read) someone quoting Knuth’s “Premature optimization is the root of all evil”, I would have long ago died with cirrhosis of the liver … twice.
Why does this quote drive me so mad?
First of all, no one I have ever met, who was using this quote to back up his stupid point has even read the paper this quote originates from.
Structured Programming with Goto Statements
Yeah, really, that was the title. And the entire article was about optimizing the shit out of stuff.
Why is no one quoting the title of this paper? Maybe I should do this whenever someone claims how evil ‘goto’ is.
Sorry, that you are not able to use the available tools without screwing your code base and falling into spaghetti-mode, moron! Did you ever heard a carpenter saying: “Dude, I don’t use saws. That is fricking dangerous. I could hurt myself.”?
Goto is as evil as
virtual when you give it into the wrong hands.
But that is not the point … I’m getting sidetracked 🙂
Never ignore performance considerations
To be clear: I know the importance of profiling to identify your bottlenecks and your critical path. I would never argue against that. Optimize only where your profiler tells you, it makes sense.
But that does not mean that you can give a crap about the rest of the code. Keep one thing in mind: There is no non-performance critical code in a game, ever. None. You don’t need to optimize the hell out of everything, but you need to think about performance implications of your code in every single case. There should never be an exception to this.
When you start to don’t care, Baby Jesus will hate you.
What this gives you in the end, is a bit too much cost for almost everything that is going on. You are wasting your time in trivial things all around your code base, but you are not able to nail it down and to optimize it properly, as it is spread everywhere. And every single optimization will give you almost non-measurable improvements. But the sheer amount of small inefficiencies sums up and costs you and considerable amount of execution time.
Unfortunately, when you are at this point, there is no chance of improving this ‘Death by a thousand papercuts’ situation anymore. You will not have the resources to spent precious programmer time on such minor improvements. It is just not enough bang for the buck.
Do not ignore performance considerations ever! This will bite you in the ass in the long run and you will have to suffer in other areas. In the worst case you will even be forced to scale down some features to meet the performance criteria. But for what?
Just for the fact, that you followed a totally outdated quote, that is used out of context and interpreted wrongly.
And stop quoting stuff you have no clue about.
Ironic as I am, I will finish this post with another quote from another awesome programmer 🙂
"My point is, that you should fire anyone quoting anything from this paper without pointing out, that all this is obsolete, because compilers changed a lot since the age of dinosaurs ;-)"
Since working at Nokia, I have the pleasure to work with a ‘Distributed Version Control System’. As I have used mostly perforce before, the switch was both, a blessing and a curse.
I have to admit, that I had massive problems to get used to it in the beginning. But by now, git and me are BFFs … at least until random shit starts to happen again :).
Yeah, I know that this is mine fault and not git’s. It is just so damn easy to do something wrong. Git is far away from a submit-and-run VCS like perforce, but that is a fair price for the fact, that you can now branch whenever you want without days of integration pain.
I do not want to go into to much details here, as there are more than enough very good tutorials out there. If you are new to DVCS, check out Joel’s brilliant article.
Here are some (hopefully) useful tips for working with git.
Git in Dropbox
That one is pretty obvious, but extremely useful, especially for private projects. You can push your local repo to your Dropbox and it automatically gets synced with all the PCs you are using Dropbox with.
# go to your Dropbox and create your project directory
$ cd ~/Dropbox
$ mkdir my_project
$ cd my_project
# now initialize your git repo with
$ git --bare init
# As you have your remote-repo prepared, go to your local repository.
$ cd ~/dev/my_project
# First, you need to introduce the remote location to git
# this adds the specified path as the remote named 'origin'
# but you could as well name it 'Dropbox' or 'whatever'
$ git remote add origin file:///home/user/Dropbox/git/my_project
# git is set up, so push it to the remote ( 'origin' or whatever
# name you have used ).
$ git push origin master
Done, you know have your repo on your Dropbox. If you are on another PC and want to
access it, just clone it from there, and you are set. You can use this like you would
use any git-server.
Save the history, with rebase
As your local repo is basically a branch of the remote repo, the default behavior of
git pull is a merge. There is nothing really wrong about this, but if you work on larger projects with lots of contributors, this makes your history really hard to read.
You can avoid this quite easily by using rebase instead:
git pull --rebase.
The main difference is the way the merge happens. With rebase, your commits are ‘removed’, the remote changes are applied and after that your changes are applied on top of the remote changes. This preserves a linear history and makes it human readable again.
Interactive Rebase FTW!
The interactive rebase allows you to modify already committed changes. Let’s say you are prototyping something. Instead of waiting for a good state to commit your changes, you can commit as often as you want. When you are ready to push, you can do the interactive rebase and put commits together, remove them completely or change the commit messages.
So, you have been prototyping a feature and realized that you need to refactor a bit of old code in this process. Let’s assume you have now 5 small checkins. 2 changes are small refactoring and the other 3 are iterations of the feature you are prototyping. You realize that it would make more sense to have only 2 commits. One for the refactoring, and one for your feature.
# you need to tell interactive rebase in which commits you are interested in
# ( in our case these are the last 5 commits )
$ git rebase -i HEAD~5
This will put you into the rebase mode, where you can select what you want to do with these changes.
pick 5c6bb74 some refactoring
pick 91dbdfa other refactoring
pick 3080d61 iteration 1
pick 4e4f56a iteration 2
pick 1890f70 iteration 3
# Rebase a37f00c..1890f70 onto a37f00c
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
You can now alter the changes. In this case we want to group them and change their
commit messages. The result could look like this:
reword 5c6bb74 some refactoring # changes the commit message
fixup 91dbdfa other refactoring # groups this commit with the previous
reword 3080d61 iteration 1 # changes the commit message
fixup 4e4f56a iteration 2 # groups this commit with the previous
fixup 1890f70 iteration 3 # groups this commit with the previous
After you have done this, you will be prompted for the commit messages of the two rewords. When finished, you have only two commits left and they have the proper change description. You can now push this without having a bad conscience. This is how the history now looks like:
$ git log
Author: Martin Zielinski
Date: Thu Jul 7 23:50:14 2011 +0200
Author: Martin Zielinski
Date: Thu Jul 7 23:49:22 2011 +0200
refactoring old code
I hope that I find the time to blog about some technical stuff, especially programming and game programming related. Also topics covering optimization for game-consoles as well as mobile platforms might find their way onto this blog. And I will for sure also do what I am best in, complaining and ranting :).
But don’t expect too much, I’m not doing this either.