
This code is a mess. Let’s start from scratch again …


[This was originally published on #AltDevBlogADay.]

I have heard this sentence many times, and I have said it myself more than once. It is pretty common that programmers want a clean and nice code base. They want to understand what is happening at first glance, and they want the feeling that the code meets their quality expectations.

I also do.

But there is a serious problem which is often overlooked when we talk about ‘throwing away’ and starting from scratch.

Where is this messy code coming from?

Code does not get ‘messy’ and war-torn by itself. It is also, normally, not the fault of some stupid programmer who has no clue what he is doing. I admit this might happen from time to time, but I have never worked with anybody who fit that description.

These are the two main reasons for code to become ‘messy’.

Bugfixes and the handling of corner-cases

There are lots of issues that are found and fixed during the lifetime of a code base. All the small and big fixes add up to code that is not really what one would call ‘clean’. But this does not mean that the code is bad. I would even say it is exactly the opposite of bad. The functionality is tested, proven, and able to handle real-world data thrown at it. This ‘messy’ code is your safe haven, and you can rely on it doing what you would expect.

Design/Focus changes

The code was written under different base assumptions, which are not valid any more. Design or focus changes forced a strong shift and required the code to adapt in ways that did not really fit its original design. This leads pretty fast to code that is hard to understand as a whole and therefore hard to maintain. The additional complexity introduced by this can even spread into the toolchain, which also makes the life of the users of the system miserable.

What to do with the mess?

The most important thing is to realize why the code is in the shape it is in. It is crucial that this is approached with the right mindset. You should always assume that the implementation, the bugfixes, and the extensions of the code were done by someone who had a clear picture of what was going on and a clear understanding of what needed to be done. It might sound obvious, but always assume the best knowledge and the best intent. Only then are you able to judge the code objectively.

When you know in which state the code really is and when you understand all the interdependencies, you can make your decision on how to refactor it.

If there is no serious flaw in the design, and it was not developed under different base assumptions and towards a different goal than what it has grown into, you should really think twice about whether you want to change it at all. Is there really a pressing reason to change it? Your decision should not be based on how much you like the code or how you judge the ‘elegance’ of the solution. The sole reason for the existence of the code is to deliver a specific functionality. And if this functionality is not suffering from the ‘ugliness’, don’t put it at risk. Accept the fact that it might not be perfect, but it does the job. In the end, we are not writing code for the sake of writing code. We are building software. If the software functions properly, we did our job well. No one cares if there is code somewhere that does not adhere to the personal standards of a programmer, right?

Should you realize that the code was developed with different requirements and was afterwards altered to somehow mirror the changes that happened to these requirements, the situation might be a bit more complicated. But even then, you need to keep in mind that even this code is not necessarily shitty.

Whatever you think the right action is … throwing away the code is mostly the wrong one. We are always tempted to start from scratch, because we love to implement things, and it is the most fun when you have a clean start. It is also by far easier to write new code than to read old code.

But no matter how hard you try, you will be doomed to fix all the small bugs, issues, and corner-cases later on again. Everything that has already been fixed in the existing code needs to be found again by QA and fixed by you. There is no way you can fix all these issues on the fly while re-implementing the functionality. Because of that, even a crappy implementation that has been around for some time has proven its right to exist and should therefore be refactored rather than thrown away. You want to keep as much as possible of the juice that made the code do its job. And usually there is enough of it worth saving.

Conclusion

The last motorcycle I had was over 15 years old. It had a lot of small quirks, but I knew every single one of them. I knew how she behaved in every situation. I knew how to handle her when riding in different weather conditions. I could do the service while being drunk … with closed eyes.

The same is valid for old, ‘messy’ code. It is not beautiful, and it has its scratches and its quirks. But you know them, and you know how to use the code to get your job done. Everything you need to be able to do has been done already. You can rely on it to do what you expect.

Do not throw away this intimate relationship for purely aesthetic reasons. The new one will also have its issues and problems, but you first need to find all of them and learn how to handle them.

Memory allocation pitfalls on multi-core CPUs


[This was originally published on #AltDevBlogADay. Go there if you want to read a lot of awesome stuff from awesome dudes … ]

Although it is less and less common nowadays, there are still “Thread-Safe Memory Allocators” in use. What do I mean by this? A standard, single-core allocator with a simple locking mechanism on top to avoid race-conditions. I am usually a big fan of “The Simplest Solution”(tm), but this one unfortunately leads to two big problems on multi-core architectures and therefore doesn’t really qualify as a ‘solution’ at all.
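
To make this concrete, here is a minimal sketch of what I mean ( ‘SingleCoreHeap’ is a made-up stand-in for the underlying single-core allocator ):

#include <cstddef>
#include <cstdlib>
#include <mutex>

// Stand-in for any non-thread-safe, single-core allocator.
class SingleCoreHeap
{
public:
    void* Allocate( size_t size ) { return std::malloc( size ); }
    void  Free( void* ptr )       { std::free( ptr ); }
};

// The "thread-safe" wrapper: a lock slapped on top.
class LockedAllocator
{
public:
    void* Allocate( size_t size )
    {
        std::lock_guard<std::mutex> lock( m_mutex ); // every thread queues up here ...
        return m_heap.Allocate( size );
    }

    void Free( void* ptr )
    {
        std::lock_guard<std::mutex> lock( m_mutex ); // ... and here
        m_heap.Free( ptr );
    }

private:
    std::mutex     m_mutex;
    SingleCoreHeap m_heap;
};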

Thread contention

I think it is pretty obvious that thread contention is bound to happen. When one thread is accessing the allocator ( allocating or releasing memory ), all other threads that are trying to do the same are blocked. It does not matter how fast the allocator is; it will never be fast enough to avoid introducing contention and blocking other threads. This issue has an impact on performance especially in standard high-level gameplay code. As high-level gameplay code tends to use the allocator a lot ( creating/destroying objects, growing/shrinking dynamic arrays, etc. ), this is a recipe for throwing away clock-cycles. For no gain at all. And I am not talking about a few nano-seconds here; depending on the amount of runtime allocations, this can add up faster than one might expect.

False Cache-Sharing

This is the more serious issue, and it is not that obvious to see. Two threads are working on data in memory areas that are mapped to the same cache-line. This is not a theoretical problem, but a situation that is not at all unlikely to happen. The probability of running into it increases with the amount of allocator contention. There is a good chance that a non-thread-aware allocator returns consecutive memory areas for consecutive allocations. If these allocation requests are coming from different threads, false cache-sharing is waiting to happen.

Example

Thread_A resides on CPU0
Thread_B on CPU1.

Both threads are doing totally unrelated calculations and both of them are allocating some memory.
Let’s assume both get a chunk of memory from the same cache-line.

This situation is called ‘false sharing’ or – what is even more fitting – ‘cache line ping-pong’. We have now created the biggest nightmare ( at least performance-wise ) for the cache-coherency protocol.

Thread_A writes to its memory.
– This invalidates Thread_B's cache-line.
– The cache-line of Thread_A must be written back to memory …
– … and read back again into the cache of Thread_B.

The same applies if Thread_B is modifying its memory area.
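
A minimal sketch of the scenario ( the two allocations are not guaranteed to be adjacent, but with a simple allocator they often are ):

#include <cstdint>
#include <thread>

int main()
{
    // Two consecutive allocations, most likely ending up
    // in the same 64-byte cache-line.
    int64_t* counterA = new int64_t( 0 ); // used only by Thread_A
    int64_t* counterB = new int64_t( 0 ); // used only by Thread_B

    std::thread threadA( [=]() {
        for ( int i = 0; i < 100000000; ++i )
            ++( *counterA ); // each write invalidates the line in the other cache
    } );

    std::thread threadB( [=]() {
        for ( int i = 0; i < 100000000; ++i )
            ++( *counterB ); // ... and vice versa: cache-line ping-pong
    } );

    threadA.join();
    threadB.join();

    delete counterA;
    delete counterB;
    return 0;
}

Padding each allocation out to a full cache-line would avoid this at the cost of memory; the update below describes the nicer fix.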

If you are interested in more details and also some performance impact measurements, check out ‘Analysis of False Cache Line Sharing Effects on Multicore CPUs’.

[Update]

As I was asked in the comments section of ADBAD what I would propose as a solution, here is my answer:

My preferred solution would be to disallow dynamic allocations at runtime completely, but that might be a bit drastic 🙂

So I’d rather go with this answer:

Instead of using a ‘thread-safe allocator’, which introduces the mentioned problems, a ( as I like to call it ) ‘Thread-Aware Allocator’ should be the way to go.

Each thread gets its own big blob of memory, and the management is done on a per-thread basis. This reduces thread-contention to the situations where a new memory chunk is needed. As every thread allocates from its own memory-blob, the chances of false sharing due to the described reason are minimized.
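
A minimal sketch of the idea ( the block size and the global pool are made up for illustration; Free and alignment handling are omitted ):

#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>

const size_t BLOCK_SIZE = 1024 * 1024; // hypothetical per-thread block size

// The global pool is the only place that still needs a lock.
void* AllocateBlockFromGlobalPool( size_t size )
{
    static std::mutex s_mutex;
    std::lock_guard<std::mutex> lock( s_mutex );
    return std::malloc( size ); // stand-in for the real page provider
}

class ThreadAwareAllocator
{
public:
    void* Allocate( size_t size )
    {
        if ( m_used + size > BLOCK_SIZE )
        {
            // Slow path: fetch a new block. Contention can only happen here.
            m_block = static_cast<uint8_t*>( AllocateBlockFromGlobalPool( BLOCK_SIZE ) );
            m_used  = 0;
        }

        // Fast path: bump the thread-private pointer. No lock, no sharing.
        void* result = m_block + m_used;
        m_used += size;
        return result;
    }

private:
    uint8_t* m_block = nullptr;
    size_t   m_used  = BLOCK_SIZE; // forces a block fetch on first use
};

// One instance per thread.
thread_local ThreadAwareAllocator g_allocator;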

One well documented example is the Intel TBB Scalable Allocator ( the description starts a few pages into the linked paper … search for ‘SCALABLE MEMORY ALLOCATION’ ).
[/Update]

Further reading

[1] Analysis of False Cache Line Sharing Effects on Multicore CPUs
[2] Concurrency Hazards: False Sharing
[3] For more details on caches, read this excellent post by Luke Hutchinson.

Debugging Techniques for optimized PPC builds


[This was originally published on #AltDevBlogADay. Go there if you want to read a lot of awesome stuff from awesome dudes …]

Over the last years I have given up the usage of debug builds completely. The performance was usually so bad that it induced physical pain to play the game. Also, the build and especially the link-times of a debug build are just annoying on large projects. And let’s not ignore the fact that QA was testing the optimized builds, so remote-debugging or debugging of crash-dumps had to be done in this build anyway.

But it is not as bad as some people might think. In the beginning it takes some time to get used to it, but after a few sessions, this works as well as a debug build.

This article is mostly aimed at programmers not that familiar with the lower-level concepts and should help them to get the most information without the need to read assembly.

Problems of optimized builds

1. The source code does not represent exactly the instructions that are executed
2. You will have to search for most of the variables yourself, as the resolution that is done by the debugger is mostly wrong
3. Everything you might possibly need can be found somewhere in memory; you just have to find it

I will describe some techniques to get as much information about the current state as possible, without the need of reading assembly code.

Variables

The first thing you need to realize is that local variables, parameters, and return values cannot be watched and interpreted directly from source-code. If you hover over some variable or type it into the watch-window, you will get random information. There are of course cases where the value is correct, but this is nothing you should ever rely on.

The only trustworthy types of variables are global variables and static class members. These are always correct. If they contain garbage, then it is most probably because they are screwed for real: overwritten, or not initialized at all.

Objects

The debugger can determine the “real” type of an object by resolving the vtable-entries, so use this to your advantage.

If you know that there must be some kind of object at address 0xB00B5000, you can just cast this address to any polymorphic type ( it doesn’t matter which one, it just needs to have a vtable ). If you expand this object in the watch window, the first entry will hold the resolved vpointer and will contain a human-readable name of the runtime type of this instance.

Here is an example. The address points to an instance of the class ‘UWorld’, and the debugger can determine this no matter which type you cast the pointer to.
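
If you want to try this yourself, a minimal sketch ( the viewer type is made up; the vpointer value is just an illustration ):

// Any polymorphic type works as a 'viewer'; it only needs a vtable.
struct AnyPolymorphicType { virtual ~AnyPolymorphicType() {} };

// In the watch window, using the address from above:
//   (AnyPolymorphicType*)0xB00B5000
// Expanding the object reveals the resolved vpointer, something like:
//   __vfptr = 0x82xxxxxx { UWorld::`vftable' }
// ... which tells you that the instance really is a 'UWorld'.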

Register Usage

The PPC ABI defines a specific register usage. This allows you to get a lot of information just by looking at the registers. Note that these are callstack-dependent.

This means a function-call overwrites some of the registers and restores them after returning. Therefore, you cannot rely on every register if you are not at the top of the callstack. But the debugger aids you here as well: every register that was invalidated by a function-call higher up in the callstack is displayed without a value in the register window.

In this picture you can see that r0 and r3 - r12 were overwritten by another function-call. All registers that still contain values can be considered valid.

Each register is used for a clearly specified purpose.

r1         This is always the pointer to the current stack-frame.
r3  - r10  first 8 input arguments
r3  - r4   return values
r14 - r31  non-volatile registers

There are more register types ( FPU registers, VMX registers ), but you should get by most of the time with just r0 - r31.
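
As a ( hypothetical ) example of how a simple method-call maps to these registers:

player->TakeDamage( attacker, 25 );

r3 = player      ( the implicit this-pointer )
r4 = attacker    ( first explicit argument )
r5 = 25          ( second explicit argument )

On return, r3 holds the integer return value. Floating-point arguments would be passed in FPU registers instead.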

Address Ranges

Let’s assume you are in some method-call and would like to inspect the current state of the this-pointer and its members.

First, check r3, which usually contains the this-pointer. As this is the first parameter register, this makes sense, right? If r3 is not valid, the first thing to do is to search r14 - r31 for sane object addresses.

What a sane address is, is completely platform- and implementation-dependent. The Xbox360, for example, maps 64kb memory pages to the address-range 0x40000000 - 0x7fffffff. When you know the platform and the implementation internals of your memory allocator, you can easily find out which address range contains which data.

So, for the sake of an example, just assume you are debugging on an Xbox360 and your general purpose allocator uses 64kb memory pages internally.

Heap allocations will therefore almost always reside in the 0x4xxxxxxx address range. They could also end up at 0x5xxxxxxx addresses, but only if you are using more than 256MB for your general purpose heap.

As the stack is also allocated from 64kb pages and grows downwards, you will find the stack in the 0x7xxxxxxx area.

Last but not least, the PE loader uploads code to the 0x80000000 – 0xA0000000 area.

So, now you already have a pretty clear picture of what is going on by just looking at the addresses.

0x4xxxxxxx - 0x5xxxxxxx	    heap objects
0x7xxxxxxx                  stack
0x8xxxxxxx - 0xAxxxxxxx     code

Normally, your allocator aligns heap allocations to 8 or 16 byte boundaries. So here is another criterion if you are looking for objects on the heap: ignore unaligned addresses.
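
Assuming the example layout above, a small helper makes the classification explicit ( the ranges are the Xbox360 example values, not universal constants ):

#include <cstdint>

// Classification based on the example Xbox360 layout described above.
bool IsPossibleHeapObject( uint32_t address )
{
    const bool inHeapRange = ( address >= 0x40000000u && address < 0x60000000u );
    const bool aligned     = ( address & 0x7 ) == 0; // 8-byte; use 0xF for 16-byte
    return inHeapRange && aligned;
}

bool IsStackAddress( uint32_t address )
{
    return address >= 0x70000000u && address < 0x80000000u;
}

bool IsCodeAddress( uint32_t address )
{
    return address >= 0x80000000u && address < 0xA0000000u;
}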

So, with this information in mind, let’s take a look at the register window from above.

You can clearly see that r14 and r23 are probable candidates for heap-allocated objects, while r13 points to the area where the code resides.

If you can expect that the heap-objects you are looking for have a virtual function table, just cast the addresses from r14 and r23 to any polymorphic type. This is what the debugger would show you:

Now, you can use these objects to find out further information about their state at the moment of the crash.

Stack

The same works for the stack-frame, of course. You can open up a memory window and display the memory at r1. This gives you the data that is stored on the stack.

If you work with the memory window, make sure you change the view to “4-byte integer” and “hexadecimal” display. Then you can just apply your knowledge of sane addresses and look there for helpful objects.

As you can see, there are some candidates in this stack-frame. Of course, not every address that fits this pattern will contain a valid object, but most of the time you will find something that brings you a step closer to the reason for your crash.
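
Combined with the classification helper from above, you could even scan a dumped stack-frame automatically ( a sketch; it assumes you copied the raw frame into a buffer first ):

#include <cstddef>
#include <cstdint>
#include <cstdio>

bool IsPossibleHeapObject( uint32_t address ); // from the sketch above

// Print every stack-word that looks like a heap object.
void FindHeapCandidates( const uint32_t* stackWords, size_t count )
{
    for ( size_t i = 0; i < count; ++i )
    {
        if ( IsPossibleHeapObject( stackWords[i] ) )
        {
            // Cast these candidates to a polymorphic type in the
            // watch window and check the resolved vtable.
            printf( "candidate at r1+0x%02x: 0x%08x\n",
                    static_cast<unsigned>( i * 4 ), stackWords[i] );
        }
    }
}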

Conclusion

It is not really hard to get decent information without a debug build at hand. This was just a collection of simple tricks to get at the data without having to read assembly. If reading assembly is no problem for you, you have a lot of easier and more reliable ways to get the information you need.