Monthly Archives: July 2011

Debugging Techniques for optimized PPC builds


[This was originally published on #AltDevBlogADay. Go there if you want to read a lot of awesome stuff from awesome dudes …]

Over the last few years I have given up using debug builds completely. The performance was usually so bad that it induced physical pain to play the game. The build times, and especially the link times, of a debug build are also just annoying on large projects. And let’s not ignore the fact that QA was testing the optimized builds, so remote debugging and debugging of crash dumps had to be done in this build either way.

But it is not as bad as some people might think. It takes some time to get used to it, but after a few sessions it works almost as well as a debug build.

This article is mostly aimed at programmers who are not that familiar with lower-level concepts, and should help them get the most information without having to read assembly.

Problems of optimized builds

1. The source code does not exactly represent the instructions that are executed
2. You will have to search for most of the variables yourself, as the resolution done by the debugger is mostly wrong
3. Everything you might possibly need is somewhere in memory, but you have to find it yourself

I will describe some techniques to get as much information about the current state as possible, without the need of reading assembly code.

Variables

The first thing you need to realize is that local variables, parameters and return values cannot be watched and interpreted directly from the source code. If you hover over a variable or type it into the watch window, you will often get garbage. There are of course some cases where the value is correct, but this is nothing you should ever rely on.

The only trustworthy types of variables are global variables and static class members. These are always correct. If they contain garbage, then it is most probably because they are screwed for real: they were overwritten or not initialized at all.

Objects

The debugger can determine the “real” type of an object by resolving the vtable-entries, so use this to your advantage.

If you know that there must be some kind of object at address 0xB00B5000, you can just cast this address to a pointer to any polymorphic type (it doesn’t matter which one, it just needs to have a vtable). If you expand this object in the watch window, the first entry will hold the resolved vpointer and will contain a human-readable name of the runtime type of this instance.

Here is an example. The address points to an instance of the class ‘UWorld’ and the debugger can determine this, no matter into which type you cast the pointer.

Register Usage

The PPC ABI defines how the registers are used. This allows you to get a lot of information just by looking at the registers. Note that the register contents are callstack dependent.

This means that a function call may overwrite registers: the volatile ones are simply lost, while the non-volatile ones are saved and restored by the called function. Therefore, you cannot rely on every register if you are not at the top of the callstack. But the debugger helps you here as well: every register that was invalidated by a function call further up the callstack is displayed without a value in the register window.

In this picture you can see that r0 and r3 - r12 were overwritten by another function call. All registers that contain values can be considered valid.

The registers are used for clearly specified data.

r1         This is always the pointer to the current stack-frame.
r3  - r10  first 8 input arguments
r3  - r4   return values
r14 - r31  non-volatile registers

There are more register-types ( FPU registers, VMX registers ) but you should get away most of the time just with r0 - r31.

Address Ranges

Let’s assume you are in some method call and would like to inspect the current state of the this-pointer and its members.

First, check r3, which usually contains the this-pointer. As this is the first parameter register, this makes kind of sense, right? If you have no valid r3, the first thing to do is to search r14 – r31 for sane object addresses.

What a sane address is depends completely on the platform and the implementation. The Xbox360, for example, maps 64 KB memory pages to the address range 0x40000000 – 0x7fffffff. When you know the platform and the implementation internals of your memory allocator, you can easily find out which address range contains which data.

So, for the sake of an example, just assume you are debugging on an Xbox360 and your general purpose allocator uses 64 KB memory pages internally.

Heap allocations will therefore almost always reside in the 0x4xxxxxxx address range. They could also end up at 0x5xxxxxxx addresses, but only if you are using more than 256 MB for your general purpose heap.

As the stack is also allocated from 64 KB pages and grows downwards, you will find the stack in the 0x7xxxxxxx area.

Last but not least, the PE loader loads code into the 0x80000000 – 0xA0000000 area.

So, now you already have a pretty clear picture of what is going on by just looking at the addresses.

0x4xxxxxxx - 0x5xxxxxxx     heap objects
0x7xxxxxxx                  stack
0x8xxxxxxx - 0xAxxxxxxx     code

Normally, your allocator aligns heap allocations to 8 or 16 byte boundaries. So here is another criterion when you are looking for objects on the heap: ignore unaligned addresses.
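The address table above can be turned into a small helper for triaging values copied out of the register window. This is only a sketch: the function name `classify_address` is made up for illustration, the ranges are the Xbox360 example values from this article, and the 8-byte alignment check assumes your allocator aligns at least that strictly. Adjust all of it to your platform and allocator.

```shell
#!/bin/sh
# Classify a 32-bit address according to the example layout from the text.
# ASSUMPTION: the ranges below are the Xbox360 example values, not universal.
# classify_address is a hypothetical helper name, not part of any SDK.
classify_address() {
    addr=$(( $1 ))    # accepts hex like 0x40A1C340 or plain decimal
    if [ "$addr" -ge $(( 0x40000000 )) ] && [ "$addr" -le $(( 0x5FFFFFFF )) ]; then
        # heap allocations are expected to be at least 8-byte aligned
        if [ $(( addr % 8 )) -eq 0 ]; then
            echo "heap"
        else
            echo "heap (unaligned, ignore)"
        fi
    elif [ "$addr" -ge $(( 0x70000000 )) ] && [ "$addr" -le $(( 0x7FFFFFFF )) ]; then
        echo "stack"
    elif [ "$addr" -ge $(( 0x80000000 )) ] && [ "$addr" -le $(( 0xAFFFFFFF )) ]; then
        echo "code"
    else
        echo "unknown"
    fi
}
```

For example, `classify_address 0x40A1C340` prints `heap`, while an odd value in the same range is reported as unaligned and can be skipped.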

So, with this information in mind, let’s take another look at the register window from before.

You can clearly see that r14 and r23 are the most probable candidates for heap-allocated objects, while r13 points to the area where the code resides.

If you can expect the heap objects you are looking for to have a virtual function table, just cast the addresses from r14 and r23 to any polymorphic type. This is what the debugger would show you:

Now, you can use these objects to find out further information about their state at the moment of the crash.

Stack

The same works for the stack frame, of course. You can open a memory window and display the memory at r1. This gives you the data that is stored on the stack.

If you work with the memory window, make sure you change the view to “4-byte integer” and “hexadecimal” display. Then you can just apply your knowledge of sane addresses and look there for helpful objects.

As you can see, there are some candidates in this stack-frame. Of course not every address that fits this pattern will contain a valid object, but most of the time you will find something that brings you a step further to the reason of your crash.
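If the stack frame is large, filtering by eye gets tedious. The following sketch reads words copied from a memory window and prints only those that look like aligned heap pointers. It assumes the dump consists of whitespace-separated 4-byte hex words without a `0x` prefix, and it reuses the example heap range and 8-byte alignment from above. The function name `heap_candidates` is made up for illustration.

```shell
#!/bin/sh
# Print the words of a memory dump that look like aligned heap pointers.
# ASSUMPTIONS: input on stdin is whitespace-separated 4-byte hex words
# without 0x prefix; heap range and alignment are the article's examples.
# heap_candidates is a hypothetical helper name.
heap_candidates() {
    for word in $(cat); do
        value=$(( 0x$word ))
        if [ "$value" -ge $(( 0x40000000 )) ] \
            && [ "$value" -le $(( 0x5FFFFFFF )) ] \
            && [ $(( value % 8 )) -eq 0 ]; then
            printf '0x%s\n' "$word"
        fi
    done
}
```

You would paste the dump into a file and run `heap_candidates < dump.txt`. Every hit is only a candidate, so you still have to cast it in the watch window to see whether it is a real object.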

Conclusion

It is not really hard to get some decent information without a debug build at hand. This is just a collection of simple tricks to get some data without the need to read assembly. If you are comfortable reading assembly, you have a lot of easier and more reliable ways to get the information you need.

Stupid quoting is the root of all evil


If I had received a beer for every time I heard (or read) someone quoting Knuth’s “Premature optimization is the root of all evil”, I would have died of cirrhosis of the liver long ago … twice.

Why does this quote drive me so mad?

First of all, no one I have ever met who used this quote to back up his stupid point had even read the paper the quote originates from.

Structured Programming with go to Statements

Yeah, really, that was the title. And the entire article was about optimizing the shit out of stuff.

Why is no one quoting the title of this paper? Maybe I should do this whenever someone claims how evil ‘goto’ is.

Sorry that you are not able to use the available tools without screwing up your code base and falling into spaghetti-mode, moron! Did you ever hear a carpenter say: “Dude, I don’t use saws. That is fricking dangerous. I could hurt myself.”?

Goto is as evil as virtual when it ends up in the wrong hands.

But that is not the point … I’m getting sidetracked :)

Never ignore performance considerations

To be clear: I know the importance of profiling to identify your bottlenecks and your critical path. I would never argue against that. Optimize only where your profiler tells you it makes sense.

But that does not mean that you can stop giving a crap about the rest of the code. Keep one thing in mind: There is no non-performance critical code in a game, ever. None. You don’t need to optimize the hell out of everything, but you need to think about the performance implications of your code in every single case. There should never be an exception to this.

When you stop caring, Baby Jesus will hate you.

What not caring gives you in the end is a bit too much cost for almost everything that is going on. You are wasting time on trivial things all around your code base, but you are not able to nail it down and optimize it properly, as it is spread everywhere. Every single optimization will give you an almost non-measurable improvement. But the sheer number of small inefficiencies adds up and costs you a considerable amount of execution time.

Unfortunately, when you are at this point, there is no chance of fixing this ‘Death by a thousand papercuts’ situation anymore. You will not have the resources to spend precious programmer time on such minor improvements. It is just not enough bang for the buck.

Do not ignore performance considerations, ever! This will bite you in the ass in the long run and you will have to suffer in other areas. In the worst case you will even be forced to scale down some features to meet the performance criteria. But for what? Just because you followed a totally outdated quote that is used out of context and interpreted wrongly.

And stop quoting stuff you have no clue about.

Ironic as I am, I will finish this post with another quote from another awesome programmer :)

"My point is, that you should fire anyone quoting anything from this paper without pointing out, that all this is obsolete, because compilers changed a lot since the age of dinosaurs ;-)"

Git Stuff

Since I started working at Nokia, I have had the pleasure of working with a ‘Distributed Version Control System’. As I had mostly used Perforce before, the switch was both a blessing and a curse.

I have to admit that I had massive problems getting used to it in the beginning. But by now, git and I are BFFs … at least until random shit starts to happen again :).
Yeah, I know that this is my fault and not git’s. It is just so damn easy to do something wrong. Git is far away from a submit-and-run VCS like Perforce, but that is a fair price for the fact that you can now branch whenever you want without days of integration pain.

I do not want to go into too much detail here, as there are more than enough very good tutorials out there. If you are new to DVCS, check out Joel’s brilliant article.

Here are some (hopefully) useful tips for working with git.

Git in Dropbox

That one is pretty obvious, but extremely useful, especially for private projects. You can push your local repo to your Dropbox and it automatically gets synced with all the PCs you are using Dropbox with.

# go to your Dropbox and create your project directory
$ cd ~/Dropbox
$ mkdir my_project
$ cd my_project

# now initialize an empty bare repo (no working copy) with
$ git init --bare

# As you have your remote-repo prepared, go to your local repository.
$ cd ~/dev/my_project

# First, you need to introduce the remote location to git 
# this adds the specified path as the remote named 'origin'
# but you could as well name it 'Dropbox' or 'whatever'
# (the path must point to the directory created above; replace
# /home/user with your own home directory)
$ git remote add origin file:///home/user/Dropbox/my_project

# git is set up, so push it to the remote ( 'origin' or whatever
# name you have used ). 
$ git push origin master

Done, you now have your repo in your Dropbox. If you are on another PC and want to access it, just clone it from there, and you are set. You can use this like you would use any git server.

Save the history, with rebase

As your local repo is basically a branch of the remote repo, the default behavior of git pull is a merge. There is nothing really wrong about this, but if you work on larger projects with lots of contributors, this makes your history really hard to read.

You can avoid this quite easily by using rebase instead: git pull --rebase.
The main difference is the way your changes and the remote changes are combined. With rebase, your local commits are temporarily set aside, the remote changes are applied, and then your commits are re-applied on top of them. This preserves a linear history and keeps it human readable.
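To see the effect without touching a real project, here is a throwaway demo. Everything in it is made up for illustration: the temporary directory, the two working copies `alice` and `bob`, and the helper `commit_file`. A bare repository stands in for the shared remote. Treat it as a sketch of the behavior, not as a recommended workflow script.

```shell
#!/bin/sh
# Demo: two working copies diverge, then "git pull --rebase" replays the
# local commit on top of the remote one instead of creating a merge commit.
# ASSUMPTION: all paths and names (alice, bob, remote.git) are hypothetical.
set -e
work=$(mktemp -d)

# hypothetical helper: create an empty file and commit it
commit_file() {
    : > "$1"
    git add "$1"
    git commit -q -m "$2"
}

# a bare repository stands in for the shared remote
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# first developer creates the initial commit and pushes it
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
commit_file a.txt "initial commit"
git push -q origin master

# second developer clones, then both commit independently
git clone -q "$work/remote.git" "$work/bob"
cd "$work/alice"
commit_file b.txt "remote change"
git push -q origin master
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
commit_file c.txt "local change"

# replay the local commit on top of the fetched remote commit
git pull -q --rebase origin master

git log --oneline
merge_count=$(git rev-list --merges HEAD | wc -l | tr -d ' ')
commit_count=$(git rev-list HEAD | wc -l | tr -d ' ')
```

After the pull, the log in `bob` should show the local change sitting on top of the remote change with no merge commit in between. Running the same steps with a plain `git pull` would add one.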

Interactive Rebase FTW!

The interactive rebase allows you to modify already committed changes. Let’s say you are prototyping something. Instead of waiting for a good state to commit your changes, you can commit as often as you want. When you are ready to push, you can do an interactive rebase to combine commits, remove them completely or change their commit messages.
So, you have been prototyping a feature and realized that you need to refactor a bit of old code in the process. Let’s assume you now have 5 small commits: 2 are small refactorings and the other 3 are iterations of the feature you are prototyping. You realize that it would make more sense to have only 2 commits, one for the refactoring and one for your feature.

# you need to tell interactive rebase which commits you are interested in
# ( in our case these are the last 5 commits )
$ git rebase -i HEAD~5

This will put you into the rebase mode, where you can select what you want to do with these changes.

pick 5c6bb74 some refactoring
pick 91dbdfa other refactoring
pick 3080d61 iteration 1
pick 4e4f56a iteration 2
pick 1890f70 iteration 3

# Rebase a37f00c..1890f70 onto a37f00c
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

You can now alter the changes. In this case we want to group them and change their
commit messages. The result could look like this:

reword 5c6bb74 some refactoring          # changes the commit message
fixup 91dbdfa other refactoring          # groups this commit with the previous
reword 3080d61 iteration 1               # changes the commit message
fixup 4e4f56a iteration 2                # groups this commit with the previous
fixup 1890f70 iteration 3                # groups this commit with the previous

After you have done this, you will be prompted for the commit messages of the two rewords. When finished, you have only two commits left, each with a proper change description. You can now push this without a bad conscience. This is what the history looks like now:

$ git log
commit 70f40f9504e5721c7bce32fe9a8c792cddce6acf
Author: Martin Zielinski 
Date:   Thu Jul 7 23:50:14 2011 +0200

    feature xyz

commit 4e47d572508b1109097f73959fe7be02e23ee437
Author: Martin Zielinski 
Date:   Thu Jul 7 23:49:22 2011 +0200

    refactoring old code
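If you already know while committing that a change belongs to an earlier commit, git can prepare the todo list for you. This is a variation on the workflow above, not what I did there: `git commit --fixup` marks a commit as a fixup of an earlier one, and `git rebase -i --autosquash` moves it into place. The sketch below runs in a throwaway repository; the file names and commit messages are made up, and `GIT_SEQUENCE_EDITOR=true` is only used to accept the generated todo list without opening an editor.

```shell
#!/bin/sh
# Demo: mark follow-up commits with --fixup and let --autosquash arrange
# the todo list of the interactive rebase.
# ASSUMPTION: file names and commit messages are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "initial commit"

echo "step 1" > old_code.txt
git add old_code.txt
git commit -q -m "refactoring old code"
refactoring=$(git rev-parse HEAD)

echo "iteration 1" > feature.txt
git add feature.txt
git commit -q -m "feature xyz"

# more refactoring, recorded as a fixup of the earlier refactoring commit
echo "step 2" >> old_code.txt
git add old_code.txt
git commit -q --fixup="$refactoring"

# another feature iteration, recorded as a fixup of the feature commit
echo "iteration 2" >> feature.txt
git add feature.txt
git commit -q --fixup=HEAD~1

# "true" leaves the generated todo list unchanged, so no editor opens
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~4

git log --format=%s
```

Without the `GIT_SEQUENCE_EDITOR` override you get the usual todo list in your editor, just with the fixup lines already sorted under the commits they belong to.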

Hello world!


I hope that I will find the time to blog about some technical stuff, especially programming and game programming. Topics covering optimization for game consoles as well as mobile platforms might also find their way onto this blog. And I will for sure also do what I am best at: complaining and ranting :).

But don’t expect too much, I’m not doing this either.