Many people are familiar with the “Show Framerate and Profile” option in the BGE and the mess of text it displays on their screen. However, not as many people truly know what the different statistics mean. This article aims to improve people’s understanding of the profile stats and of how to change your game to get those numbers down (less time spent is better for performance). Aside from the FPS, the profile shows up to ten stats: Physics, Logic, Animations (only in newer versions of Blender), Network, Scenegraph, Rasterizer, Services, Overhead, Outside, and GPU Latency (only in newer versions of Blender). To get the most accurate readings, I recommend turning off “Use Frame Rate” and using your graphics card drivers (or the UI option in the render properties on newer versions of Blender) to force vsync off.
The Physics stat represents the time spent on physics code. These days the BGE only uses Bullet for physics, so this stat mostly represents the time spent in Bullet. To reduce the time, you’ll need to simplify your physics so Bullet doesn’t have to do as much work. This can include using simpler physics shapes for objects. For example, if you have a complicated mesh for a character and you set the physics type to Convex Hull or Triangle Mesh (the default if no other bound type is explicitly set), Bullet has to do physics calculations with the complicated mesh, which is just a waste of time. Instead, see if something simpler like a sphere or box can do the trick. If not, at least set up a “proxy”: a simple, invisible version of your mesh that is used for physics calculations in place of the complicated mesh that is used for rendering.
The Logic stat covers the time spent on logic bricks and Python code (excluding code run through KX_Scene.pre_draw and KX_Scene.post_draw; those times show up under the Rasterizer). If you want to reduce this, you’ll need to simplify or optimize your logic bricks and Python code. I’m not going to give a tutorial on optimizing Python code, but this talk by Mike Fletcher (known for PyOpenGL) describes profiling Python code and gives some tips for optimizing. Remember, always profile your code before attempting to optimize it! As a last resort, you can also try moving some of your Python code to C/C++.
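To make the “profile first” advice a bit more concrete, here is a minimal sketch using Python’s standard cProfile and pstats modules. It does not use the BGE API at all: update_enemies is a hypothetical stand-in for your own per-frame game logic, and profile_call is a helper name invented for this example.

```python
import cProfile
import io
import pstats


def update_enemies(count):
    """Hypothetical stand-in for some per-frame game logic."""
    total = 0
    for i in range(count):
        total += i * i
    return total


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the report in a string instead of printing it directly,
    # so it can be logged or written to a file by the caller.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(update_enemies, 100000)
    print(report)
```

The report lists the most expensive calls by cumulative time, which should give you an idea of where optimizing is likely to pay off before you start rewriting anything.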
The Animations stat is the time spent in Blender’s animation code, which the BGE makes use of. This includes things such as looking up pose data and interpolating key frames. However, be warned that sometimes things like calculating IK can show up under the Scenegraph when calculating bone parents. Also, this category does not include the time spent on the actual mesh deformation; that time is recorded under the Rasterizer category. To reduce the time spent on animation, try to reduce the bone count in your armatures. You can also try switching your armatures over to iTaSC (set to simulation) for IK solving instead of the Legacy solver, since iTaSC can be faster. In my tests I’ve seen speed improvements of 1.25x to 1.5x when using iTaSC, but I’ve heard that 4x is not unreasonable.
The Network stat might come as a surprise to some, but the BGE actually has some networking code. However, this feature was never really developed, so now it is mostly a stub that can send messages over a loopback interface. This is how Message actuators and sensors (and the corresponding Python API features) work. It’s doubtful that this category will ever be a time sink, but if you’re having problems, take a look at the number of messages you’re sending and see if you can reduce it.
The scenegraph keeps track of objects’ position, orientation and scale (and probably a few other things I’m not thinking of at the moment). This also includes updating parent-child relationships (e.g., bone parents). As mentioned earlier, the time for bone parents can include getting updated pose data, which possibly means calculating IK. The scenegraph also handles culling (frustum and occlusion) calculations. If the Scenegraph time is really high, try reducing the number of objects in your scene. You can also try using iTaSC (mentioned under Animations).
The rasterizer is responsible for actually rendering the game. This includes rendering geometry, shaders, and 2D filters.
Since the BGE makes use of double buffering, the rasterizer also has to swap the buffers, which can give really high readings if vsync is enabled (SwapBuffers() blocks while waiting for a screen refresh). In newer versions of Blender, this time is reported under the GPU Latency category instead. To reduce the time spent in the rasterizer (or the GPU latency), you can try to simplify your geometry and materials. Also make sure you don’t have too many lights casting dynamic shadows. Each shadow-casting light requires the scene to be rendered one more time. So, if you have three spot lights casting shadows, the scene is rendered four times (three for shadows and once for the actual scene)! 2D filters can also suck up some time, so even if that bloom, depth of field and SSAO look nice, you might want to consider removing them or reducing the number of samples they use.
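The shadow arithmetic above is simple enough to write down. The sketch below only restates the rule from this section (one render per shadow-casting light, plus one for the actual scene); scene_render_passes is a made-up helper name, not a BGE function, and it ignores anything else that might add passes.

```python
def scene_render_passes(shadow_lights):
    """Number of times the scene is rendered per frame.

    Assumes one extra render per shadow-casting light plus one render
    for the actual scene, as described in the text.
    """
    if shadow_lights < 0:
        raise ValueError("shadow_lights must not be negative")
    return shadow_lights + 1


if __name__ == "__main__":
    for lights in range(4):
        passes = scene_render_passes(lights)
        print("{} shadow light(s): {} render(s) per frame".format(lights, passes))
```

Going from zero to one shadow-casting light already doubles the number of scene renders, which is why trimming shadow lights is usually one of the first things worth trying.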
The Services stat is the time spent processing various system devices (keyboard, mouse, etc.). You shouldn’t have a problem with this category taking up time.
Overhead is probably one of the most misleading category names. The “overhead” is all the text drawn on top of the game screen in the top left corner. This includes the framerate, profile, and debug properties. So, the time spent in this category should be reclaimed when running your game in a more “release” configuration (i.e., you’re not drawing all that debug/profiling text to the screen). If you want to reduce the time spent here while profiling, try reducing the number of debug properties you display.
The Outside stat is time spent outside of the BGE’s main loop. In other words, something is taking time away from the BGE. You really have no control over this area, although if you have a lot of other programs running, you can try closing some of them.
The GPU Latency category is new as of r59097 and will be in Blender 2.69. It represents the time spent waiting on the GPU. This time used to be counted entirely within the Rasterizer category, so the same tips from there apply here. However, time spent waiting for vsync now shows up here instead of in the Rasterizer category. This category is also a bit different from the others in that it is idle time (the CPU is just waiting on the GPU). This means it is time that can be used by the CPU (e.g., physics, animations, logic, etc.) without affecting the framerate. It also means that if the GPU Latency is high, trying to optimize CPU time is pointless, since doing so will not affect the framerate either. If this value is low, it is still possible to be GPU bound. Various OpenGL calls (usually some form of glGet) can cause a sync event in which the CPU has to wait on the GPU. These sync events can cause odd profiler readings depending on which part of the codebase they occur in. For example, if Overhead is suddenly taking up a large amount of time, odds are that the font rendering triggered a sync.
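As a rough way to apply the reasoning above to a set of profile readings, here is a small sketch. The BGE does not expose anything like this: summarize_profile and the 50% threshold are assumptions chosen purely for illustration, and the readings in the example are made-up numbers typed in by hand.

```python
def summarize_profile(times_ms, gpu_share_threshold=0.5):
    """Return (gpu_share, hint) for a dict of category -> milliseconds.

    The threshold is an arbitrary assumption, not a value used by the BGE.
    """
    total = sum(times_ms.values())
    if total <= 0:
        raise ValueError("profile times must add up to more than zero")

    gpu_share = times_ms.get("GPU Latency", 0.0) / total
    if gpu_share >= gpu_share_threshold:
        # The CPU is mostly idle waiting on the GPU (or on vsync), so
        # CPU-side optimizations are unlikely to change the framerate.
        hint = "gpu"
    else:
        # A low value does not rule out being GPU bound: sync events
        # can hide GPU waits inside other categories.
        hint = "inconclusive"
    return gpu_share, hint


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    sample = {"Physics": 1.0, "Logic": 1.0, "Rasterizer": 2.0, "GPU Latency": 12.0}
    share, hint = summarize_profile(sample)
    print("GPU Latency share: {:.0%} ({})".format(share, hint))
```

The “inconclusive” result is deliberate: a low GPU Latency reading on its own is not enough to conclude that the game is CPU bound.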
I hope people find this useful.