Game Development Community

Speed or size, that is the question?

by Jean-louis Amadi · 05/15/2012 (3:29 pm) · 9 comments

Well, when we talk about "optimization", we mean optimization in both space and time: reducing the size our files occupy on disk and in memory (space), and increasing the speed of execution (time).

There are two main ways to do this:

1-Using the optimization options of the compiler and linker
2-Optimizing the code itself
--structure alignment
--intrinsic functions & calls
--inline asm
--parallelization and multithreading
--GPU computing
--profiling
--the best code is the code that is never executed. :)

OK, since I'm no great specialist, I just ran a few tests with the compiler options on my machine (Core i7, 8 GB RAM, NVIDIA GTX 560 Ti). In practice, the results only hold in my context, but they may serve as a starting point.

First, I used global optimization favoring speed; in the FPS Tutorial I get 75 frames/sec.
img12.imageshack.us/img12/8332/optimspeed.png

Second, I used global optimization favoring size; I still get 75 frames/sec, but the size of the *.dll has been reduced.

Size with favor for speed

img829.imageshack.us/img829/9559/sizewithglobalspeedfavo.png
----

Size with favor for size

img854.imageshack.us/img854/2605/sizewithglobalsizefavor.png
----
I observed that my framerate was not affected whichever option I chose, but the size of the *.dll decreased with the option favoring size.
----
I repeated this test with full optimization for speed regardless of size (not global optimization as before) and again saw no difference in the framerate.

Even after several tweaks to the options, I can say that in my case the only way to increase speed is to modify the code; I cannot count on the compiler for that. I can use the compiler only to reduce the size of the output, automagically and effortlessly.

So I kept the options favoring size for Torque, modified "qmake.conf" (because I use the *.dll files of the Qt framework), recompiled Qt, and then ran UPX on the output. The changes and the result are below.

If you use Qt, edit "Qt4.8.1/mkspecs/win32-msvc2010/qmake.conf" (adjust the path for your compiler version).

Make the following changes:
# -O2 optimizes for speed, -O1 optimizes for size
QMAKE_CFLAGS_RELEASE    = -O1 -MD
QMAKE_CFLAGS_RELEASE_WITH_DEBUGINFO += -O1 -MD -Zi
QMAKE_LFLAGS_RELEASE    = /INCREMENTAL:NO /OPT:REF

The size reduction is really good, and the framerate after launching the FPS Tutorial has not changed.

We go from 55.6 MB for all the DLLs to 14.1 MB, a reduction of roughly 75%.


But one doubt remains in my mind: the space on disk is reduced, but what about the memory footprint when I use the compressed *.dll? Can anybody answer, please?

In conclusion, as I see it, the best way to gain speed is still through the code and its organization/syntax/architecture, while for size the compiler does its job.


Here is a link to a fantastic resource on code optimization:

www.agner.org/optimize/

You can download the books; they're free!


If you have the time, or deeper knowledge than me (that's not hard :) ) of this wonderful part of development, don't hesitate to add your experience, corrections, advice, or favorite links.

Thank you all!







#1
05/16/2012 (1:06 am)
This is really interesting. One thing that made me wonder is how big my dlls would be when I compiled as release.

So, is upx a tool for reducing even more than compiling as release? If it is then this is really interesting as it will help with reducing download sizes for customers.
#2
05/16/2012 (1:40 am)
Yes, UPX reduces the size beyond what a release build does, and you can use both together (compiler && UPX).
UPX can also compress the exe: mine went from 92 KB to 52 KB with UPX, with no change in framerate when I launch the demo with these "Quality Settings":
img607.imageshack.us/img607/6065/qualitysettings.png
Below is the size comparison for the Qt DLLs (compiler only, without UPX):
-O2 speed optim
img94.imageshack.us/img94/6347/sizeqtdllspeedoptim.png
----------------------
-O1 size optim
img585.imageshack.us/img585/2910/sizeqtdllsizeoptim.png


#3
05/16/2012 (4:38 am)
I would be interested to see the startup time differences.
#4
05/16/2012 (9:29 am)
Quote:But one doubt remains in my mind: the space on disk is reduced, but what about the memory footprint when I use the compressed *.dll? Can anybody answer, please?
According to the UPX site:
Quote:no memory overhead for your compressed executables because of in-place decompression.
It is decompressed before use and UPX itself adds no memory overhead, so I would wager that (at least in theory) the memory footprint should remain unchanged.
#5
05/16/2012 (12:40 pm)
This is really neat.

In an age where most game vendors think it is okay to have 1 to 2 to 5 to 10 GB sized games downloaded over the internet... This is a welcome addition to helping reduce that.
#6
05/17/2012 (12:33 am)
@Edward Smith:
I see no difference in startup time whatever the options, even with UPX, but a real tool for measuring the timings and the memory footprint will be essential.

#7
05/18/2012 (10:57 am)
Compiler optimization flags are largely just a micro-optimization. For any major gains, you're going to need to improve the algorithms themselves (see http://en.wikipedia.org/wiki/Big_O_notation). Any compiler setting can only reduce the constant, not the overall asymptotic complexity (it won't change an O(n^2) algorithm to an O(n) one for you).
Compare the exe with optimizations turned on with one with all optimizations turned off.

Also, be careful with benchmarking in this manner. For a game, your frame rate is limited by whichever of your cpu/gpu is slower. If your cpu is already sitting idle waiting for the gpu, then optimizing the game/engine code isn't going to make any difference in frame rate. Likewise, if your gpu is sitting idle waiting for the cpu, then optimizing your shaders/graphics calls isn't going to make much of a difference either.

If you want a good cpu profiler, I like this one: http://www.codersnotes.com/sleepy. If you do find your time largely being spent in a handful of functions, it's generally better to find ways of reducing the number of times you call into those than trying to micro-optimize the inner functions.

Make sure you have vsync turned off (or you'll never get a frame rate above your monitor's refresh rate). Then turn off everything that would be stressing the gpu without adding any cpu cost (turn off anti-aliasing, reduce your resolution, make the shaders as simple as possible, etc.). If you download Nvidia's PerfHud (http://developer.nvidia.com/nvidia-perfhud), it has options to disable/simplify all of your draw calls (drawing only a single triangle, disabling textures, that kind of thing), which you can use to try to reduce the gpu load of your game and see more what the cpu performance of it is like.

The memory footprint of the compressed dll should be essentially the same, as it needs to be uncompressed at some point for the cpu to actually run the code in it. If anything, you're adding to it, as the dll decompressor's code needs to be loaded and executed sometime.
#8
05/23/2012 (1:33 am)
@Thomas,

Thank you for your advice; I will try the cpu profiler, taking into account the tips you have given. I will also dig deeper into Torque's algorithms and try to work out the Big O complexity of the most important ones. It will take me a lot of time; I will post if I find anything interesting.

If the memory footprint of the compressed dll is essentially the same, then UPX becomes my favorite tool for reducing the size of the dll ;)
#9
05/23/2012 (3:16 am)
Here is a recent comment on AMD's profiler:
http://www.garagegames.com/community/forums/viewthread/130118/1#comment-826605