Optimization Tips

by Aditya Kulkarni · in Torque Game Builder · 07/02/2010 (6:41 am) · 6 replies

I know the simple basics, like:
1) Use 'while' instead of 'for'
2) Use a power-of-2 size for arrays
3) Use switch cases instead of a large number of if conditions
4) Use n*(n+1)/2 instead of a 'for' loop to calculate the sum of the first 'n' numbers.
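
As a rough illustration of point 4, here is a minimal sketch comparing the loop with the closed form. The function names sumLoop and sumFormula are made up for this example; both should return the same value for any positive whole number n (5050 for n = 100).

```torquescript
// Sum 1..n the slow way: one addition per number.
function sumLoop( %n )
{
	%sum = 0;
	for( %i = 1; %i <= %n; %i++ )
		%sum += %i;
	return %sum;
}

// Sum 1..n with the closed form: a fixed amount of work for any n.
function sumFormula( %n )
{
	return %n * ( %n + 1 ) / 2;
}

echo( sumLoop( 100 ) SPC sumFormula( 100 ) );
```

The closed form does the same small amount of work no matter how large n is, while the cost of the loop grows with n.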

What techniques do you guys use to optimize your code?

#1
07/02/2010 (7:17 am)
I learned a good one in Torque Script recently.

From William Lee Simms:
Quote:
Copy this line into the console:

$a = 0; $b = 5; $start = getRealTime(); for( %i = 0; %i < 1000000; %i++ ) { if( $b $= "5" ) { $a++; } } echo( getRealTime() - $start );

The "$a" and "$b" are just busy-work. "$start" gets us the time we started the test. Then we loop a million times, doing a string compare. The final "echo" tells us the number of milliseconds of the test. For me, this was 453 milliseconds.

Now copy this into the console:

$a = 0; $b = 5; $start = getRealTime(); for( %i = 0; %i < 1000000; %i++ ) { if( $b == 5 ) { $a++; } } echo( getRealTime() - $start );

This is doing a numeric compare. For me, this took 297 milliseconds, about 34% less running time. Plus, the longer the string, the longer the string compare takes:

$a = 0; $b = 123456789; $start = getRealTime(); for( %i = 0; %i < 1000000; %i++ ) { if( $b $= "123456789" ) { $a++; } } echo( getRealTime() - $start );

This took 688 milliseconds, whereas changing the "$=" to "==" brings it back down to 297!

The short story (if that's possible at this point), is that numeric compares are a lot faster than string compares.
#2
07/02/2010 (1:18 pm)
Often the problem people are trying to solve already has a well-defined algorithm. If you can recognize this, you can save a lot of coding and a lot of running time by implementing an efficient, known algorithm.

For example, recently Patrick was struggling with the speed of a flood fill. Once I explained that he could use a well-known algorithm (in this case, depth-first using a stack), it became easier to implement and much faster.

The biggest problem, of course, is knowing that an algorithm already exists for such a problem. Wikipedia's list of algorithms is a good place to start, but it's still hard.
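
Here is a minimal sketch of a stack-based, depth-first flood fill of the kind mentioned above. It is not Patrick's actual code. It assumes a hypothetical grid stored in the global array $Grid[x, y] with numeric cell values and its size in $GridW and $GridH; those names and the function floodFill are illustrative only.

```torquescript
// Assumed layout: $Grid[x, y] holds a numeric value, $GridW / $GridH hold the grid size.
function floodFill( %startX, %startY, %newValue )
{
	%oldValue = $Grid[%startX, %startY];
	if( %oldValue == %newValue )
		return 0;

	// Explicit stack of "x y" pairs instead of recursion.
	%top = 0;
	%stack[%top] = %startX SPC %startY;
	%top++;
	%filled = 0;

	while( %top > 0 )
	{
		// Pop the most recently pushed cell (depth-first order).
		%top--;
		%x = getWord( %stack[%top], 0 );
		%y = getWord( %stack[%top], 1 );

		// Skip cells outside the grid or not part of the region.
		if( %x < 0 || %y < 0 || %x >= $GridW || %y >= $GridH )
			continue;
		if( $Grid[%x, %y] != %oldValue )
			continue;

		$Grid[%x, %y] = %newValue;
		%filled++;

		// Push the four neighbours.
		%stack[%top] = ( %x + 1 ) SPC %y;
		%top++;
		%stack[%top] = ( %x - 1 ) SPC %y;
		%top++;
		%stack[%top] = %x SPC ( %y + 1 );
		%top++;
		%stack[%top] = %x SPC ( %y - 1 );
		%top++;
	}

	return %filled;
}
```

The function returns the number of cells it changed. Each cell is written at most once, because a cell that already holds the new value no longer matches the old one and is skipped when popped.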
#3
07/02/2010 (1:25 pm)
If you're manipulating a list of objects, it'll run a lot faster if you process the objects all at once instead of one at a time.

For example, in my current game, there is a flock of bats. Instead of updating each bat individually in its own "onUpdate", I process the entire list. To start, I set them all moving forward. Then I process the lead bat to move to a specific location. Then I process all of the other bats to angle towards the lead bat.

Another example seen in the forums is a gravity simulation. You'll find it runs a lot faster and with more stability if you process them all in a single loop rather than processing each individually.

You can either put all of the objects in a SimSet and put the SimSet on a schedule, or you can create a ScriptObject and use that as the "processing object".
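
The SimSet-on-a-schedule approach could look roughly like this. BatSet, updateBats, treating the first object as the lead bat, and the 32 ms interval are all assumptions for illustration, and the per-bat work is left as comments since it depends on your game.

```torquescript
// Hypothetical set holding every bat; add bats with BatSet.add( %bat ).
new SimSet( BatSet );

function updateBats()
{
	%count = BatSet.getCount();
	if( %count > 0 )
	{
		// Treat the first object as the lead bat (an assumption for this sketch).
		%leader = BatSet.getObject( 0 );
		// ...move %leader towards its target here...

		// Handle every follower in the same pass.
		for( %i = 1; %i < %count; %i++ )
		{
			%bat = BatSet.getObject( %i );
			// ...angle %bat towards %leader here...
		}
	}

	// Re-arm the batch update; 32 ms is an arbitrary interval.
	$batUpdateEvent = schedule( 32, 0, "updateBats" );
}
```

Calling updateBats() once starts the cycle, and cancel( $batUpdateEvent ) stops it.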
#4
07/02/2010 (6:02 pm)
This is pretty minor, but I've noticed that TorqueScript also multiplies faster than it divides. If you take William's example that I posted earlier and change it to:
$a = 0; $b = 5; $start = getRealTime(); for( %i = 0; %i < 1000000; %i++ ) { if( $b / 2 == 5 ) { $a++; } } echo( getRealTime() - $start );
I get 204 milliseconds, but with:
$a = 0; $b = 5; $start = getRealTime(); for( %i = 0; %i < 1000000; %i++ ) { if( $b * 0.5 == 5 ) { $a++; } } echo( getRealTime() - $start );
I get 170 milliseconds.

Minor, but that's still about 17% less running time.
#5
07/02/2010 (8:36 pm)
The string comparison's a shock because I have used stricmp a lot (almost everywhere)...:)

In my current project, I needed to scan the tilemap to check if the tile was occupied. Instead of using getTileCustomData for every tile, I first populated an array with all the tile data at the start and used it for the comparisons instead. Smoother, better.
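
Roughly, the caching idea looks like the sketch below. The function names, the $TileCache array, and passing the layer size in as parameters are assumptions for this example. The "x y" argument form for getTileCustomData and treating empty custom data as "not occupied" are also assumptions, so check them against your own project.

```torquescript
// Read every tile's custom data once and keep it in a global array.
function cacheTileData( %layer, %countX, %countY )
{
	for( %x = 0; %x < %countX; %x++ )
	{
		for( %y = 0; %y < %countY; %y++ )
		{
			// Assumed call form: tile position passed as "x y".
			$TileCache[%x, %y] = %layer.getTileCustomData( %x SPC %y );
		}
	}
}

// Later checks read the array instead of calling into the tile layer.
function isTileOccupied( %x, %y )
{
	// Assumption: a tile with any custom data counts as occupied.
	return ( $TileCache[%x, %y] !$= "" );
}
```

The cache has to be refreshed whenever the custom data on a tile changes, otherwise the checks will read stale values.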
#6
07/09/2010 (9:13 am)
Hi guys,
Aside from the good tip about using numeric instead of string comparisons, I think many generic script optimizations are very theoretical: you should always check which optimization best fits your context.

Anyway, this is my 2-cent tip: avoid inner loops. An inner loop costs extra setup time on every pass of the outer loop, so even if you heavily optimize the business logic inside it, it can still turn out to be a lot of wasted work for your CPU.

This is an example:

function singleLoop( %outer )
{
	%b = 1;
	%start = getRealTime();
	for( %y = 0; %y < %outer; %y++ )
	{
		//business logic here
		%condition = (%b == 1);
	}
	return ( getRealTime() - %start );
}

function loop( %outer, %inner )
{
	%b = 1;
	%start = getRealTime();
	for( %y = 0; %y < %outer; %y++ )
	{
		for( %i = 0; %i < %inner; %i++ ) 
		{
			//business logic here
			%condition = (%b == 1);
		}
	}
	return ( getRealTime() - %start );
}

	%mid = ( loop(1000000, 1) + loop(1000000, 1) + loop(1000000, 1) + loop(1000000, 1) + loop(1000000, 1) ) / 5;
	echo("loop(1000000, 1) : " @ %mid SPC "ms");
	
	%mid = ( loop(100000, 10) + loop(100000, 10) + loop(100000, 10) + loop(100000, 10) + loop(100000, 10) ) / 5;
	echo("loop(100000, 10) : " @ %mid SPC "ms");
	
	%mid = ( loop(1000, 1000) + loop(1000, 1000) + loop(1000, 1000) + loop(1000, 1000) + loop(1000, 1000) ) / 5;
	echo("loop(1000, 1000) : " @ %mid SPC "ms");
	
	%mid = ( loop(100, 10000) + loop(100, 10000) + loop(100, 10000) + loop(100, 10000) + loop(100, 10000) ) / 5;
	echo("loop(100, 10000) : " @ %mid SPC "ms");
	
	%mid = ( loop(10, 100000) + loop(10, 100000) + loop(10, 100000) + loop(10, 100000) + loop(10, 100000) ) / 5;
	echo("loop(10, 100000) : " @ %mid SPC "ms");
	
	%mid = ( loop(1, 1000000) + loop(1, 1000000) + loop(1, 1000000) + loop(1, 1000000) + loop(1, 1000000) ) / 5;
	echo("loop(1, 1000000) : " @ %mid SPC "ms");
	
	%mid = ( singleLoop(1000000) + singleLoop(1000000) + singleLoop(1000000) + singleLoop(1000000) + singleLoop(1000000) ) / 5;
	echo("singleLoop(1000000) : " @ %mid SPC "ms");

Each call to the loop function runs essentially one million comparisons (%b == 1), exactly as the singleLoop function does, but check the performance difference as the number of outer for iterations grows:

loop(1000000, 1) : 629.4 ms
loop(100000, 10) : 323.8 ms
loop(1000, 1000) : 293 ms
loop(100, 10000) : 293.6 ms
loop(10, 100000) : 286.8 ms
loop(1, 1000000) : 296.6 ms
singleLoop(1000000) : 290.2 ms

Please also note that you can't say the singleLoop function is always faster than the other calls, even though it has less wasted 'for' overhead than any of them.