Game Development Community

Why use U32 when an S8 will do?

by Steven Peterson · in Torque Game Engine · 05/26/2008 (12:20 pm) · 14 replies

In Torque code it seems the de-facto default variable types are U32 or S32 and F32, even when much smaller variables will work just fine.

- Is there any reason not to use the smallest reasonable variable type?
- Is it not better to use 'unsigned' for a var that will always be >= 0?
- Would doing this yield a performance gain over making everything S32 or U32?


Thanks,
Steven

[edit]Meant to type "Why use S32 when a U8 will do?" in the subject, oops[/edit]

#1
05/26/2008 (4:06 pm)
Quote:
- Is there any reason not to use the smallest reasonable variable type?

Yeah. 32-bit is the native size of most computers today, and if you use an 8-bit integer the computer uses explicit mask operations to mask out the remaining 24 bits. So your 8-bit integers will take up as much room in a register as your 32-bit ones, and will probably be a little bit slower.

Unless the compiler can optimize this, it's a bad choice.
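
A minimal sketch of what this looks like at the language level, assuming U8 and U32 map to the fixed-width types below (these typedefs are stand-ins, not Torque's headers). C++ promotes small integers to int before doing arithmetic, so the narrowing back to 8 bits is an extra step. Whether that step costs anything measurable depends on the compiler and the target.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-ins for Torque's typedefs (assumed to be fixed-width types).
typedef std::uint8_t  U8;
typedef std::uint32_t U32;

// Arithmetic on two U8 values is carried out in int, not in 8 bits.
static_assert(std::is_same<decltype(U8(1) + U8(1)), int>::value,
              "U8 + U8 is computed as int");

// Storing the sum back into a U8 truncates it to the low 8 bits.
inline U8 addU8(U8 a, U8 b) { return static_cast<U8>(a + b); }

// The same addition in U32 needs no narrowing step.
inline U32 addU32(U32 a, U32 b) { return a + b; }

inline void promotionExample()
{
    assert(addU8(200, 100) == 44);    // 300 truncated to 8 bits
    assert(addU32(200, 100) == 300);  // fits without truncation
}
```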

Quote:
- Is it not better to use 'unsigned' for a var that will always be >= 0?

Correct.

Quote:
- Would doing this yield a performance gain over making everything S32 or U32?

As far as I know, no.
#2
05/26/2008 (4:20 pm)
Note, however, that values sent over the networking system are packed as tightly as you care to do so. For example, if you are sending something with a range of 0..30 you can specify that range so it only takes up 5 bits. Floats can also be scaled.
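
A rough sketch of the idea, not Torque's actual API: the helper names below are hypothetical. A value in a known range can be sent as its offset from the low end, and a float in [0, 1] can be scaled to a small unsigned code. The range 0..30 has 31 distinct values, which fits in 5 bits because 2^5 = 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.

// A value known to lie in [lo, hi] can be sent as its offset from lo.
inline std::uint32_t encodeRanged(std::uint32_t value, std::uint32_t lo) { return value - lo; }
inline std::uint32_t decodeRanged(std::uint32_t code, std::uint32_t lo) { return code + lo; }

// Scale a float in [0, 1] to an unsigned code of `bits` bits (1 <= bits <= 31).
inline std::uint32_t quantizeUnitFloat(float f, unsigned bits)
{
    const std::uint32_t maxCode = (1u << bits) - 1u;
    return static_cast<std::uint32_t>(f * maxCode + 0.5f);  // round to nearest
}

// Recover an approximation of the original float from its code.
inline float dequantizeUnitFloat(std::uint32_t code, unsigned bits)
{
    const std::uint32_t maxCode = (1u << bits) - 1u;
    return static_cast<float>(code) / static_cast<float>(maxCode);
}

inline void packingExample()
{
    // A value in 100..130 becomes an offset in 0..30, which fits in 5 bits.
    assert(encodeRanged(130, 100) == 30);
    assert(encodeRanged(130, 100) < (1u << 5));
    assert(decodeRanged(30, 100) == 130);

    // 8 bits give 256 steps between 0.0 and 1.0.
    assert(quantizeUnitFloat(1.0f, 8) == 255);
    assert(quantizeUnitFloat(0.0f, 8) == 0);
}
```

The float comes back slightly changed (0.5 round-trips to 128/255 with 8 bits), so the bit count is a trade-off between bandwidth and precision.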
#3
05/26/2008 (4:22 pm)
So basically the package will still travel, but more slowly, because it thinks it's an 8.
#4
05/26/2008 (4:28 pm)
Quote:
So basically the package will still travel, but more slowly, because it thinks it's an 8.

Huh? More slowly?

If you send 8 bits you send 8 bits. It doesn't matter if it's a U32 or a U8 that holds the value, the networking interface doesn't care about that. You feed it the bitrange, like Matthew described above.
#5
05/26/2008 (4:31 pm)
I meant to say that it will travel more slowly because it is in smaller U8 packages instead of 1 U32 package...
#6
05/26/2008 (5:04 pm)
That's not correct.

You're not sending data as types. You're sending the bits. The networking engine has, as I said before, no idea what the type is that it's transferring.

Look at this example:
U8 int01 = 1;
U32 int02 = 1;

bstream->writeInt (int01, 6);
bstream->writeInt (int02, 6);

Both of the bstream->writeInt () calls above will send the exact same amount of data out on the wire. The only difference is that when your processor makes this call, it has to widen the U8 to a U32, which takes more time, although it's nothing you'll notice by eye or even in your profiler.
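
To make that concrete, here is a small stand-in bit writer. It is a hypothetical class written for this illustration, not Torque's BitStream. The bit count passed to writeInt decides how much is written; the type of the variable holding the value does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bit stream; not Torque's BitStream class.
class SketchBitWriter
{
public:
    // Append the low `bitCount` bits of `value` (bitCount <= 32), lowest bit first.
    void writeInt(std::uint32_t value, unsigned bitCount)
    {
        for (unsigned i = 0; i < bitCount; ++i)
            mBits.push_back(((value >> i) & 1u) != 0);
    }

    std::size_t bitsWritten() const { return mBits.size(); }

private:
    std::vector<bool> mBits;
};

// Returns how many bits one 6-bit write of `value` adds to an empty stream.
template <typename T>
std::size_t bitsUsedFor(T value)
{
    SketchBitWriter w;
    w.writeInt(value, 6);  // a narrower T is widened to 32 bits at the call
    return w.bitsWritten();
}

inline void sameBitsExample()
{
    std::uint8_t  int01 = 1;
    std::uint32_t int02 = 1;
    assert(bitsUsedFor(int01) == 6);
    assert(bitsUsedFor(int02) == 6);
}
```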
#7
05/27/2008 (10:19 am)
Just to reinforce what Stefan is saying, using the fewest bits possible on the networking side of Torque makes things faster (as well as more bandwidth-efficient), not slower.
#8
05/27/2008 (11:54 am)
So wait a second... U8 makes it faster?
#9
05/27/2008 (12:06 pm)
No. Representation and storage of data in memory is independent of representation and storage of data across the network.
#10
05/28/2008 (12:16 pm)
Quote:
- Is there any reason not to use the smallest reasonable variable type?
- Is it not better to use 'unsigned' for a var that will always be >= 0?
- Would doing this yield a performance gain over making everything S32 or U32?

This is somewhat platform-dependent, but using "int" or "unsigned int" when you don't care about size is almost always best.

You should use "unsigned int" when appropriate.

And, no, you're not likely to see a performance gain from using a type that differs from the native size of int just to save "bits" in those cases where you have a bounded range. As was mentioned, it might actually be slower.

Also, there are very rarely opportunities for substantial performance gains changing things like this.

You'll get much bigger gains if you change things so a) you're doing no more work than necessary, b) you're not doing work unless you absolutely have to (lazy evaluation), c) you don't redo work you've already done (caching), and/or d) you change algorithms.

There are some cases where hand-tuning code to match the hardware (either in assembly, or sometimes even in C), gives you some performance improvement, but these are usually in the much-less-than-100% range, vs. the many-times performance improvements (>> 100%) you can get by rethinking the over-arching approach and doing it differently.

Also, you might inadvertently introduce the possibility of bugs later on by limiting something to 8 bits or 16 bits and then having it wrap. This is less common nowadays, but in the days of 16-bit Windows (where 'int' was 16 bits) it was a very common source of bugs.
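
A small sketch of the kind of wrap bug meant here (the function name is made up for the example). With an 8-bit counter the loop condition can never become false once the limit exceeds 255, because the counter wraps back to 0 first. The iteration cap exists only so the sketch terminates.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many iterations run when the loop counter has type Counter.
// `cap` stops the loop if the counter never reaches `limit`.
template <typename Counter>
unsigned countTo(unsigned limit, unsigned cap)
{
    unsigned iterations = 0;
    for (Counter i = 0; i < limit && iterations < cap; ++i)
        ++iterations;
    return iterations;
}

inline void wrapExample()
{
    // A 32-bit counter reaches 300 and stops.
    assert(countTo<std::uint32_t>(300, 1000) == 300);

    // An 8-bit counter wraps from 255 to 0 and only the cap ends the loop.
    assert(countTo<std::uint8_t>(300, 1000) == 1000);
}
```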
#11
05/29/2008 (9:51 am)
Quote:Get into a rut early: Do the same process the same way. Accumulate idioms. Standardize. The only difference between Shakespeare and you was the size of his idiom list - not the size of his vocabulary. - Alan Perlis
Thanks for the responses. That makes sense about using the native-sized int, glad I asked! I'm sure the difference is minute, but there's no reason not to be doing it the right way. :-)

The network discussion kinda threw me for a loop though. If I packUpdate() a U32 whose value happens to be 28, then the Torque network code will only send the 5 bits needed to represent it, not the full 32, right?

Quote:But I also knew, and forgot, Hoare's dictum that premature optimization is the root of all evil in programming. -- Donald Knuth, (The Errors of TeX)
Tim -
Good point about algorithm analysis. I actually need to find a profiler and learn how to use it. In my rendering algorithms, there are updates that I could easily do "every other frame" or "once a second" instead of every cycle. I don't know if it will make a difference though.
#12
05/29/2008 (10:32 am)
Quote:
If I packUpdate() a U32 whose value happens to be 28, then the Torque network code will only send the 5 bits needed to represent it, not the full 32, right?

You don't packUpdate () an int; it's just a function where you gather all your writes into the stream.
The receiving end can't know how many bits you sent, so you have to keep the same order and the same bit counts on the sender *and* the receiver.

So packUpdate () and unpackUpdate () have to stay in sync.

I.e., this is correct:
writeInt (integer, 16); // Write 16 bits.
integer = readInt (16); // Read 16 bits.

And this will result in an error:
writeInt (integer, 8); // Write 8 bits.
integer = readInt (12); // Read 12 bits, which is more than what we packed. Bad.

writeInt () takes two arguments, as you can see above.
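
Here is a self-contained sketch of why the bit counts have to match. The stream class is a hypothetical stand-in, not Torque's BitStream, and it does no error checking at all. In this sketch a mismatched read is not reported; it silently swallows bits that belong to the next field and returns a wrong value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bit stream; not Torque's BitStream class.
class SketchBitStream
{
public:
    SketchBitStream() : mReadPos(0) {}

    // Append the low `bitCount` bits of `value`, lowest bit first.
    void writeInt(std::uint32_t value, unsigned bitCount)
    {
        for (unsigned i = 0; i < bitCount; ++i)
            mBits.push_back(((value >> i) & 1u) != 0);
    }

    // Read `bitCount` bits back; bits past the end of the stream read as 0.
    std::uint32_t readInt(unsigned bitCount)
    {
        std::uint32_t value = 0;
        for (unsigned i = 0; i < bitCount; ++i, ++mReadPos)
            if (mReadPos < mBits.size() && mBits[mReadPos])
                value |= (1u << i);
        return value;
    }

private:
    std::vector<bool> mBits;
    std::size_t mReadPos;
};

// Writes two fields (8 bits, then 4 bits) and reads the first one back
// using `firstReadBits` bits. Returns the value the reader ends up with.
inline std::uint32_t readFirstField(unsigned firstReadBits)
{
    SketchBitStream s;
    s.writeInt(200, 8);
    s.writeInt(5, 4);
    return s.readInt(firstReadBits);
}

inline void syncExample()
{
    assert(readFirstField(8) == 200);    // matching bit count: correct value
    assert(readFirstField(12) == 1480);  // 4 bits of the next field leak in
}
```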
#13
05/29/2008 (10:41 am)
To elaborate a bit on what Stefan just said,
for a U32, you should only write as many bits as you know are sufficient to express the maximum value of your U32.
ie, the number of bits should be the ceiling of the log-base-2 of (the maximum value + 1); a maximum of 8, for example, needs 4 bits, not 3.
for signed integers it's a bit different.
there's similar support for floats and unit-length vectors as well.
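
A sketch of that bit count, with hypothetical helper names. Counting how many right shifts it takes to reach zero gives floor(log2(max)) + 1, which is the same as the ceiling of log2(max + 1). The signed variant assumes a sign bit plus magnitude, which is only one possible encoding and not necessarily what Torque does.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed to hold any unsigned value in [0, maxValue].
inline unsigned bitsForMax(std::uint32_t maxValue)
{
    unsigned bits = 0;
    while (maxValue != 0) { ++bits; maxValue >>= 1; }
    return bits;
}

// Bits needed for a value in [-maxMagnitude, +maxMagnitude], assuming it is
// stored as one sign bit followed by the magnitude.
inline unsigned bitsForSignedMagnitude(std::uint32_t maxMagnitude)
{
    return bitsForMax(maxMagnitude) + 1;
}

inline void bitCountExample()
{
    assert(bitsForMax(28) == 5);   // the value from the question above
    assert(bitsForMax(31) == 5);   // largest value that still fits in 5 bits
    assert(bitsForMax(32) == 6);   // one more value needs a sixth bit
    assert(bitsForMax(8) == 4);    // exact powers of two need the extra bit
    assert(bitsForSignedMagnitude(30) == 6);
}
```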
#14
05/29/2008 (12:29 pm)
Ahh, I see.

For that code I just copied Melvyn May's original fxRenderObject example, and it looks a little different from what's suggested here and elsewhere. I'll add that to my list of issues to revisit. I think his older way is a bit more generic - and less efficient.