Powers of 2 Tutorial? Rant?
by David Stewart Zink · in Torque Game Engine · 02/11/2002 (9:12 am) · 10 replies
First, the code--
From platform/types.h:
From core/nTypes.cc:
Typical usage, core/bitStream.h:
So compare the code generated for isPow2(in_num) above with the canonical code, visible in millions of texts on the topic:
For the other functions there can be more debate as to appropriate code, but try this:
From platform/types.h:
inline bool isPow2(const U32 in_num)
{
return (in_num == getNextPow2(in_num));
}Assuming "isPow2" returns true if in_num is a power of two, then getNextPow2 must return the smallest power of two at least as large as its input.From core/nTypes.cc:
U32 getNextPow2(U32 io_num)
{
S32 oneCount = 0;
S32 shiftCount = -1;
while (io_num) {
if(io_num & 1)
oneCount++;
shiftCount++;
io_num >>= 1;
}
if(oneCount > 1)
shiftCount++;
return U32(1 << shiftCount);
}
//--------------------------------------
U32 getBinLog2(U32 io_num)
{
AssertFatal(io_num != 0 && isPow2(io_num) == true,
"Error, this only works on powers of 2 > 0");
S32 shiftCount = 0;
while (io_num) {
shiftCount++;
io_num >>= 1;
}
return U32(shiftCount - 1);
}These two functions should be inline. After being rewritten, of course.Typical usage, core/bitStream.h:
inline U32 BitStream::readRangedU32(U32 rangeStart, U32 rangeEnd)
{
AssertFatal(rangeEnd >= rangeStart, "error, end of range less than start");
U32 rangeSize = rangeEnd - rangeStart + 1;
U32 rangeBits = getBinLog2(getNextPow2(rangeSize));
U32 val = U32(readInt(S32(rangeBits)));
return val + rangeStart;
}Note that the usage here "getBinLog2(getNextPow2(X))" is dreadfully inefficient; what is actually needed is "getNextBinLog2(X)" or somesuch, which would just return shift-count from getNextPow2.So compare the code generated for isPow2(in_num) above with the canonical code, visible in millions of texts on the topic:
inline bool isPow2(U32 in_num)
{
return (in_num & (in_num - 1)) == 0;
}Much simpler, eh? Of course, it does return "true" for isPow2(0), but then the old version returns "0 == (1 << -1)" which is the same thing on many architectures.For the other functions there can be more debate as to appropriate code, but try this:
inline U32 getBinLog2(U32 io_num)
{
register U32 t_io_num = io_num;
register U32 shiftCount = 0;
while (t_io_num) { // There is an asm for this, if needed.
shiftCount++;
t_io_num >>= 1;
}
return isPow2(io_num) ? shiftCount - 1 : shiftCount;
}
inline U32 getNextPow2(U32 io_num)
{
return isPow2(io_num) ? io_num : (U32(1) << getBinLog2(io_num));
}
inline U32 BitStream::readRangedU32(U32 rangeStart, U32 rangeEnd)
{
AssertFatal(rangeEnd >= rangeStart, "error, end of range less than start");
U32 rangeSize = rangeEnd - rangeStart + 1;
U32 rangeBits = getBinLog2(rangeSize);
U32 val = U32(readInt(S32(rangeBits)));
return val + rangeStart;
}and presto the networking code becomes that little bit faster. I would guess that the getBinLog2(getNextLog2(X)) is called tens of thousands of times a second when in full networking mode. So this is probably a win.
#2
d
02/11/2002 (9:43 pm)
Make sure you check witk tim/rick and maybe get that checked into the main code. They'd know how useful it is, and whether it'd cause any issues.d
#3
02/20/2003 (12:27 pm)
hey! it had a birthday! Not in head, though.
#4
02/21/2003 (12:02 am)
hrm, i'm going to try this when I get to that stage in my game and see what difference it makes. It sure looks like it makes sense. I wonder if Tim/Rick saw this?
#5
I'm definitely going to merge these changes into TNL - and then later into Torque
12/17/2003 (4:33 pm)
Ok, I came up with an even faster getBinLog2!I'm definitely going to merge these changes into TNL - and then later into Torque
U32 getBinLog2(U32 number)
{
U32 result = 0;
if(number & 0xFFFF0000)
{
number >>= 16;
result += 16;
}
if(number & 0xFF00)
{
number >>= 8;
result += 8;
}
if(number & 0xF0)
{
number >>= 4;
result += 4;
}
if(number & 0xC)
{
number >>= 2;
result += 2;
}
if(number & 2)
{
number >>= 1;
result += 1;
}
return result;
}
#6
If you do getBinLog2() that way, you should code getNextPow2 to use it. Perhaps:
12/17/2003 (5:37 pm)
I've only had 2 hours sleep (RotK), so I may be silly, but you can probably drop the last "number >>= 1;".If you do getBinLog2() that way, you should code getNextPow2 to use it. Perhaps:
U32 getNextPow2(U32 number)
{
U32 result = ((U32)1) << getBinLog2(number);
return (result == number) ? result : (result << 1);
}
#7
here's how I ended up doing the implementation of the functions:
12/17/2003 (5:54 pm)
Good call!here's how I ended up doing the implementation of the functions:
U32 getBinLog2(U32 number)
{
U32 result = 0;
if(number & 0xFFFF0000)
{
number >>= 16;
result += 16;
}
if(number & 0xFF00)
{
number >>= 8;
result += 8;
}
if(number & 0xF0)
{
number >>= 4;
result += 4;
}
if(number & 0xC)
{
number >>= 2;
result += 2;
}
if(number & 2)
result ++;
return result;
}
U32 getNextBinLog2(U32 number)
{
return getBinLog2(number) + (isPow2(number) ? 0 : 1);
}
U32 getNextPow2(U32 number)
{
return isPow2(number) ? number : (1 << (getBinLog2(number) + 1));
}
#8
12/18/2003 (6:41 am)
Why write the function "getNextBinLog2"? I dont see it being used anywhere.
#9
this is from someone else in the thread listed above.
supposedly only takes 6 cycles!
12/18/2003 (7:53 am)
This is a REAL Bunch of geeks!this is from someone else in the thread listed above.
supposedly only takes 6 cycles!
inline uint32 getNextPowerOfTwo(uint32 value)
#pragma warning (disable:4035)
// don't whine about lacking return value (as it's returned in eax)
{
__asm
{
mov ecx,-1
mov eax,value
dec eax
bsr ecx,eax
mov eax,1
inc ecx
shl eax,cl
}
}
#10
nice to have that Bit Scan Reverse instruction (BSR) - not a very platform independent way to go though...
maybe do the float to int, normalize exponent trick (that was a good flipCode thread!):
12/18/2003 (10:27 am)
Hmm...nice to have that Bit Scan Reverse instruction (BSR) - not a very platform independent way to go though...
maybe do the float to int, normalize exponent trick (that was a good flipCode thread!):
inline U32 getBinLog2(U32 value)
{
F32 floatValue = F32(value);
return (*((U32 *) &floatValue) >> 23) - 127);
}
inline U32 getNextPow2(U32 value)
{
return isPow2(value) ? value : (1 << (getBinLog2(value) + 1));
}
Torque 3D Owner Pat Wilson
You're really diving through the bowles of all this stuff. I'm impressed.