Floating point to integer conversion
by Brad Schick · 05/19/2001 (9:32 am) · 2 comments
/*******************************************************************************
During performance testing I noticed that almost 8% of my game's time was spent
in the _ftol function converting floating point numbers to integers. While my
game does an extreme amount of such conversions, this may affect other games
to a lesser degree. The inline ASM functions below are designed to speed up
various types of floating point to integral conversions.
When a floating point number is cast to an integer, ANSI C requires that the
value be truncated toward zero. In MS VC this is done by the _ftol library function.
In order to truncate, the _ftol function is forced to change the Intel FPU
from the default rounding mode of "round to nearest" (RC field of 00B) to
the "round toward zero" mode (RC field of 11B) then back again. This FPU
mode switching is very slow.
The helper functions below use rounding conversions for specific purposes
instead of truncating conversions. This is much faster since the FPU's mode
does not need to be changed. Because these functions are inlined, they also
avoid the overhead of calling the _ftol library function.
IMPORTANT NOTES:
1) These functions do not have the same behavior as standard arithmetic
rounding. 'fist' uses the default FPU rounding mode, which rounds to the
nearest integer and breaks exact ties toward the even integer. So 0.5 rounds
to 0 while 1.5 rounds to 2. This may not always be desirable! The FPU could
be set to "round up" mode and back, but that would kill performance. If you
need halves rounded up for non-negative values, just add 0.5 and cast instead.
2) Even if "round to the nearest" is OK, rounding in general is not
always an acceptable float to integer conversion. Know your code before
using these functions.
3) This code has been compiled and tested using VC6. The ASSERT calls
are placeholders and will not compile by default.
*******************************************************************************/
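Note 1 above can be checked in portable C++. The following is a minimal sketch and not part of the original code: it assumes a C++11 compiler and the default round-to-nearest FPU mode, and the names RoundTiesToEven and RoundHalfUp are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Round to nearest with ties going to the even integer. std::nearbyint
// honors the current rounding mode, assumed here to be the default
// round-to-nearest mode that 'fist' also uses.
inline int RoundTiesToEven( double v )
{
    return static_cast<int>( std::nearbyint( v ) );
}

// "Add 0.5 and cast": the cast truncates, so halves round up.
// Only meaningful for non-negative input, hence the assert.
inline int RoundHalfUp( double v )
{
    assert( v >= 0.0 );
    return static_cast<int>( v + 0.5 );
}
```

The two functions agree everywhere except on exact halves such as 0.5 and 2.5, which is the case note 1 warns about.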
/*******************************************************************************
Rounding from a float to the nearest integer can be done several ways.
Calling the ANSI C floor() routine then casting to an int is very slow.
Manually adding 0.5 then casting to an int is also somewhat slow because
truncation of the float is slow on Intel FPUs. The fastest choice is to
use the FPU 'fistp' instruction which does the round and conversion
in one instruction (not sure how many clocks). This function is almost
10x faster than adding and casting.
Caller is expected to range check 'v' before attempting to round.
Valid range is INT_MIN to INT_MAX inclusive.
*******************************************************************************/
__forceinline int Round( double v )
{
ASSERT( v >= INT_MIN && v <= INT_MAX );
int result;
__asm
{
fld v ; Push 'v' into st(0) of FPU stack
fistp result ; Convert and store st(0) to integer and pop
}
return result;
}
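On compilers newer than VC6 the same rounding conversion may be available without inline ASM. A minimal sketch, assuming the C++11 function std::lrint (which rounds using the current rounding mode) and a hypothetical name RoundPortable. Whether it matches the speed of 'fistp' depends on the compiler and was not measured here.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Portable counterpart to Round(): rounds with the current rounding mode,
// which by default is round-to-nearest with ties to even.
inline int RoundPortable( double v )
{
    // Same precondition as the ASSERT in Round()
    assert( v >= INT_MIN && v <= INT_MAX );
    return static_cast<int>( std::lrint( v ) );
}
```

For example, RoundPortable( 2.5 ) and RoundPortable( 1.5 ) both give 2 in the default mode, the same ties-to-even results described for 'fistp'.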
/*******************************************************************************
Same behavior as Round, except that PRound returns an unsigned value
(and checks 'v' for being non-negative in debug mode). This method can
be used for better type safety if 'v' is known to be non-negative.
Caller is expected to range check 'v' before attempting to round.
Valid range is 0 to INT_MAX inclusive: 'fistp' with a 32-bit operand stores
a signed integer, so larger values do not convert correctly.
*******************************************************************************/
__forceinline unsigned PRound( double v )
{
ASSERT( v >= 0 && v <= INT_MAX );
unsigned result;
__asm
{
fld v ; Push 'v' into st(0) of FPU stack
fistp result ; Convert and store st(0) to integer and pop
}
return result;
}
// Ignore warning about not returning a value
#pragma warning( disable: 4035 )
/*******************************************************************************
To check if a double is actually an integral value you could
cast the double to an int (which uses the slow _ftol function),
then subtract the int from the original double and test for zero. This
is fairly slow. The code below produces the same result but is faster
because a rounding float to int conversion is used. I'm actually not
sure if the code below is optimal but I profiled several variations and
this was the fastest...
Returns true if 'v' is a valid integer in the range of INT_MIN to INT_MAX
inclusive and fills 'i' with the integer.
Returns false if 'v' is not an integer or is out of range.
There is no need to range check 'v' before calling IsInteger.
*******************************************************************************/
__forceinline bool IsInteger( double v, int *i )
{
if( v < (double)INT_MIN || v > (double)INT_MAX )
return false;
// Using a local int to store conversions then reloading
// it is faster than doing multiple conversions.
int local;
__asm
{
fld v ; Push 'v' into st(0) of FPU stack
fist local ; Convert and store st(0) to integer
fild local ; Push integer to st(0)
fcompp ; Compare st(0) and st(1) then pop twice
fnstsw ax ; Moves FPU code flags to AX (AH) register
test ah,40h ; Test C3 (bit 40h of AH), set when st(0) == st(1)
je SetF ; Jump to SetF if C3 is clear (values differ)
mov edx,i ; Move local to *i
mov eax,local
mov [edx],eax
mov al,1 ; Set return value to true
jmp Bye ; Jump to exit
SetF:
xor al,al ; Set return value to false
Bye:
}
}
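The round-then-compare idea in IsInteger can also be written without inline ASM. A minimal sketch, assuming C++11's std::lrint and a hypothetical name IsIntegerPortable. It is meant to illustrate the logic, not to claim the same performance as the ASM version.

```cpp
#include <climits>
#include <cmath>

// Returns true and fills '*i' when 'v' holds an integral value that
// fits in an int; returns false otherwise and leaves '*i' untouched.
inline bool IsIntegerPortable( double v, int *i )
{
    // Written in negated form so that NaN also fails the range check
    if( !( v >= (double)INT_MIN && v <= (double)INT_MAX ) )
        return false;

    // Rounding conversion, then compare the result with the original
    const long rounded = std::lrint( v );
    if( (double)rounded != v )
        return false;

    *i = (int)rounded;
    return true;
}
```

As with the original, a caller passes any double and only reads '*i' when the function returns true.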
/*******************************************************************************
Identical to IsInteger but checks 'v' for a more restrictive range.
Unlike PRound this method can be called with a negative value.
Returns true if 'v' is an integral value in the range of 0 to USHRT_MAX
inclusive and fills 'i' with the integer.
Returns false if 'v' is not an index.
There is no need to range check 'v' before calling IsIndex.
*******************************************************************************/
__forceinline bool IsIndex( double v, int *i )
{
// Change the max value to suit your needs.
if( v < 0.0 || v > (double)USHRT_MAX )
return false;
// Using a local int to store conversions then reloading
// it is faster than doing multiple conversions.
int local;
__asm
{
fld v ; Push 'v' into st(0) of FPU stack
fist local ; Convert and store st(0) to integer
fild local ; Push integer to st(0)
fcompp ; Compare st(0) and st(1) then pop twice
fnstsw ax ; Moves FPU code flags to AX (AH) register
test ah,40h ; Test C3 (bit 40h of AH), set when st(0) == st(1)
je SetF ; Jump to SetF if C3 is clear (values differ)
mov edx,i ; Move local to *i
mov eax,local
mov [edx],eax
mov al,1 ; Set return value to true
jmp Bye ; Jump to exit
SetF:
xor al,al ; Set return value to false
Bye:
}
}
/*******************************************************************************
Used to avoid errors from precision limitations when converting a double to
a boolean. Takes the absolute value of the argument, then compares the
result to a small double that is the largest magnitude still considered
false. This technique is about 8 times faster than using the C runtime
fabs() function because like _ftol fabs() changes FPU control flags.
Returns false if 'v' is within or at +- 1.0e-10 of zero
Returns true if 'v' is outside +- 1.0e-10 of zero
There is no need to range check 'v' before calling ToBool
*******************************************************************************/
const double g_MaxBool = 1.0e-10;
__forceinline bool ToBool( double v )
{
__asm
{
fld v ; Push 'v' into st(0) of FPU stack
fabs ; Drop the sign of st(0)
fcomp g_MaxBool ; Compare st(0) to g_MaxBool and pop
fnstsw ax ; Moves FPU code flags to AX (AH) register
test ah,41h ; Test C3 (40h) and C0 (01h); either set means st(0) <= g_MaxBool
je SetF ; Jump to SetF if neither is set (st(0) > g_MaxBool)
xor al,al ; Set return value to false (since 'v' is 0 or very small)
jmp Bye ; Jump to exit
SetF:
mov al,1 ; Set return value to true
Bye:
}
}
#pragma warning( default: 4035 )
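For comparison, the threshold test in ToBool can be expressed in plain C++. A minimal sketch, assuming std::fabs from <cmath>. The names kMaxBool and ToBoolPortable are hypothetical stand-ins for g_MaxBool and ToBool, and no claim is made about relative speed.

```cpp
#include <cmath>

// Hypothetical constant mirroring g_MaxBool above
const double kMaxBool = 1.0e-10;

// True only when the magnitude of 'v' exceeds the threshold.
// A NaN compares false against everything, so it maps to false.
inline bool ToBoolPortable( double v )
{
    return std::fabs( v ) > kMaxBool;
}
```

Values at or inside +- 1.0e-10 of zero give false and everything else gives true, matching the ASM path where SetF loads 1 only when st(0) is greater than g_MaxBool.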
#2
07/07/2001 (2:48 pm)
Perhaps you could share your "common knowledge" and describe how to take advantage of these optimizations in pure C/C++ code?
No matter how good (or bad) your compiler is, the Intel FPU's default rounding mode does not match ANSI C. This is the root performance issue, not the compiler's optimizing abilities. As I mentioned in the comments, the C standard requires truncating conversions from float to integer. And since the Intel FPU normally operates in rounding mode, the FPU control flags must be changed during a conversion.
The code I wrote allows a developer to say: "Forget the ANSI C standard, I can live with rounding conversions because they are much faster on Intel."
It is true that there could be a C library (or compiler specific) function for rounding conversions, but I am not aware of any such functions in either the VC6 or ANSI C libraries. If you know something otherwise please share it. I would gladly nuke the ASM code in my game because it is harder to understand and maintain than C code.
Also, while eliminating such conversions is a good strategy it is not always possible. For example, try implementing a typeless scripting language without any such conversions (and without a custom math package).
-Brad

Thygrrr
Otherwise, neat functions...