Game Development Community

Double vs. float constants

by asmaloney (Andy) · in Torque Game Engine · 01/19/2007 (4:42 pm) · 6 replies

As I've been working my way around the TGE codebase, I've noticed many, many places where double constants are used instead of floats in float calculations - e.g.:

pmid->x = (p1->x + p2->x) * [b]0.5[/b];

for which gcc generated the following PPC asm:
...
[b]lfd      f31,-160(r2)	[/b]		
lfs      f13,0(r25)
lfs      f12,0(r24)
lfs      f11,4(r24)		
lfs      f0,4(r25)		
fadds    f13,f13,f12
fadds    f0,f0,f11	
[b]fmul     f13,f13,f31	[/b]	
fmul     f0,f0,f31
[b]frsp     f11,f13	[/b]		
frsp     f12,f0		
stfs     f11,52(r23)				
stfs     f12,4(r31)				
...

If we fix this:

pmid->x = (p1->x + p2->x) * [b]0.5f[/b];

We get much nicer asm [if it can ever be called nice]:
...
[b]lfs      f31,-3192(r2)	[/b]
lfs      f13,0(r25)		
lfs      f12,0(r24)	
lfs      f11,4(r24)	
lfs      f0,4(r25)
fadds    f13,f13,f12
fadds    f0,f0,f11	
[b]fmuls    f11,f13,f31		[/b]
fmuls    f12,f0,f31		
[b]stfs     f11,52(r23)		[/b]	
stfs     f12,4(r31)
...

I don't know what other compilers do with this - I'd be curious if someone with VC++ could see what it does.

As I mentioned, this pattern is throughout the codebase, and fixing it systematically could remove a lot of double-precision multiplies and frsp instructions from the PPC build...

[Edit: Sorry - should have pointed out that the asm covers more than just that one expression - the compiler has interleaved the next expression with it - pmid->y = ...]
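
For anyone wondering why the suffix matters: an unsuffixed literal like 0.5 has type double, so the float operand is promoted, the multiply is done in double, and the result is narrowed back to float on assignment. Here is a minimal sketch of that rule. The function names are made up for illustration and are not from TGE, and it assumes a C++11 compiler for static_assert and decltype.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the expression in the post (not TGE code).
// Unsuffixed 0.5 is a double literal: the sum is promoted to double,
// multiplied in double, then narrowed back to float by the return.
inline float midpoint_double_const(float a, float b) { return (a + b) * 0.5; }

// 0.5f is a float literal: the whole expression stays in single precision.
inline float midpoint_float_const(float a, float b) { return (a + b) * 0.5f; }

// The language rule itself, checked at compile time.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float * double literal yields double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float * float literal yields float");

// Small runtime sanity check: both versions compute the same midpoint here.
inline void check_midpoints() {
    assert(midpoint_double_const(1.0f, 2.0f) == 1.5f);
    assert(midpoint_float_const(1.0f, 2.0f) == 1.5f);
}
```

Newer gcc releases (4.6 and later) also have a -Wdouble-promotion warning that flags implicit float-to-double promotions, which may help with finding these across a codebase.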

#1
01/19/2007 (5:28 pm)
Got a simpler example? ;)
#2
01/19/2007 (6:20 pm)
Sure.

float	foo( float input ) { return( input * [b]0.5[/b] ); }

becomes:
	lis r2,ha16(LC0)
	lfd f0,lo16(LC0)(r2)
	fmul f1,f1,f0
	frsp f1,f1
	blr

Whereas:
float	foo( float input ) { return( input * [b]0.5f[/b] ); }

becomes:
	lis r2,ha16(LC0)
	lfs f0,lo16(LC0)(r2)
	fmuls f1,f1,f0
	blr
#3
01/19/2007 (8:53 pm)
I'm not sure if this will help you or not but the MSVC6 compiler generated this from pmid->x = (p1->x + p2->x) * 0.5;:

0082F1FA   mov         edx,dword ptr [ebp-14h]
0082F1FD   mov         eax,dword ptr [ebp-18h]
0082F200   fld         dword ptr [edx]
0082F202   fadd        dword ptr [eax]
0082F204   fmul        qword ptr [__real@8@3ffe8000000000000000 (00c88010)]
0082F20A   mov         ecx,dword ptr [ebp-10h]
0082F20D   fstp        dword ptr [ecx]

and it generated this using a float (0.5f rather than 0.5):

0082C35A   mov         edx,dword ptr [ebp-14h]
0082C35D   mov         eax,dword ptr [ebp-18h]
0082C360   fld         dword ptr [edx]
0082C362   fadd        dword ptr [eax]
0082C364   fmul        dword ptr [__real@4@3ffe8000000000000000 (00c789c4)]
0082C36A   mov         ecx,dword ptr [ebp-10h]
0082C36D   fstp        dword ptr [ecx]
#4
01/19/2007 (9:19 pm)
Thanks Chris - that's interesting. So the VC++ compiler is giving the same number of instructions for both; only the fmul operand differs. So this change will have no effect on Windows [unless an fmul with a qword is slower than an fmul with a dword - no idea there].

Now I'm curious what gcc gives when it generates x86 asm for this...
#5
01/19/2007 (11:14 pm)
I have next to no assembly experience but the second version should result in moving 4 bytes rather than 8 (__real@4 rather than __real@8, I think). I don't know what kind of performance gain that actually nets but it certainly can't hurt.
#6
01/20/2007 (5:42 am)
Yes, you are correct - that's what I was trying to get at with my 'unless' comment.

Out of curiosity, do any x86 experts know if fmul qword is slower than fmul dword, or do they take the same number of cycles?

In the end, it wouldn't have the potential impact that it does with gcc for PPC. Most of those frsp [Floating Round to Single-Precision] instructions stall the processor. Also, because it's actually removing these instructions, it would give the optimizer a chance to schedule things differently.
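
One thing worth checking before changing constants in bulk is whether the suffix changes the computed value. For 0.5 it should not: halving is exact in both precisions for ordinary (non-denormal) inputs, so only the generated code changes. A constant such as 0.1 is not exactly representable, and 0.1 and 0.1f are different values, so results there can differ in the last bit. Below is a rough sketch of a check for the 0.5 case. The helper names and sample values are made up for illustration, and it assumes a C++11 compiler.

```cpp
#include <cassert>

// Hypothetical helpers for comparing the two spellings (not TGE code).
inline float scale_with_double(float x) { return x * 0.5; }   // double multiply, narrowed on return
inline float scale_with_float(float x)  { return x * 0.5f; }  // single-precision multiply

// True when both spellings give bit-for-bit the same float for this input.
inline bool same_result(float x) {
    return scale_with_double(x) == scale_with_float(x);
}

// Spot-check a few ordinary values; multiplying by 0.5 is exact for these.
inline void check_samples() {
    const float samples[] = { 1.0f, -2.5f, 3.14159f, 1.0e10f };
    for (float x : samples) {
        assert(same_result(x));
    }
}
```

The same pattern could be pointed at other constants to see which ones are safe to change without altering results.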