Crash course on assembly ? - processTriFan
by Orion Elenzil · in Torque Game Engine · 01/13/2009 (10:21 am) · 5 replies
Howdy all -
processTriFan is showing up as a hotspot on our windows build,
and i notice in the assembly code for it the comment "This could be faster".
i'd love to go and make it faster,
but my assembly is a bit rusty.
i coded my share of assembly on the 68000 (amiga),
but have never looked at modern x86 code.
is there a recommended cheat-sheet somewhere ?
eg, my guess is that the following code is copying stuff from one memory location to another through some registers. what's the significance of putting something in brackets ?
mov eax, [esi + 0] ; x
mov ebx, [esi + 4] ; y
mov ecx, [esi + 8] ; z
mov edx, [esi + 12] ; f
mov [edi + 0], eax ; <- x
mov [edi + 4], ebx ; <- y
mov [edi + 8], ecx ; <- z
mov [edi + 12], edx ; <- f
has anyone looked at caching the results of processTriFan ?
it looks like it gets called every frame.
is it responsible for fogging on Interiors ?
many tia,
ooo
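On the brackets question: in this (NASM-style) syntax, square brackets mean "the memory at this address", roughly what `(a0)` or `12(a0)` means on the 68000. Without brackets the operand is the register or constant itself. A rough C reading of the eight `mov`s, assuming `esi` and `edi` each point at a record of four 32-bit fields (the struct and function names are made up for illustration, and the fields may well be floats in the engine; `mov` just copies the raw 32 bits either way):

```c
#include <stdint.h>

/* Hypothetical layout: four 32-bit fields at byte offsets 0, 4, 8, 12. */
typedef struct {
    uint32_t x, y, z, f;
} Point4;

/* src plays the role of esi, dst the role of edi. */
void copy_point(Point4 *dst, const Point4 *src)
{
    uint32_t eax = src->x;  /* mov eax, [esi + 0]   load from memory  */
    uint32_t ebx = src->y;  /* mov ebx, [esi + 4]                     */
    uint32_t ecx = src->z;  /* mov ecx, [esi + 8]                     */
    uint32_t edx = src->f;  /* mov edx, [esi + 12]                    */

    dst->x = eax;           /* mov [edi + 0], eax   store to memory   */
    dst->y = ebx;           /* mov [edi + 4], ebx                     */
    dst->z = ecx;           /* mov [edi + 8], ecx                     */
    dst->f = edx;           /* mov [edi + 12], edx                    */
}
```

So the guess in the post is right: it is a 16-byte memory-to-memory copy staged through four general-purpose registers.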
#2
01/13/2009 (11:13 am)
Many thanks, Jaimi!
the following compiles and has identical output:
movups xmm0, [esi]
movups [edi], xmm0
w00t!
i'll be going through the rest of that routine and seeing what i can do.
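For readers who would rather stay in C, the same two-instruction copy can be sketched with compiler intrinsics. This assumes the four values are packed floats and that the compiler targets SSE; the function name `copy_xyzf` and the `HAVE_SSE` macro are illustrative, not from the engine, and a plain `memcpy` stands in where SSE is not available.

```c
#include <string.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Copy four packed floats (x, y, z, f) from src to dst. */
void copy_xyzf(float *dst, const float *src)
{
#if HAVE_SSE
    __m128 v = _mm_loadu_ps(src);   /* movups xmm0, [esi] */
    _mm_storeu_ps(dst, v);          /* movups [edi], xmm0 */
#else
    memcpy(dst, src, 4 * sizeof(float));  /* portable fallback */
#endif
}
```

The unaligned load/store pair is the safe default here, since nothing in the thread says the vertex data is 16-byte aligned.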
#4
01/13/2009 (12:51 pm)
I'm halting this for the moment,
but if anyone's interested, here are a couple small optimizations to processTriFan:
part 1,
let's get rid of the "*4" in "lea esi, [esi + ebp*4]":
just after the lines:
mov [srcIndices], eax
mov eax, in_numpoints
add:
shl eax, 2 ; numPoints *= 4
change this line:
lea esi, [esi + ebp*4]
to this:
lea esi, [esi + ebp]
at the bottom of the routine, change this:
inc ebp
to this:
add ebp, 4
.. i'm not sure how much of an optimization that really is, but who knows.
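The idea in part 1 is to scale the loop bound once instead of scaling the index on every pass. A minimal C sketch of the two loop shapes, with a made-up loop body (summing 32-bit words) standing in for the real work; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Before: element index, scaled by 4 at each use (lea esi, [esi + ebp*4]). */
uint32_t sum_scaled(const uint32_t *base, size_t numPoints)
{
    const unsigned char *bytes = (const unsigned char *)base;
    uint32_t sum = 0;
    for (size_t i = 0; i < numPoints; ++i) {        /* inc ebp */
        sum += *(const uint32_t *)(bytes + i * 4);
    }
    return sum;
}

/* After: the bound is scaled once, the counter is already a byte offset. */
uint32_t sum_prescaled(const uint32_t *base, size_t numPoints)
{
    const unsigned char *bytes = (const unsigned char *)base;
    size_t limit = numPoints * 4;                   /* shl eax, 2 */
    uint32_t sum = 0;
    for (size_t off = 0; off < limit; off += 4) {   /* add ebp, 4 */
        sum += *(const uint32_t *)(bytes + off);
    }
    return sum;
}
```

Both visit the same addresses. On x86 the `*4` in an addressing mode is normally handled as part of address generation, so the doubt about how much this buys seems fair; measuring is the only way to know.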
part 2,
convert the copying of memory to SSE:
change these lines:
mov eax, [esi + 0] ; x
mov ebx, [esi + 4] ; y
mov ecx, [esi + 8] ; z
mov edx, [esi + 12] ; f
mov [edi + 0], eax ; <- x
mov [edi + 4], ebx ; <- y
mov [edi + 8], ecx ; <- z
mov [edi + 12], edx ; <- f
to these lines:
movups xmm0, [esi] ; copy xyzf
movups [edi], xmm0
part 3,
convert the math to SSE.
.. this is where it's going to take me a long time to learn, so i'm moving on to other stuff.
but in case anyone is reading this and feels like taking it on,
here's the C:
dst->texCoord.x = (texGen0[0]*x) + (texGen0[1]*y) + (texGen0[2]*z) + (texGen0[3]);
and here's the regular x86 assembly which might benefit from SSE-ification:
; tc0.s
fld dword [_texGen0 + 0] ; tg0.s.x
fmul dword [esi + 0]
fld dword [_texGen0 + 4] ; tg0.s.y
fmul dword [esi + 4]
fld dword [_texGen0 + 8] ; tg0.s.z
fmul dword [esi + 8]
fld dword [_texGen0 + 12] ; tg0.s.w
faddp st3, st0
faddp st1, st0
faddp st1, st0
fstp dword [edi + 16] ; tc0.s
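As a starting point for part 3, here is one way the single texture-coordinate computation could be sketched with SSE intrinsics in C. It is only a sketch under assumptions: the function names and `HAVE_SSE` macro are invented, the point is taken to be four floats (x, y, z, f), and the fourth lane is forced to 1.0 so that `texGen0[3]` is added as-is rather than multiplied by f. The horizontal add uses SSE1 shuffles only.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Reference version, same arithmetic as the C line quoted above. */
float texgen_s_scalar(const float tg[4], const float p[4])
{
    return (tg[0] * p[0]) + (tg[1] * p[1]) + (tg[2] * p[2]) + tg[3];
}

/* SSE version: one packed multiply, then a horizontal sum of the 4 lanes. */
float texgen_s_sse(const float tg[4], const float p[4])
{
#if HAVE_SSE
    __m128 gen  = _mm_loadu_ps(tg);                      /* tg0 tg1 tg2 tg3 */
    __m128 pt   = _mm_set_ps(1.0f, p[2], p[1], p[0]);    /* x   y   z   1   */
    __m128 prod = _mm_mul_ps(gen, pt);

    /* Swap neighbours and add: lanes become (p0+p1, p0+p1, p2+p3, p2+p3). */
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(prod, shuf);

    /* Bring the upper pair down to lane 0 and add it in. */
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
#else
    return texgen_s_scalar(tg, p);                       /* fallback */
#endif
}
```

The SSE version adds the terms in a different order than the x87 code, so results can differ in the last bit for some inputs. The bigger win would probably come from computing several texture coordinates per point in one go rather than one dot product at a time, but that depends on how the rest of the routine lays out its data.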
#5
01/14/2009 (1:11 pm)
test comment, pls ignore.
Associate Jaimi McEntire
King of Flapjacks
edit:
That looks like it could be replaced by a rep movsd, but I'm not sure that would give you much benefit.
If it gets copied a lot, perhaps replacing it with SSE (or even MMX) would be a good idea.
My mmx is rusty, but that would be something like this:
movdqa xmm0, [esi]
movdqa [edi], xmm0
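One thing to watch with this version: `movdqa` is an SSE2 instruction that requires its memory operand to sit on a 16-byte boundary and faults otherwise, whereas `movups` (or `movdqu`) accepts any address. Whether the vertex data here is aligned is not stated anywhere in the thread, so a check along these lines may be worth running first; the helper name is illustrative:

```c
#include <stdint.h>

/* Nonzero when p sits on a 16-byte boundary, as movdqa/movaps require. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

If both source and destination pass the check for every point, the aligned form is safe; if not, the unaligned form is the one to use.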