Game Development Community

bounty: convert TGE's processTriFan routine from stock x86 -> SSE

by Orion Elenzil · in Jobs · 01/20/2009 (4:35 pm) · 0 replies

i'm offering a $200 bounty for converting the TGE routine "processTriFan" from stock x86 assembly to SSE. additional $100 if the resulting routine is significantly faster. (this routine currently takes about 15% of our steady-state CPU time when rendering lots of environment. "significant" would be shaving that down to 10%.)

i think there's room for some advantage from SSE here.

as a side-note,
i've already converted one of the four very similar portions of it,
but while it functions correctly, it isn't noticeably faster,
and when i extend the conversion to the second of the four portions,
it no longer functions correctly. so clearly my ASM / SSE fu is poor!

the core lines of assembly to be converted look like this:
    fld     dword [_texGen0 + 16]   ; tg0.t.x
    fmul    dword [esi + 0]
    fld     dword [_texGen0 + 20]   ; tg0.t.y
    fmul    dword [esi + 4]
    fld     dword [_texGen0 + 24]   ; tg0.t.z
    fmul    dword [esi + 8]
    fld     dword [_texGen0 + 28]   ; tg0.t.w
    faddp   st3, st0                ; st3 = x product + w, pop
    faddp   st1, st0                ; st1 = y product + z product, pop
    faddp   st1, st0                ; st0 = sum of all four terms
    fstp    dword [edi + 20]    ; tc0.t
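
for reference, here is a rough C sketch of what the fragment above computes: the texgen plane dotted with the vertex position, with the plane's w term added as-is (i.e. position w treated as 1). this is an illustration only, not TGE code. the function names, the float-array layout, and the reading of esi as "pointer to the position's x, y, z" are assumptions. the SSE path uses standard intrinsics from xmmintrin.h and falls back to the scalar version when SSE is not available.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

/* Scalar reference. Terms are grouped the way the faddp sequence groups them.
   plane = {x, y, z, w} (e.g. tg0.t); pos = {x, y, z} (assumed to be what esi points at). */
static float texgen_scalar(const float plane[4], const float pos[3])
{
    assert(plane && pos);
    return (plane[0] * pos[0] + plane[3]) + (plane[1] * pos[1] + plane[2] * pos[2]);
}

/* Hypothetical SSE version of the same single dot product. */
static float texgen_sse(const float plane[4], const float pos[3])
{
    assert(plane && pos);
#if SKETCH_HAVE_SSE
    __m128 pl   = _mm_loadu_ps(plane);                        /* (x, y, z, w)              */
    __m128 p    = _mm_set_ps(1.0f, pos[2], pos[1], pos[0]);   /* (px, py, pz, 1)           */
    __m128 prod = _mm_mul_ps(pl, p);                          /* (x*px, y*py, z*pz, w)     */
    __m128 hi   = _mm_movehl_ps(prod, prod);                  /* (z*pz, w, z*pz, w)        */
    __m128 sum2 = _mm_add_ps(prod, hi);                       /* lane0 = x*px + z*pz,
                                                                 lane1 = y*py + w          */
    __m128 l1   = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane1 */
    return _mm_cvtss_f32(_mm_add_ss(sum2, l1));               /* lane0 + lane1             */
#else
    return texgen_scalar(plane, pos);                         /* no SSE: scalar fallback   */
#endif
}
```

with plane = (1, 2, 3, 4) and position = (5, 6, 7), both functions return 42. note that the SSE path adds the terms in a different order than the x87 code, so results can differ in the last bit for general inputs. also, a per-result horizontal sum like this tends to eat much of the gain from the packed multiply. producing several outputs per pass, with the planes stored transposed, is one common way around that, though whether it helps here is untested.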

i can provide a commented version of that, and of course my SSE version.

if interested, post here or email me at atmuuxs02@sneakemail.com.