bounty: convert TGE's processTriFan routine from stock x86 -> SSE
by Orion Elenzil · in Jobs · 01/20/2009 (4:35 pm) · 0 replies
i'm offering a $200 bounty for converting the TGE routine "processTriFan" from stock x86 assembly to SSE. additional $100 if the resulting routine is significantly faster. (this routine currently takes about 15% of our steady-state CPU time when rendering lots of environment. "significant" would be shaving that down to 10%.)
i think there's room for some advantage from SSE here.
as a side-note,
i've already converted one of the four very similar portions of it,
but while it functions correctly, it isn't noticeably faster,
and when i extend the conversion to the second of the four portions,
it no longer functions correctly. so clearly my ASM / SSE fu is poor!
the core lines of assembly to be converted look like this:
fld dword [_texGen0 + 16] ; tg0.t.x
fmul dword [esi + 0]
fld dword [_texGen0 + 20] ; tg0.t.y
fmul dword [esi + 4]
fld dword [_texGen0 + 24] ; tg0.t.z
fmul dword [esi + 8]
fld dword [_texGen0 + 28] ; tg0.t.w
faddp st3, st0
faddp st1, st0
faddp st1, st0
fstp dword [edi + 20] ; tc0.t
i can provide a commented version of that, and of course my SSE version.
if interested, post here or email me at atmuuxs02@sneakemail.com.
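for anyone sizing this up: the x87 block above boils down to one plane-times-vertex dot product (x, y, z multiplied through, w added on). below is a rough C sketch of that, plus one possible way to do four of them at once with SSE intrinsics. the function and macro names are made up for illustration and are not TGE's, and the four-at-once version assumes the four portions all dot the same vertex against four different planes, which the snippet above doesn't confirm. treat it as a starting point, not the answer.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

/* scalar reference: what the x87 block computes for one plane.
   plane = {x, y, z, w}, v = {x, y, z}. hypothetical name, not TGE's. */
static float plane_dot_point(const float plane[4], const float v[3])
{
    return plane[0] * v[0] + plane[1] * v[1] + plane[2] * v[2] + plane[3];
}

/* ASSUMPTION: the four portions are four planes against the same vertex.
   planes = 4 planes stored back to back (16 floats), out = 4 results. */
static void plane_dot_point_x4(const float planes[16], const float v[3], float out[4])
{
#if SKETCH_HAVE_SSE
    __m128 r0 = _mm_loadu_ps(planes + 0);
    __m128 r1 = _mm_loadu_ps(planes + 4);
    __m128 r2 = _mm_loadu_ps(planes + 8);
    __m128 r3 = _mm_loadu_ps(planes + 12);
    /* after the transpose: r0 = all x's, r1 = all y's, r2 = all z's, r3 = all w's */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    /* one lane per plane: x*vx + w, then + y*vy, then + z*vz */
    __m128 acc = _mm_add_ps(_mm_mul_ps(r0, _mm_set1_ps(v[0])), r3);
    acc = _mm_add_ps(acc, _mm_mul_ps(r1, _mm_set1_ps(v[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(r2, _mm_set1_ps(v[2])));
    _mm_storeu_ps(out, acc);
#else
    /* no SSE on this target: fall back to four scalar dot products */
    for (size_t i = 0; i < 4; ++i)
        out[i] = plane_dot_point(planes + 4 * i, v);
#endif
}
```

the transpose depends only on the planes, so in a real per-vertex loop it would be done once up front rather than per vertex. results may also differ from the x87 version in the last bit or so, since the summation order and intermediate precision are not the same.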