Streaming data using SSE instructions
by Bj · 02/19/2001 (8:37 pm)
This code snippet is from a very fast fractal generator written with Streaming SIMD Extensions (SSE) instructions. With a throughput of 1.1 GFLOPS on a 600 MHz P3, the frame rate was high enough that too much time was lost updating the display.
Using the movntq instruction and some MMX code increased the frame rate by 30%. The most notable feature of movntq is that it streams data around the cache, so the writes do not evict your valuable data.
The speedup came from:
1. Writing 64 bits of data at a time.
2. Not polluting the data cache with data that is never used again.
3. Using MMX to work on two 32-bit pixel values at a time.
// imageAdr - The address of the 32-bit RGB output image
// lya128 - SSE register containing 4 scaled floating-point values
__asm {
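    mov edx,DWORD PTR[imageAdr]     ;edx = address of the output pixels
    movaps xmm0,lya128              ;xmm0 = the four scaled floats
    cvtps2pi mm0,lya128             ;Convert the two low floats to 32-bit integers
    movq mm1,mm0                    ;Copy both values
    pslld mm0,8 ;Red
    pslld mm1,16 ;Green
    por mm0,mm1 ;=Yellow
    movntq MM2WORD PTR [edx],mm0    ;Stream two pixels past the cache
    movhlps xmm0,xmm0               ;Move the two high floats into the low half
    cvtps2pi mm0,xmm0               ;Convert them to 32-bit integers
    movq mm1,mm0
    pslld mm0,8
    pslld mm1,16
    por mm0,mm1
    movntq MM2WORD PTR [edx+8],mm0  ;Stream the next two pixels
    emms                            ;Clear MMX state before any x87 code runs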
}
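For compilers without MSVC-style inline assembly, the same sequence can be sketched with the SSE/MMX intrinsics from xmmintrin.h. This is a rough equivalent, not the original generator's code: the function name store_yellow_pixels and its parameters are made up for illustration, and it assumes an x86 target where the MMX intrinsics are available and an 8-byte-aligned destination.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; also pulls in the MMX ones */

/* Hypothetical helper (not from the original source): converts four scaled
   floats to four 32-bit pixels and streams them to `image`.
   Assumes `image` is 8-byte aligned. */
static void store_yellow_pixels(uint32_t *image, __m128 scaled)
{
    /* cvtps2pi: the two low floats become two 32-bit integers */
    __m64 v  = _mm_cvtps_pi32(scaled);
    /* pslld 8 / pslld 16 / por: put the value into two colour channels */
    __m64 px = _mm_or_si64(_mm_slli_pi32(v, 8), _mm_slli_pi32(v, 16));
    /* movntq: write 64 bits around the cache */
    _mm_stream_pi((__m64 *)image, px);

    /* movhlps: bring the two high floats down, then repeat */
    v  = _mm_cvtps_pi32(_mm_movehl_ps(scaled, scaled));
    px = _mm_or_si64(_mm_slli_pi32(v, 8), _mm_slli_pi32(v, 16));
    _mm_stream_pi((__m64 *)(image + 2), px);

    /* emms: leave MMX state so later x87 code works */
    _mm_empty();
}
```

Called in a loop with image advanced by four pixels each time, this should produce the same stores as the assembly version above. Whether the compiler schedules it as well as the hand-written code is something to measure rather than assume.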