
Streaming data using SSE instructions

by Bj · 02/19/2001 (8:37 pm) · 0 comments

This code snippet is from a very fast fractal generator written using Streaming SIMD Extensions (SSE) instructions. With a throughput of 1.1 GFLOPS on a 600 MHz P3, the frame rate was high enough that too much time was lost updating the display.

Using the movntq instruction and some MMX code increased the frame rate by 30%. The most notable feature of movntq is that it is a non-temporal store: it writes data to memory with a hint to bypass the cache, so it does not evict your valuable data.


The speedup came from:
1. Writing 64 bits of data at a time.
2. Not polluting the data cache with data that is never used again.
3. Using MMX to work on two 32-bit pixel values at a time.
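
For comparison, here is a minimal sketch of the straightforward scalar version that the three points above improve on. It is not from the original generator: the names pack_pixel and store_pixels_scalar are hypothetical, and the values are assumed to be already scaled to the 0..255 range. It does one ordinary 32-bit store per pixel, and every store goes through the data cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy one intensity value (assumed 0..255) into
   bits 8-15 and bits 16-23 of a 32-bit pixel. */
static uint32_t pack_pixel(int32_t v)
{
    uint32_t u = (uint32_t)v;
    return (u << 8) | (u << 16);
}

/* Baseline: one conversion and one ordinary 32-bit store per pixel.
   Note: the C cast truncates toward zero, while cvtps2pi rounds according
   to MXCSR (round-to-nearest by default), so the two can differ by one
   for non-integer inputs. */
static void store_pixels_scalar(uint32_t *image, const float *values, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        image[i] = pack_pixel((int32_t)values[i]);
    }
}
```

The MMX version handles two of these pixels per instruction and replaces the ordinary stores with streaming 64-bit stores.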

//  imageAdr - The address of the 32-bit RGB output image
//  lya128   - 128-bit SSE variable containing 4 scaled floating-point values

__asm {
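	mov		edx,DWORD PTR[imageAdr]	;edx = destination address
	movaps		xmm0,lya128		;load the four scaled floats
	cvtps2pi	mm0,xmm0		;low two floats -> two 32-bit ints
	movq		mm1,mm0
	pslld		mm0,8		;Red
	pslld		mm1,16		;Green
	por		mm0,mm1		;=Yellow
	movntq		QWORD PTR [edx],mm0	;stream pixels 0-1 past the cache
	
	movhlps		xmm0,xmm0		;move the high two floats down
	cvtps2pi	mm0,xmm0		;high two floats -> two 32-bit ints
	movq		mm1,mm0
	pslld		mm0,8
	pslld		mm1,16
	por		mm0,mm1
	movntq		QWORD PTR [edx+8],mm0	;stream pixels 2-3
	emms				;leave MMX state before any x87 code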
}
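
The same steps as the inline assembly above can be sketched with compiler intrinsics, which is handy where inline assembly is not available. This is an illustration, not the author's code: the function name store_four_pixels is hypothetical, and it assumes a compiler that still exposes the MMX intrinsics (GCC and Clang do on x86; 64-bit MSVC builds, as far as I know, do not). The trailing _mm_sfence() is an addition that is not in the original snippet; it orders the streaming stores before any later stores.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; also pulls in the MMX ones */

/* Hypothetical helper: convert four scaled floats to four 32-bit pixels
   and write them with two non-temporal 64-bit stores. */
static void store_four_pixels(uint32_t *image, __m128 values)
{
    /* Low two floats -> two 32-bit integers (cvtps2pi). */
    __m64 lo = _mm_cvtps_pi32(values);
    /* Copy each value into bits 8-15 and bits 16-23 (pslld, pslld, por). */
    lo = _mm_or_si64(_mm_slli_pi32(lo, 8), _mm_slli_pi32(lo, 16));
    /* Non-temporal 64-bit store of pixels 0-1 (movntq). */
    _mm_stream_pi((__m64 *)image, lo);

    /* Move the high two floats down (movhlps) and repeat for pixels 2-3. */
    __m64 hi = _mm_cvtps_pi32(_mm_movehl_ps(values, values));
    hi = _mm_or_si64(_mm_slli_pi32(hi, 8), _mm_slli_pi32(hi, 16));
    _mm_stream_pi((__m64 *)(image + 2), hi);

    _mm_empty();   /* emms: leave MMX state before any x87 code */
    _mm_sfence();  /* not in the original: order the streaming stores */
}
```

A call such as store_four_pixels(row + x, lya128) would write four adjacent pixels, where row and x are placeholder names for the current scanline pointer and column.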
