G-Buffer Normals and Trig Lookup Textures
by Pat Wilson · 08/28/2008 (11:25 am) · 11 comments
(Cross posted from personal blog. Hopefully someone has a rockin' idea about this.) I've switched to using spherical co-ordinates to encode world-space G-buffer normals. This has a lot of advantages to it, especially for 8:8:8:8 G-buffers. You can now store [Theta, Phi, DepthHi, DepthLo] in the 8:8:8:8 target. Bumping the format to 16:16:16:16 not only gives greater precision, but (depending on the depth resolution you need) you can get an extra channel for information storage. (Mmmm virtual texture bitfield?)
There are a few problems with this, though. The first is the atan2 function. On good hardware, this will not be an issue. My GeForce 8800 chews through any shader I throw at it. My Radeon x1300...not so much. Of course it's easy to make things run fast on good hardware. The challenge is making it run decently on lower end hardware. This is how I am encoding/decoding G-buffer normals:
inline float2 cartesianToSpGPU( in float3 normalizedVec )
{
float atanYX = atan2( normalizedVec.y, normalizedVec.x );
float2 ret = float2( atanYX / PI, normalizedVec.z );
return POS_NEG_ENCODE( ret );
}
inline float3 spGPUToCartesian( in float2 spGPUAngles )
{
float2 expSpGPUAngles = POS_NEG_DECODE( spGPUAngles );
float2 scTheta;
sincos( expSpGPUAngles.x * PI, scTheta.x, scTheta.y );
float2 scPhi = float2( sqrt( 1.0 - expSpGPUAngles.y * expSpGPUAngles.y ), expSpGPUAngles.y );
// Renormalization not needed
return float3( scTheta.y * scPhi.x, scTheta.x * scPhi.x, scPhi.y );
}

Storing normal.z instead of acos( normal.z ) saves a decent chunk of encode/decode work.

So I decided to try to use a lookup texture instead of calling atan2 to encode the normals. I made a 256x256 A8 texture and filled it with atan2 values. The texture can be seen to the right. This is the code for generating the texture:
GFXTexHandle *RenderPrePassMgr::getAtan2Texture()
{
if( mAtan2Handle.isNull() )
{
// Create a lookup texture to output a normalized atan2 result
const U32 cLookupTexSz = 256;
mAtan2Handle.set( cLookupTexSz, cLookupTexSz, GFXFormatA8, &GFXLookupTextureProfile, 1 );
GFXLockedRect *atan2Mem = mAtan2Handle.lock();
for( int y = 0; y < cLookupTexSz; y++ )
{
for( int x = 0; x < cLookupTexSz; x++ )
{
F32 xval = ( ( x / F32(cLookupTexSz) ) * 2.0f - 1.0f );
F32 yval = ( ( y / F32(cLookupTexSz) ) * 2.0f - 1.0f );
U8 &outU8 = atan2Mem->bits[y * cLookupTexSz + x];
F32 atanRes = ( atan2( yval, xval ) + M_PI_F ) / M_2PI_F;
U8 u8Res = mFloor( atanRes * 255.0f );
outU8 = u8Res;
}
}
mAtan2Handle.unlock();
}
return &mAtan2Handle;
}

There is a possible discontinuity when V is near 0.5 and U < 0.5. So if normal.y is near 0.0 and normal.x < 0.0, then you get some artifacts that won't occur if you actually call atan2. The A8 format isn't the issue (I don't think), because even if you use the actual function, you are still encoding the result to an 8-bit value. The resolution of the texture could be an issue, but doubling the resolution did not affect the error rate in my tests.

I haven't quite figured out what to do with this yet. A G-buffer shader is going to be heavy on math and light on texture operations (well, this depends on how you are doing deferred shading; see the upcoming ShaderX7!), and doing the atan2 as a texture sample can save a lot of instructions and cycles, depending on the hardware.
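The discontinuity is easy to reproduce on the CPU: quantize atan2 exactly the way the texture fill does, and two inputs straddling normal.y = 0 with normal.x < 0 land at opposite ends of the 8-bit range (a sketch using the same constants as the fill code above):

```cpp
#include <cmath>
#include <cstdint>

const float M_PI_F  = 3.14159265f;
const float M_2PI_F = 6.28318531f;

// Encode atan2 the same way the lookup texture fill does:
// remap [-pi, pi] to [0, 255].
uint8_t encodeAtan2( float y, float x )
{
    float norm = ( std::atan2( y, x ) + M_PI_F ) / M_2PI_F;
    return (uint8_t)std::floor( norm * 255.0f );
}
```

Adjacent texels near that seam hold values from opposite ends of the range, so any filtering or off-by-one addressing across it blends wildly wrong angles; calling atan2 directly never interpolates across the wrap.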
#2
08/28/2008 (2:02 pm)
@Pat, really interesting idea there. No idea why you're getting those anomalies though.
#3
08/28/2008 (2:42 pm)
I found the answer to this after a bunch of mucking around.

inline float2 cartesianToSpGPU( in float3 normalizedVec, in sampler2D atan2Sampler )
{
float atanYXOut = tex2D( atan2Sampler, floor( POS_NEG_ENCODE( normalizedVec.xy ) * 255.0 ) / 255.0 ).a;
float2 ret = float2( atanYXOut, POS_NEG_ENCODE( normalizedVec.z ) );
return ret;
}

The critical bit is: floor( value * 255.0 ) / 255.0
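A CPU-side sketch of why that floor matters: it snaps the coordinate onto the 1/255 grid, so every input in the same bucket addresses the same texel. This assumes a 256-wide texture and the common point-sampling convention texel = floor( u * width ), which the hardware is not strictly required to follow:

```cpp
#include <cmath>

// Which texel of a 256-wide texture does a [0,1] coordinate hit, assuming
// the common point-sampling convention texel = floor( u * width )?
int texelForCoord( float v )
{
    float snapped = std::floor( v * 255.0f ) / 255.0f; // the critical bit
    int   texel   = (int)std::floor( snapped * 256.0f );
    return texel < 256 ? texel : 255; // clamp addressing at the top edge
}
```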
#4
08/28/2008 (2:56 pm)
Also, I am using a sincos() lookup texture to read back the normal values. This doesn't suffer from the same issues, because the input value is always in [0, 255] (8:8:8:8 G-buffer). The sincos HLSL function is much cheaper than atan2, but this is (again) one of those good-GPU vs. poor-GPU issues. Mostly, though, it's for the damn spot light shader; I had to use a sampler to get it down from 66 instructions to 63, targeting PS2.0. I'll take a proper GPU any day of the week over a lookup, but this is a pretty slick hack. It also lets you fold range-reduction logic into the results, though I don't think I can do any kind of great range reduction on the result of atan2(), because I am taking both the sine and cosine of the result.
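A sketch of the read-back side: because the encoded angle channel only ever takes 256 distinct values, a 256-entry sin/cos table reproduces sincos() exactly at every possible input. The table layout here is my illustration, not the engine's actual texture:

```cpp
#include <cmath>

const float PI_F = 3.14159265358979f;

// 256-entry sin/cos table indexed by the 8-bit encoded angle channel.
// Decodes index -> [-pi, pi] the same way the shader decodes the channel.
struct SinCosTable
{
    float s[256];
    float c[256];

    SinCosTable()
    {
        for( int i = 0; i < 256; i++ )
        {
            float angle = ( i / 255.0f * 2.0f - 1.0f ) * PI_F;
            s[i] = std::sin( angle );
            c[i] = std::cos( angle );
        }
    }
};
```

Unlike the atan2 texture, there is no quantization error to worry about here: the table is exact at every index the shader can feed it.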
#5
08/28/2008 (6:24 pm)
Quote: "I made a 256x256 A8 texture and filled it with atan2 values."

awesome.
#6
08/29/2008 (1:45 pm)
My personal blog is: http://angrydev.com

It's not updated regularly, and it doesn't have any kind of consistent content.
#7
08/29/2008 (9:49 pm)
Thanks.

I thought you were angry with me, but I was sure you just needed some time to rethink things.

Nice name choice, btw.
#8
09/03/2008 (6:29 am)
Pat, if your atan2Sampler has POINT filtering, you shouldn't need to do the floor thing; the hardware should do it for you automatically, for free.

I'm interested in knowing more about this technique you're using, in particular what's the advantage of using spherical coords?
I mean, if you stored screen-space normal and did screen-space lighting instead of world-space, two cartesian coordinates would be enough to store a normal (assuming you're culling back-facing triangles), and that would save you all the encoding/decoding.
Apparently these days world-space deferred lighting is somewhat popular though, and I'd like to know why :)
#9
06/26/2009 (11:02 am)
Hadoken, that is actually not true. Even with point sampling, it is up to the card to decide, based on my texture coordinate input, which texel it wants to sample. Performing the floor in the pixel shader ensures it will happen properly (try it without the clamp, hehe).

I like world-space deferred lighting because, in general, it does not matter what space the normals are stored in: a negative value is always possible. (People who say otherwise and claim that storing normals in view space will allow you to optimize out PI-worth of the possible angles are making a false assumption.) So it really just comes down to what space you want the data in. I like world space. You can just set up your light shader constants in world space, and not take up valuable vertex->pixel data transfer registers.
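A quick numeric sketch of that claim, with a camera at the origin looking down -z (the point and normal here are made-up values): a surface far off to the side can face the camera even though its view-space normal has negative z, so the z sign cannot be optimized away.

```cpp
// Returns true if the surface at point P with normal n faces a camera at
// the origin (i.e. the normal points back toward the camera).
bool facesCamera( const float P[3], const float n[3] )
{
    return n[0] * -P[0] + n[1] * -P[1] + n[2] * -P[2] > 0.0f;
}
```

With P = (10, 0, -10) and n roughly (-0.9, 0, -0.436), the surface is front-facing even though n.z < 0 in view space; this is the perspective grazing-angle case that breaks the "view-space normals never point away" assumption.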
#10
06/22/2010 (10:24 am)
Looking back on this, I'd like to revise my stand: I now prefer view-space normals.
#11
07/09/2010 (4:55 pm)
Hi there,

I'm very new to deferred rendering / shading techniques and still working on it.
From the beginning I decided to go with VS normals (an uneducated, blind decision) and have heard mixed opinions on whether VS or WS normals are the best way to go.
Can you share why you used to think that World Space Normals were better and what made you change your mind?
Also, can you explain how do you pack the depth values into the two HI and LO channels?

Torque 3D Owner Novack
CyberianSoftware