Deferred Shading
by Andrew Mac · in Torque 3D Professional · 06/22/2014 (9:38 am) · 116 replies
About two weeks ago Azaezel, Timmy, and I set out to bring physically based shading to Torque. One of the biggest hurdles we ran into is that the lighting information we need is not available at the stages where we need it. This is due to Torque's prepass lighting (aka deferred lighting).
How does deferred lighting work?
Everything is rendered twice: once to calculate lighting, and then everything is rendered again, combined with the previously obtained lighting information, for your final lit scene. This is why a 30k-poly model loaded into Torque will show 60k.
This may sound very expensive, but it's designed so that the light prepass is as cheap as possible. It's by no means a bad method of doing things, nor was it a poor choice at the time; in fact, most engines around that time frame used deferred lighting. It also has the advantage of running better on older hardware.
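The two-pass flow can be sketched for a single pixel in plain C++. The struct and function names below are illustrative assumptions for the sake of the sketch, not Torque's actual code:

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float saturate(float v) { return std::max(0.0f, std::min(1.0f, v)); }

// Pass 1a: the thin prepass stores only what lighting needs.
struct NormalDepth { Vec3 n; float depth; };

// Pass 1b: each light adds its N.L contribution into a light buffer.
float accumulateLight(const NormalDepth& nd, const Vec3& lightDir,
                      float intensity) {
    float nDotL = nd.n.x * lightDir.x + nd.n.y * lightDir.y +
                  nd.n.z * lightDir.z;
    return saturate(nDotL) * intensity;
}

// Pass 2: geometry is drawn a second time (hence a 30k-poly model showing
// 60k), now with full material data, modulated by the accumulated light.
Vec3 shadePass2(const Vec3& albedo, float accumulated) {
    return Vec3{albedo.x * accumulated, albedo.y * accumulated,
                albedo.z * accumulated};
}
```

The key property is that pass 1 only needs normals and depth, which is what keeps the prepass cheap.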
How does deferred shading work?
Everything is rendered once, but it's done in a rather clever way. When rendering, a pixel shader is used to write all the data needed to different buffers. One for depth/normals, one for lighting (specular info, etc), and one for color. Then at the end a shader is used to combine all the information from the different buffers into a final picture.
This reduces geometry rendering to a single pass, but puts more stress on the graphics card and uses more video memory. It was not necessarily considered the better choice until recent years, as graphics hardware has improved.
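For contrast, here is the same single pixel under deferred shading. The struct names and buffer layout are illustrative assumptions rather than the engine's real G-buffer format:

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>

struct Vec3f { float x, y, z; };

// One pixel's worth of G-buffer data; real targets pack these into
// textures (depth/normals, lighting info, color), as described above.
struct GBufferPixel {
    Vec3f normal;
    float depth;
    Vec3f albedo;    // color buffer
    float specular;  // lighting-info buffer (spec strength, etc.)
};

// Geometry pass: a single draw writes every attribute at once (MRT).
GBufferPixel geometryPass(const Vec3f& n, float depth,
                          const Vec3f& albedo, float spec) {
    return GBufferPixel{n, depth, albedo, spec};
}

// Final combine: lighting reads only the buffers; no second geometry draw.
Vec3f resolvePass(const GBufferPixel& g, const Vec3f& lightDir) {
    float nDotL = std::max(0.0f, g.normal.x * lightDir.x +
                                 g.normal.y * lightDir.y +
                                 g.normal.z * lightDir.z);
    return Vec3f{g.albedo.x * nDotL, g.albedo.y * nDotL, g.albedo.z * nDotL};
}
```

Because the albedo survives in its own buffer until the combine step, effects like SSGI can read pure color data, which is exactly what deferred lighting could not offer.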
Why make the switch?
I hate to say "because the other guys are doing it", but if you look into it, the big players that were using deferred lighting have made the change to deferred shading (see: UE4). This will allow us to easily implement different lighting models like Cook-Torrance.
It also has the advantage of allowing some additional shaders and postfx that we couldn't really do before. Take for instance Lukas' SSGI shader. One big issue he ran into is the lack of an available albedo (or color) buffer to read just color data from. This is available with deferred shading.
What's the status?
We've been working on the conversion for about a week now and everything seems to be working quite well. There are a few missing features and a handful of broken things, but it's functional. Performance is about the same as before, which we consider a good thing since we haven't attempted any kind of optimization yet.
Conclusion
It's a pretty big change and will have a huge impact on materials, material shaders, and lighting. While we plan to continue working on this regardless, for personal use, we would like to know if this is something the community is interested in seeing make its way into the main repo. This wouldn't be a 3.6 or 3.7 change, but further down the line (perhaps 4.0?).
Links
Link to the development repo:
github.com/andr3wmac/Torque3D/tree/deferred_shading
A guy's benchmarking of deferred lighting vs shading:
frictionalgames.blogspot.ca/2010/10/pre-pass-lighting-redux.html
An article that evaluates Deferred Lighting VS Deferred Shading:
gameangst.com/?p=141
#62
07/03/2014 (2:40 pm)
Timmy, I thought the swizzle trick also combined those extra channels to give a less compressed look to the final? Can't research it right at the moment; I think it was an NVIDIA paper. From an artist's standpoint, to save in dxt5nm the plugins we use expect a normal normal map. They would ignore a parallax channel in blue or alpha, so it would require manually changing the channels around and then saving as regular dxt5. As a generality though, 3Dc still gives better results than dxt5nm.
Edit: Lots of technical jargon you would better understand, but I think this is what I was referencing with the extra channels containing data.
Quote: Because of the _Y_X DXT5 compression of tangent-space normal maps there are two unused channels, and one of these channels can be used to also store a bias to center the dynamic range. This significantly increases the number of 4x4 blocks for which the values can be scaled up (such that typically more than 75% of all 4x4 blocks use a scale factor of at least 2). However, even using a bias to increase the number of scaled 4x4 blocks does not help much to improve the quality. The real problem is that the four sample points of the DXT1 block are simply not enough to accurately represent all the Y components of the normals in a 4x4 block. Introducing more sample points would significantly improve the quality but this is obviously not possible within the DXT5 format.
Instead of storing a bias and scale, one of the spare channels can also be used to store a rotation of the normal vectors in a 4x4 block about the Z-axis, as
www.nvidia.com/object/real-time-normal-map-dxt-compression.html
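To make the scheme in that quote concrete, here is a small sketch of the reconstruction side commonly described for DXT5nm-style maps: X lives in alpha, Y in green, and Z is rebuilt from the unit-length constraint. This is an illustration of the general technique, not engine code:

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>

struct Normal { float x, y, z; };

// Decode one sampled texel: channels arrive in [0,1], normals in [-1,1].
// X is stored in alpha (DXT5's separately compressed alpha block), Y in
// green (the highest-precision DXT1 channel); Z is rebuilt from
// x^2 + y^2 + z^2 = 1. In-engine this math runs in the pixel shader.
Normal decodeDxt5nm(float green, float alpha) {
    float x = alpha * 2.0f - 1.0f;
    float y = green * 2.0f - 1.0f;
    float zSq = std::max(0.0f, 1.0f - x * x - y * y);
    return Normal{x, y, std::sqrt(zSq)};
}
```

Since Z is derived rather than stored, the red and blue channels are free, which is what the bias/rotation tricks in the quoted paper (and the parallax-in-red idea below) exploit.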
#63
07/03/2014 (3:56 pm)
Yes, that is all 100% correct; however I have seen games ship using data in the red channel before, using the dxt5 swizzle trick. The results will not be as good doing this, but as you know we can't always have our cake and eat it too ;-). I am no Photoshop expert, but surely there is a way to automate swapping the red and alpha channels, then just save as a regular dxt5 texture? One game in particular does this; I tried before to save just as a regular dxt5nm without the data in the red channel (actually leaving the red and blue blanked out, as suggested by most) and honestly I couldn't pick the difference. No doubt there would be a difference, but I couldn't notice it playing the game in real time; I'm sure if you studied the difference in a world editor or something you may notice it.
Unfortunately there are always pros and cons to this sort of thing; I'll leave it up to everyone to test and decide. The support is there if people want it; if you don't use it, then everything will continue as before, as the changes don't break anything.
Oops, sorry Andrew Mac, hijacking the thread here lol
#64
07/03/2014 (4:33 pm)
Anyway, back on topic: I got the terrain spec working with the new material buffers that Andrew implemented the other day. I have sorted out a few of the blending issues when using multiple layers, but a few still remain. I have the spec cranked up a little high in those screenshots, but anyway:
Spec enabled: i.imgur.com/a87fnk1.png
Spec disabled: i.imgur.com/cQ8l0yO.png
#65
07/03/2014 (6:06 pm)
Andrew: 207.246.157.76/transtest.7z, per wanting a transmission test.
Timmy, Photoshop can already save in dxt-nm with the NVIDIA plugin. Yes, an action could possibly be created to switch channels around, but by your own admission you're then getting less data from those extra channels. If you're playing a high-definition, non-action first-person game, looking at objects closely in the world, it really isn't fair to the artist to basically say "live with it" when other engines show better results.
I guess if it's only when parallax is needed, and otherwise the engine will still use dxt-nm to its full capability (i.e. looking at the additional channels to fill in the blanks), it could pass.
There is also, of course, thinking ahead to having to add a spot for tessellation maps.
TBH, I wish someone would take on adding 3Dc/BC5 support.
#66
07/03/2014 (7:23 pm)
Yeah, I know it can. What I'm getting at is you can do it manually and get the exact same results; dxt5nm is simply dxt5. This is totally unnecessary unless you want parallax support, of course. The engine does not care whether or not there is data in the red channel of the dxt5nm; all the magic happens offline when you export the texture. Honestly, test for yourself.
Not the greatest test ever but anyway:
No normal map: i.imgur.com/9OC7ZQy.png
Png normal map: i.imgur.com/tgvag7k.png
Dxt3 normal map: i.imgur.com/WYja3Gd.png
Dxt5nm normal map: i.imgur.com/EY8utlJ.png
Dxt5nm* normal map: i.imgur.com/LHY9cBC.png
Note: for the Dxt5nm* normal I stuffed height into the red channel just for testing; no parallax was enabled for any screenshot. I'll try to do something a little more conclusive over the weekend; a black floor like Duion used will most likely show the results better.
#67
07/04/2014 (5:39 pm)
@Timmy: mind elaborating on how that red-channel bit works? Reading over github.com/GarageGames/Torque3D/blob/8142a3e9645e881cd8e21434809289855e57a0b4/En... would seem to indicate that it'd mangle things were it used. And clearly it's not, judging by those pics...
#68
07/04/2014 (6:10 pm)
I gotta admit I never noticed that DXT5nmSwizzle class ;-). What I presume it is doing is actually creating a dxt5nm texture manually from the supplied pixel data. This step is obviously not required when you supply a texture that has already had the swizzle trick applied. I am not home at the moment, so I can't search the source code to answer 100%, but I'm guessing there is an option somewhere that enables T3D to create a dxt5nm texture for you.
#69
07/04/2014 (6:34 pm)
The research assistance would be appreciated, since we're also trying to track down some nasty artifacting along the way. (Might as well look her on over while we're digging in the area, after all.)
#70
07/04/2014 (6:58 pm)
If I had to take another guess, I would say this is part of a preparation step on the pixel data before sending it (to squish, for example) for compression. I'll take a look when I get home anyway.
#71
07/04/2014 (8:30 pm)
Aah ok, what is happening here is the GFXTextureManager is using the DXT5nmSwizzle class as part of the conversion of a GBitmap (marked as a normal map and dxt5 format) to a compressed texture via squish. If you supply a dds in the first place this obviously doesn't happen.
*Edit:
For better quality the DXT5nmSwizzle class should be writing the same data in the two unused channels (i.e 0xFF would suffice for both).
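As a rough sketch of what an offline swizzle step like this could look like (a guess at the general technique being discussed, not the actual DXT5nmSwizzle implementation):

```cpp
#include <cassert>
#include <cstdint>

struct Rgba8 { uint8_t r, g, b, a; };

// Offline preparation before DXT5 compression: move the normal's X from
// red into alpha, keep Y in green, and flood the unused channels with a
// constant (0xFF, per the suggestion above) so the DXT1 color endpoints
// aren't wasted encoding junk data.
Rgba8 swizzleForDxt5nm(Rgba8 in) {
    Rgba8 out;
    out.a = in.r;   // X -> alpha (high-quality DXT5 alpha block)
    out.g = in.g;   // Y stays in green
    out.r = 0xFF;   // unused
    out.b = 0xFF;   // unused
    return out;
}
```

This would run once per pixel over the GBitmap before the data is handed to the compressor.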
#72
07/05/2014 (4:46 am)
I have discovered some useful tips out of all this testing of the normal map formats. I am going to do a detailed blog post about it over the weekend and show the results of testing all the different formats currently available. I have also added support for loading 3Dc/BC5, just for something extra to test.
*Edit:
blog here www.garagegames.com/community/blogs/view/22776
#73
07/05/2014 (8:58 am)
To wrap up my end for the week, a bit of good news/bad news.
Good news-
Multi-Layered materials are back in.
Not so much-
Dynamic cubemaps'll take a bit more hunting around i.imgur.com/W0dVjgF.jpg
Similar in issue, though not cause, to one I already threw over at github.com/GarageGames/Torque3D/pull/703 and rolled in on our end, just to rule that out as a recurrence of the problem (though I'm told there may be a more intelligent way of handling that than shutting off instancing entirely). Might be related to the present issue.
PR that flipped them over to semi-functional for this project: github.com/andr3wmac/Torque3D/commit/97064bdaa78fafc1fb4eae9550a131518e96eb1b for at least some of the relevant files. Going to need to figure out where she's going ahead and cloning textures instead of creating fresh angles per-object.
#74
07/12/2014 (9:16 pm)
@Azaezel
@Timmy
@Andrew
Do you guys have access to the ShaderX7 Book?
ShaderX7 at Amazon. This book was written by Wolfgang Engel; Michael Perry pointed out that Torque3D's lighting is based on his earlier work.
I ask because in ShaderX7 Wolfgang points to the benefits of using the Cook-Torrance shading model over the earlier Blinn-Phong model; this is in chapter 2.5, "An Efficient and Physically Plausible Real-Time Shading Model". From what I'm reading, it looks as if the older Blinn-Phong model is being used; am I reading this correctly? Or are you just working on a small section of the shader model?
I've posted another topic about Blinn-Phong shader model and Cook-Torrance shader model, hence my question.
Not trying to upset anyone, just asking.
#75
07/12/2014 (9:29 pm)
Something that may be of use:
Deferred rendering using Compute shader
The Visibility Buffer: A Cache-Friendly Approach to Deferred Shading page with links to code
#76
07/12/2014 (9:59 pm)
Which method are you guys working with? Are you using Deferred Shading with multisample anti-aliasing for DirectX 10, or an older version of Deferred Shading designed for DirectX 9 and OpenGL 2.1?
Also are you using Centroid-Based Edge detection?
#77
07/12/2014 (11:26 pm)
Late, so I'll keep it brief.
Quote: From what I'm reading, it looks as if the older Blinn-Phong model is being used; am I reading this correctly?
A modified version, yes. The lion's share of that aspect is contained within github.com/andr3wmac/Torque3D/tree/deferred_shading/My%20Projects/testDS/game/sh... Review the actual code and changes for specifics.
Quote: Are you using Deferred Shading with Multisampling Anti-aliasing for DirectX 10 or an older version of Deferred Shading designed for DirectX 9 and OpenGL 2.1?
Until such time as the DirectX refactor and the OpenGL branches are completed, it would be foolish in the extreme to design around models that explicitly rely on that code, so we're sticking with the capabilities allowed by DX9 for now. (Though in the case of the OpenGL work, porting *should* be relatively trivial.)
#79
07/13/2014 (3:07 pm)
I was just a bit curious, what kind of buffers are you guys looking to have with this setup? Will there be one for specular power/roughness for instance? What about a velocity buffer?
#80
07/13/2014 (3:59 pm)
@Felix
My guess is it will be for power, not roughness. The Cook-Torrance shading model changes power to roughness, and would be used when the PBR code is done.
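For reference, one commonly cited mapping between a Blinn-Phong specular power and a roughness parameter. Whether this branch would use this exact formula is an open question; it is just an example of the conversion being discussed:

```cpp
#include <cassert>
#include <cmath>

// Example mapping between a Blinn-Phong specular power and a
// Cook-Torrance-style roughness: power = 2 / roughness^2 - 2.
// This particular formula is one option from the literature,
// used here purely for illustration.
float roughnessToSpecPower(float roughness) {
    return 2.0f / (roughness * roughness) - 2.0f;
}

// Inverse mapping, so an existing specular-power buffer could be
// reinterpreted as roughness when a PBR model comes online.
float specPowerToRoughness(float power) {
    return std::sqrt(2.0f / (power + 2.0f));
}
```

With a mapping like this, a power buffer and a roughness buffer carry equivalent information, which is why the choice of which to store is mostly a convention.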
Timmy01