[fix] Adding support to shader 2.0b and 2.0a - LOGGED
by Manoel Neto · in Torque 3D Professional · 06/04/2010 (8:27 am) · 12 replies
An easy way to get a performance boost on integrated GPUs is compiling the shaders for a lower target than the highest supported one. Example: a GeForce 7100 mobile supports SM3.0, but you'll get double or even triple the framerate if you drop to SM2.0 (in Basic Lighting). The problem is that most T3D shaders go over the instruction limits for SM2.0 (64 ALU and 32 texture operations).
Everything would be fine if we could use SM2.0a or SM2.0b, which all DX9.0c cards support and which have a 512-instruction limit (enough for anything T3D does). However, T3D doesn't know how to build the proper shader profile strings for those:
In gfxD3D9Shader.cpp, in GFXD3D9Shader::_init(), after this block:

String vertTarget = String::ToString("vs_%d_%d", mjVer, mnVer);
String pixTarget = String::ToString("ps_%d_%d", mjVer, mnVer);

..add this:
if (mjVer == 2 && mnVer == 1)
{
pixTarget = "ps_2_a";
vertTarget = "vs_2_0";
}
else if (mjVer == 2 && mnVer == 2)
{
pixTarget = "ps_2_b";
vertTarget = "vs_2_0";
}

This will add support for forcing the shader version to 2.1 (2.0a) and 2.2 (2.0b):
$pref::video::forcedPixVersion = 2.2; // Forces shader 2.0b
$pref::video::forcePixVersion = 1;
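As a sanity check, the mapping above can be pulled out into a standalone helper. This is a minimal sketch: the 2.1 → ps_2_a and 2.2 → ps_2_b encoding follows the patch, but the `ShaderTargets` struct and `buildTargets` name are illustrative, not engine code.

```cpp
#include <string>

// Maps a (major, minor) shader version to D3DX profile strings, mirroring
// the patch above: 2.1 selects ps_2_a and 2.2 selects ps_2_b. The vertex
// target stays at vs_2_0 because only the pixel shader limits matter here.
struct ShaderTargets { std::string vert, pix; };

static ShaderTargets buildTargets(int mjVer, int mnVer)
{
    ShaderTargets t;
    t.vert = "vs_" + std::to_string(mjVer) + "_" + std::to_string(mnVer);
    t.pix  = "ps_" + std::to_string(mjVer) + "_" + std::to_string(mnVer);

    if (mjVer == 2 && mnVer == 1)      { t.pix = "ps_2_a"; t.vert = "vs_2_0"; }
    else if (mjVer == 2 && mnVer == 2) { t.pix = "ps_2_b"; t.vert = "vs_2_0"; }
    return t;
}
```

Any version outside the two special cases falls through to the normal "vs_M_m"/"ps_M_m" strings, so existing SM2.0 and SM3.0 paths are unaffected.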
Now, if you want T3D to be able to detect true SM2.0a and SM2.0b cards, open gfxPCD3D9Device.cpp and find the method GFXPCD3D9Device::init():
After this block:
U8 *pxPtr = (U8*) &caps.PixelShaderVersion;
mPixVersion = pxPtr[1] + pxPtr[0] * 0.1;
Add this:
if (mPixVersion >= 2.0 && mPixVersion < 3.0 && caps.PS20Caps.NumTemps >= 32)
   mPixVersion += 0.2;
else if (mPixVersion >= 2.0 && mPixVersion < 3.0 && caps.PS20Caps.NumTemps >= 22)
   mPixVersion += 0.1;
The only cards which don't support shader 3.0 but support shader 2.0a are the GeForce FX line (NV30), and the only ones which support shader 2.0b are the Radeon X700 and X800 (R400). Anything else will fail to compile the shaders (example: Radeon X600, Intel 945).
All shader 3.0 and 4.0 cards (even Intel ones) support shader 2.0b and 2.0a as well.
#2
06/04/2010 (11:21 am)
I've also been thinking about forcing a maximum shader version per-Material. Extremely simple materials (e.g. no normal mapping, emissive, or lightmaps) would likely look the same running in plain SM2.0. If you have, say, a huge lightmapped structure that the player walks inside (and thus occupies most of the pixels on screen), this would bring huge savings without having to drop all materials down to 2.0.
#3
08/21/2010 (10:18 am)
Logged as TQA-894.
#4
08/31/2010 (7:39 pm)
Wow... I only just saw this thread now. I've heard that using lower shader versions could improve performance, but a doubling of performance is surprising.
If we had a valid strategy for determining the instruction count of a shader before compiling, we could force lower versions when they fit and don't use SM 3.x features. Any ideas on that would be appreciated.
One thing to worry about is floating-point accuracy for prepass shaders. If the prepass is SM 3.0 but the forward pass is SM 2.1, I suspect we will get z-fighting, as the depths won't exactly match up. It could be that we need to avoid having different SMs between the prepass and the forward pass.
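One pragmatic strategy that sidesteps counting instructions up front is to simply try the profiles from lowest to highest and keep the first one that compiles. A sketch, with the actual compile step abstracted behind a callback; in the engine that callback would wrap D3DXCompileShader, and every name here is illustrative:

```cpp
#include <functional>
#include <string>
#include <vector>

// Tries each profile in order and returns the first one the compiler
// accepts, or an empty string if none do. The callback stands in for a
// real compile attempt (e.g. a wrapper around D3DXCompileShader).
static std::string pickLowestProfile(
    const std::vector<std::string>& profiles,
    const std::function<bool(const std::string&)>& compiles)
{
    for (const std::string& p : profiles)
        if (compiles(p))
            return p;
    return "";
}
```

Called with {"ps_2_0", "ps_2_a", "ps_2_b", "ps_3_0"}, a shader that exceeds the plain SM2.0 limits but uses no SM3.0 features would land on ps_2_a or ps_2_b. The z-fighting concern above still applies if the prepass and forward pass end up on different profiles.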
#5
08/31/2010 (8:02 pm)
A few things that will hurt us if we want to compile things down to ps_2_a/b...
- No flow control.
- No VPOS
... other than that most of our SM 3.0 shaders should compile down to SM 2a/b without trouble.
We're looking at the next version of shadergen/materials now and in that system we plan to precompile all shaders at build time and not generate them at runtime. This could allow me to build a shader from lowest to highest shader model and pick the lowest working version with the least instructions.
#6
08/31/2010 (8:51 pm)
I think Basic Lighting is where there is the most to be gained from using the lowest possible SM, since it relies much more on ALU power.
#7
08/31/2010 (9:22 pm)
Well, I do wonder if we'd get some benefit from going to 2a/b on things like the shadow rendering shaders and other smaller shaders. I'll have to do some tests.
#8
09/10/2010 (6:35 pm)
Another thing to note... NVIDIA doesn't recommend this going forward...
Quote:
With the advent of the GeForce 8 series, with its unified shader architecture, and more work towards better and more optimized compilers/drivers, it is no longer necessary to pick a low shader model.
#9
09/10/2010 (6:54 pm)
Interesting. I got myself an ION, which is a great sample of "modern low end", so I'll try doing some tests on it. The cases where I saw huge performance benefits were on DX9.0c cards; I never tested this on DX10 cards. Maybe this is the point to focus on for DX10 cards:
Quote:
Another factor that affects both performance and quality is the precision used for operations and registers. The GeForce Series GPUs support 32-bit and 16-bit floating point formats (called float and half, respectively). The float data type is very IEEE-like, with an s23e8 format. The half is also IEEE-like, in an s10e5 format.
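To make the s10e5 point concrete, here is a simplified float-to-half round trip (normal range only: truncating, no denormal or rounding handling; purely illustrative, not engine code). It shows why values stored at half precision can't match a full-precision pass exactly:

```cpp
#include <cstdint>
#include <cstring>

// Simplified s23e8 float -> s10e5 half conversion: keeps the sign,
// rebiases the exponent from 127 to 15, and drops the low 13 mantissa bits.
static uint16_t floatToHalf(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint16_t sign = (bits >> 16) & 0x8000;
    int32_t  exp  = ((bits >> 23) & 0xFF) - 127 + 15;
    uint16_t mant = (bits >> 13) & 0x3FF;
    if (exp <= 0)  return sign;            // underflow -> signed zero
    if (exp >= 31) return sign | 0x7C00;   // overflow  -> infinity
    return sign | (uint16_t)(exp << 10) | mant;
}

static float halfToFloat(uint16_t h)
{
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t bits = (uint32_t)(h & 0x8000) << 16;
    if (exp != 0)
        bits |= ((exp - 15 + 127) << 23) | ((uint32_t)(h & 0x3FF) << 13);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Powers of two round-trip exactly, but a value like 0.1f comes back as roughly 0.09998: exactly the kind of mismatch behind the prepass z-fighting concern above.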
#10
09/10/2010 (8:21 pm)
Yeah... I've done a few fixes to the gbuffer encoding/decoding to use half for normals, but I haven't really applied that to any other shaders.
#12
09/10/2010 (8:27 pm)
BTW, I get 20 fps in Deathball with Advanced Lighting and PSSM shadows, at 1366x768 in the ION, which is quite remarkable for a deferred renderer running on a netbook.