TSE Performance
by Dan Partelly · in Torque Game Engine Advanced · 11/25/2005 (7:26 am) · 16 replies
OK, I'm checking out TSE for the first time, and I'm getting strange performance issues.
The TSE demo runs with a lot of hiccups, especially in scenes where the orc is dancing.
When it starts to transition to the outside world, the FPS drops very fast. It finally
stabilizes at around 4 FPS when the building comes fully into view. I'm using
an Nvidia 6610 on PCI Express with the latest drivers. The card is not the fastest one, but still, I
can render far more complex scenes (100k vertices) with shadow maps, complex
specular/normal-mapped shaders, and reflections at 40 FPS or more using other code.
Any ideas what's going on?
#2
11/26/2005 (4:27 pm)
Yeah, the 6610 is slow... er. Well, I own a 6800 XT as well and a bunch of Radeons, from the 9800 up. But the 6610 is causing problems with Torque, both with performance and with glow buffer issues.
A performance of 4 FPS is out of the question; as I said, I can render 100k+ triangles with pretty
complex shaders at 40 FPS.
#3
11/26/2005 (4:32 pm)
Dan, there is, in an upcoming update for TSE, some code that drastically increases frame rate. On Marble Blast Ultra (Xbox 360) we went from ~70 FPS to over 500. Right now the engine is doing a lot of state changes. Some of this is kept down by the state manager, but the main problem is that we are not submitting vertex buffers in a well-sorted order, nor are we submitting them in good batches. The optimization code that Brian Ramage wrote fixes that.
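To illustrate the sorting point, here is a rough sketch of why submission order matters. This is not the TSE code or Brian's optimization; DrawItem, countStateChanges, and sortByState are names made up for the example, and a real renderer would weigh each kind of state change differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw record; the field names are illustrative, not TSE types.
struct DrawItem {
    int shaderId;
    int textureId;
    int vertexBufferId;
};

// Count how many times the shader, texture, or vertex buffer would have to
// be rebound if the items were submitted in the given order.
std::size_t countStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    int shader = -1, texture = -1, vb = -1;  // -1 means "nothing bound yet"
    for (const DrawItem& d : items) {
        assert(d.shaderId >= 0 && d.textureId >= 0 && d.vertexBufferId >= 0);
        if (d.shaderId != shader) { shader = d.shaderId; ++changes; }
        if (d.textureId != texture) { texture = d.textureId; ++changes; }
        if (d.vertexBufferId != vb) { vb = d.vertexBufferId; ++changes; }
    }
    return changes;
}

// Sort so that items sharing the same shader, then texture, then vertex
// buffer end up next to each other in the submission order.
void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.shaderId, a.textureId, a.vertexBufferId) <
                         std::tie(b.shaderId, b.textureId, b.vertexBufferId);
              });
}
```

With four draws that alternate between two materials, the unsorted order rebinds everything on every draw, while the sorted order rebinds only once per material. Once draws with identical state sit next to each other, they also become candidates for merging into larger batches.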
#4
11/26/2005 (4:43 pm)
Pat: Wow, that sounds awesome! I can't wait to see what it does to my frames (already running at insane FPS).
#5
11/26/2005 (5:10 pm)
@Dan: You could try forcing it to use ps2.0 instead of 3.0 and you should get better performance, since that card has bad 3.0 performance.
#6
11/26/2005 (6:33 pm)
My reference to the performance was more of a general comment on performance in regard to the 40 FPS etc. That demo does not have 3.0 code. It has 2.0 with 1.x fallbacks. I have not heard of a 6610. I thought that part was a typo.
6610 = 6600 LE?? Look at the bottom of this page and there is a list of SM3 Nvidia GPUs: www.nvidia.com/object/powerof3.html
Oh, and still, let's look at the video init part of your console.log...
where it says Video Init and below.
Quote:
Video Init:
DirectX version - 9.0c
Direct 3D device found
Cur. D3DDevice ref count=1
Pix version detected: 2.000000
Vert version detected: 2.000000
Initializing GFXCardProfiler (D3D9)
o Vendor : 'NVIDIA'
o Card : 'GeForce FX 5900'
o Version: '77.30'
- Scanning card capabilities...
GFXCardProfiler (D3D9) - Setting capability 'autoMipMapLevel' to true.
- Loading card profiles...
- Loading card profile profile/D3D9.cs
- Loading card profile profile/D3D9.NVIDIA.cs
- No card profile profile/D3D9.NVIDIA.GeForceFX5900.cs exists
- No card profile profile/D3D9.NVIDIA.GeForceFX5900.7730.cs exists
Loading compiled script common/ui/defaultProfiles.cs.
Missing file: common/ui/windowsDefaultProfiles.cs!
Loading compiled script common/ui/ConsoleDlg.gui.
Texture Manager
- Approx. Available VRAM: 134217728
- Threshold VRAM: 68157440
- Quality mode: high
#7
11/26/2005 (9:23 pm)
It would be super useful to see a profiler dump and maybe do some PIX digging on that card. The profiler will reveal if a specific operation on the engine side is causing a slowdown, and PIX can show if a specific set of API calls is eating time (look for big gaps on the timeline). Pat's right - big improvements coming down the pipe, and the MBU work is letting us really beat the snot out of the tech.
#8
11/27/2005 (1:45 am)
Randy, I have no idea if the 6610 is a 6600 LE. I think this is a chip which Nvidia offers only
for "OEM", sold only with full systems and not separately.
Cur. D3DDevice ref count=1
Pix version detected: 3.000000
Vert version detected: 3.000000
Initializing GFXCardProfiler (D3D9)
o Vendor : 'NVIDIA'
o Card : 'GeForce 6610 XL'
o Version: '81.95'
- Scanning card capabilities...
GFXCardProfiler (D3D9) - Setting capability 'autoMipMapLevel' to true.
- Loading card profiles...
- Loading card profile profile/D3D9.cs
- Loading card profile profile/D3D9.NVIDIA.cs
- No card profile profile/D3D9.NVIDIA.GeForce6610XL.cs exists
- No card profile profile/D3D9.NVIDIA.GeForce6610XL.8195.cs exists
Loading compiled script common/ui/defaultProfiles.cs.
Missing file: common/ui/windowsDefaultProfiles.cs!
Loading compiled script common/ui/ConsoleDlg.gui.
Texture Manager
- Approx. Available VRAM: 134217728
- Threshold VRAM: 33554432
- Quality mode: high
#9
11/27/2005 (1:56 am)
OK, I'll get the code into the profiler and see what's happening. As a side note, the performance drop is not so severe in all cases; the effect seems to vary randomly between runs of the TSE demo.
It's interesting.
>> nor are we submitting them in good batches
Well, that's DirectX. DX is still much more sensitive than OpenGL to the number of
batches due to the way it constructs the buffers internally. But it is rumored that
DirectX 10 will change all this, thank the Almighty.
Good news about the improvements.
Dan
#10
11/27/2005 (3:44 am)
Yeah, you should get at least 20+ FPS in that scene. Looks like your hardware is A-OK. 128 MB 6610 XL. Never seen that one before.
Uh, let's see if this performance fix helps in the next update.
#11
11/27/2005 (1:15 pm)
20 FPS? I should get over 70, methinks. When I mentioned 40 FPS earlier, I was running a very complex scene with stencil shadows (for dynamic objects)
and shadow maps, and all objects are lit by shaders which are slightly
more complex than the Doom 3 interaction shaders. So I'm severely fill-rate limited.
OK, we will see what's happening. I also have problems with the glow buffer
in full screen; it works OK in a window.
#12
12/07/2005 (11:49 am)
So how long till this "update"?
#13
12/07/2005 (11:57 am)
I think they will need to ship MBU360 first, then polish up for the update. Trust me, you don't want half-baked updates. It will be worth the wait.
#14
01/23/2006 (1:13 am)
GG probably already realizes this (and it may be part of what Brian fixed), but I'll post it anyway. TSMesh::render() makes a ton of redundant state change requests. Each time, it sets the cull mode, blends, color ops, texture address modes, sunlight, fog, glow info, and more. This is all state that is the same across all TSMesh objects being rendered in the scene in one frame. When looking at the dancing orc from the end of that hallway in the TSE demo, it calls TSMesh::render() 36 times per frame. Considering that each one of these calls makes dozens of redundant state changes, you're looking at potentially hundreds of extra state change calls per frame.
IMO the TSMesh API should be changed to something like this:
TSMesh::beginRender(); // common state set here
for (i=start; i<end; i++)
    mTSMeshObjects[i].render();
TSMesh::endRender();
Although the state manager should keep these changes from reaching the DX hardware, it's still a lot of code to be calling to do essentially nothing on most calls.
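For anyone who wants to see that last point in code, here is a minimal sketch of a redundant-state filter. It is a made-up stand-in, not the TSE state manager or the D3D API; TextureStateCache, kMaxStages, and the counters are all hypothetical names for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical number of texture stages tracked by the cache.
constexpr std::size_t kMaxStages = 8;

// Hypothetical stand-in for a state manager; not the TSE or D3D API.
class TextureStateCache {
public:
    TextureStateCache() { mBound.fill(-1); }  // -1 means "nothing bound"

    // Returns true if the request changed state and would reach the device.
    bool setTexture(std::size_t stage, int textureId) {
        assert(stage < kMaxStages);
        ++mRequests;  // every request costs CPU time, even a filtered one
        if (mBound[stage] == textureId) return false;  // redundant: filtered
        mBound[stage] = textureId;
        ++mDeviceCalls;
        return true;
    }

    std::size_t requests() const { return mRequests; }
    std::size_t deviceCalls() const { return mDeviceCalls; }

private:
    std::array<int, kMaxStages> mBound;
    std::size_t mRequests = 0;
    std::size_t mDeviceCalls = 0;
};
```

If 36 render calls each request the same texture on stage 0, the cache lets one call through and filters the other 35, but all 36 requests still run through the check. Hoisting the common state into a begin/end pair around the loop would avoid making those requests at all.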
#15
01/23/2006 (5:46 pm)
Yes, we are doing hundreds more changes than are necessary. The batching solution will address this problem.
#16
01/23/2006 (6:45 pm)
I did a PIX run, by the way. Looking at one angle in the default demo, I see the following suspect stuff:
- 819 DIP calls.
- 139 SetRenderState calls.
- 359 SetPixelShader and SetVertexShader calls.
- 1718 SetTexture calls.
This is all in one frame. Overall it reports 8056 D3D calls per frame at this angle... pretty bad stuff. Can't wait to see the batching stuff in place.
Torque Owner Vashner
Let's peek at the video init portion of your console.log