Game Development Community

Shouldn't SwapBuffers be fast?

by Andy Schatz · in Torque 3D Professional · 06/25/2009 (5:47 pm) · 28 replies

Am I missing something in how I should be optimizing torque? I'm currently testing on my mid-range laptop (P3 2.5GHz, GeForce 8400), and in Basic lighting I only get 13.3 fps. But when I run a profile, it spends 90% of its time in SwapBuffers, and no significant time in any other single function. Is the profiler actually missing something? Shouldn't SwapBuffers be the sort of thing that can be performed at like 300 Hz?
#1
06/25/2009 (6:14 pm)
Any chance for the real specs?
A P3 at 2.5 GHz doesn't exist, especially not in conjunction with a GF8400 :)

Other than that: there shouldn't be a reason it takes that long unless:

1. You're forcing AA (either through prefs or the drivers)
2. You've massively overdone the resolution or the number of reflective surfaces
#2
06/25/2009 (8:09 pm)
SwapBuffers is where, if the GPU is running behind the CPU, things catch up. So if you see a big spike there, it means the GPU can't render your scene fast enough and the CPU has to wait for it.
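To make Ben's point concrete, here's a toy model (plain Python with made-up timings, not Torque code) of a frame loop where the driver lets the CPU queue at most two frames ahead: once the GPU falls behind, nearly all of the CPU's time piles up in the swap call.

```python
# Toy model of why a GPU-bound frame shows up as time "in" SwapBuffers.
# All numbers are illustrative; this is not how Torque is implemented.

def simulate(frames, cpu_ms, gpu_ms, max_queued=2):
    """Return the average ms per frame the CPU spends blocked in swap."""
    gpu_free_at = 0.0   # time at which the GPU finishes its backlog
    now = 0.0           # CPU clock
    blocked = 0.0
    for _ in range(frames):
        now += cpu_ms                     # CPU builds and submits the frame
        # Swap can't return while more than max_queued frames of GPU
        # work are outstanding; if they are, the CPU stalls right here.
        ready = gpu_free_at - max_queued * gpu_ms
        if ready > now:
            blocked += ready - now
            now = ready
        gpu_free_at = max(gpu_free_at, now) + gpu_ms  # GPU renders it
    return blocked / frames

# CPU needs 3 ms, GPU needs 20 ms: almost the whole frame is "in swap".
print(simulate(100, cpu_ms=3.0, gpu_ms=20.0))
```

With the numbers above the CPU ends up blocked roughly 16-17 ms of every 20 ms frame, which is exactly the "90% in SwapBuffers" profile shape from the original post; flip the timings and the blocked time drops to zero.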
#3
06/25/2009 (8:16 pm)
Yea... Ben beat me to this. :)

If you see SwapBuffers take a lot of time... you're either fillrate or vertex bound on the GPU. Most likely fillrate bound.

Try deleting things from the scene one at a time and retiming. Or better yet, give NVPerfHUD a spin, as it can give you a fillrate cost analysis... a sweet tool.

#4
06/26/2009 (6:11 am)
Just to be safe: did you download the latest drivers from NVidia? 3D performance can often be terrible if you're using the Microsoft supplied drivers and sometimes even the ones that came with your laptop.
#5
06/26/2009 (9:51 am)
@Marc - 2.49GHz, sorry!

@GG People - this would be a nice comment to insert; I saw something like this elsewhere in the code -- something to the effect of "If this comes up big in your profiler, it's because your CPU is waiting on your GPU. You are likely either vertex or fillrate bound."
#6
06/26/2009 (10:07 am)
OK, I'm not actually expecting help here; it's not you guys' job to optimize or debug for me, but I thought I'd report back with this:

All I've got left in my scene is a terrainblock, which has screenError set to a goofily high 256. At 1024x768 I still get <30 fps and SwapBuffers is still chewing up 75%. I've made sure all the options on my card are turned off and I've got the latest drivers. It's hard to imagine there should be a bottleneck anywhere under these conditions.
#7
06/26/2009 (11:05 am)
Trying out PerfHUD right now, man that's a neat tool! BTW, for anyone who comes across this thread in the future: to enable Torque to use PerfHUD you've got to uncomment the line $Video::useNVPerfHUD = true; in core/main.cs.

I started to write a bunch of things here and then erased them as I found new wisdom. At first it seemed that the terrain was causing the slowdown. But then, after using PerfHUD for a while (in particular looking at the Frames screen), performance actually seemed to improve significantly (from <30 fps to varying between 40 and 60). Anyway, after playing with a bunch of stuff I've determined that I'm just clogging up VRAM with all my terrain detail textures.

So my question now is, if I use DDS for these guys, will it know to only send the lower mips to the card in order to improve performance? Or if not, it probably would be a good idea to have a slider on the options screen that acted as a multiplier for terrain detail render distance.
#8
06/26/2009 (11:07 am)
Yes, DDS can dramatically reduce VRAM usage. Using DXT1 will bring it to 1/8th of the original size.
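For a sense of scale, here's the arithmetic behind the 1/8th figure (a back-of-envelope sketch; sizes ignore DDS header overhead): RGBA8 stores 32 bits per pixel while DXT1 stores 4, and a full mip chain adds roughly a third on top either way, so the ratio survives.

```python
# Back-of-envelope VRAM math behind the "1/8th of the original size" claim.
def texture_bytes(width, height, bits_per_pixel, with_mips=True):
    base = width * height * bits_per_pixel // 8
    # A full mip chain adds roughly a third on top of the base level
    # (1 + 1/4 + 1/16 + ... ~= 4/3).
    return base * 4 // 3 if with_mips else base

raw  = texture_bytes(1024, 1024, 32)  # uncompressed RGBA8: 32 bpp
dxt1 = texture_bytes(1024, 1024, 4)   # DXT1: 4 bpp
print(raw // 1024, "KB vs", dxt1 // 1024, "KB")  # 8x smaller
```

Five 1024x1024 detail/diffuse/normal sets at RGBA8 is on the order of 25+ MB before anything else goes in VRAM, which is why this adds up fast on a 256 MB card.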
#9
06/26/2009 (11:56 am)
DDS is the tool of choice for reducing VRAM usage. I encourage using it everywhere that it is appropriate, and note that the DXT5nm format is also supported, for normal maps. (Compress normal map in DXT5, put Y in Green, X in Alpha, set Red and Blue to 1.0.)
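The channel shuffle Pat describes can be sketched like this (illustrative Python only; `swizzle_dxt5nm` is a made-up helper name, and the actual DXT5 compression would be done afterwards by your texture tool):

```python
# Sketch of the DXT5nm swizzle: before compressing to DXT5, move the
# tangent-space normal's Y into green and X into alpha, and flatten
# red/blue to 1.0. Only the channel shuffle is shown here.

def swizzle_dxt5nm(rgba):
    """rgba: list of (r, g, b, a) byte tuples of a normal map, where
    r encodes X and g encodes Y in the usual RGB normal-map encoding."""
    out = []
    for r, g, b, a in rgba:
        x, y = r, g
        out.append((255, y, 255, x))  # R=1.0, G=Y, B=1.0, A=X
    return out

# In the shader, Z is reconstructed as sqrt(1 - x*x - y*y) after the
# two channels are expanded back to the [-1, 1] range.
```

Putting X in alpha matters because DXT5 compresses the alpha block with higher precision than the color channels, which is why this trick gives better normals than compressing the raw map as DXT5.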

Mipmap behavior is independent of DDS or that slider. The slider will affect the detail draw distance... but which mipmap is chosen for a texture sample is decided either by the card (see the DDX/DDY instructions for more on this) or by the shader author, if the author uses the 'tex2Dlod' instruction (or GLSL equivalent) to manually select the mipmap.

In the case of terrain detail, it will use high-resolution mip levels for the pixels closer to the camera, and lower-resolution mip levels for the pixels further away from the camera.
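The hardware selection mentioned above roughly amounts to taking log2 of how many texels the sampler steps across per screen pixel (a simplified model; `mip_level` is a hypothetical helper, and the real decision uses the ddx/ddy derivatives across each 2x2 pixel quad):

```python
import math

# Rough model of hardware mip selection: the GPU takes screen-space
# derivatives of the texture coordinates and picks roughly
# mip = log2(texels stepped per pixel), clamped to the chain length.

def mip_level(texels_per_pixel, num_mips):
    level = math.log2(max(texels_per_pixel, 1.0))
    return min(int(level), num_mips - 1)

# Up close, ~1 texel per pixel -> mip 0 (full resolution).
# Far away, ~16 texels per pixel -> mip 4 (1/16th resolution per axis).
print(mip_level(1.0, 11), mip_level(16.0, 11))
```

This is why distant terrain pixels automatically sample the small mips: their texture coordinates change much faster per screen pixel than the ones near the camera.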
#10
06/26/2009 (12:28 pm)
I'm actually finding now that the main bottleneck is not VRAM but the actual shader... when I reduce my detail, diffuse and normal textures to 128x128 (I've got 5 textures), I see no performance increase. When I delete the detail textures entirely I see significant improvement (about 80%). Though even after doing that, the bottleneck in NvPerfHUD is in RenderTerrainMgr_Render on the GPU. When I delete everything in the level except terrain, and I delete all of my terrain textures (though the materials still exist), I see only moderate improvement. Strangely enough, after fooling around in the tool, just as I noticed before, something gets flushed and my performance gets good (as it should be with literally no lights, textures, or anything in the level except an untextured terrainblock).
#11
06/26/2009 (12:59 pm)
@Pat - so DDS_nm swizzling now works?
#12
06/26/2009 (1:02 pm)
It should...was it broken? The feature should be enabled by assigning a DDS texture, with DXT5 format, to the normal map. The code for expand/assign normal map in ShaderGen should do the rest.
#13
06/26/2009 (1:18 pm)
I'm sorry to say so Andy, but your system is a low-end system, not midrange.
The 8300 (desktop counterpart to the 8400M) is far from fast; its fillrate in particular is very low, thanks to the capped bus width and slow memory combined with the cut-down GPU.
On paper this card is technically capable of DX10, but that's where its capability ends. In the real world it will have problems with any DX10-style technology if you try to push a lot of shaders through it, especially fullscreen multipass rendering, as happens with T3D Advanced Lighting.
You might have a chance of getting away with it if you disable all shadow casting and the complex water, but sadly it might just as well come back to bite you.
#14
06/26/2009 (1:46 pm)
@Marc - There's really no point in arguing whether it's low-end or mid-range. The point is that I need to find settings that will make the engine run well on this machine. I have plenty of users below these system specs, and I don't find "upgrade your system" to be an acceptable answer for those customers. I don't expect the game to look good on low-end, or even mid-level, systems; I just expect to get 20 fps.

In particular, I'm pitching right now to a group, and the individual trying to look at the demo is on a one-year-old MacBook Air with a 1.6GHz CPU and a 256MB graphics chip running Boot Camp (not virtualized). Now THAT is low end. But if he can't run it at even 5 fps, the game likely won't get funded by this group.

Our game is going to be out within a year, so I can't afford the luxury of believing that computers will be much faster by the time the game is out.

Again, I'm not saying GG isn't doing a good job, I'm just trying to find the bottlenecks and discover the settings that will allow the game to at least RUN on a low end machine.
#15
06/26/2009 (1:56 pm)
Andy, the increase you may be seeing is that, if there are no detail textures, it may not render the detail pass on the terrain (I don't know if this is true or not). If this is the case, then you are seeing the speedup from not rendering the terrain geometry again.
#16
06/26/2009 (1:56 pm)
Advanced Lighting will never work on Intel GPUs.
For the Intel-based Air, there are two answers:

1. Basic Lighting in T3D (+ have a stronger machine with you to show what the game can do for gamers with Advanced Lighting)
2. Use TGEA instead, which is more appropriate for lower-end users.

My hope, actually, is that Basic Lighting becomes what it was once pitched as: a TGEA-like lighting level, not the DX7-class lighting level it is right now, which is even less atmospheric than TGE.
#17
06/26/2009 (2:02 pm)
Yeah, I have him running in basic lighting and I've spent the last few days bugging Tom and the rest of you fine folks about performance issues to see if I can squeeze enough juice out of it to run on his machine.

TBH, I actually don't ever expect the game to really run well enough to actually play on that machine, I just want to get 5 fps so he can poke around with it. As of a few days ago we were at 1 fps.
#18
06/26/2009 (2:33 pm)
Yeah, 8400 == basic lighting. The fillrate on those cards is *very* low, and you really shouldn't be doing tons of blending and render-to-texture on them.

You might be able to use lowered AL settings once GG adds the extra shadow quality levels, however. You'd still need to be very careful with the post effects, the number of lights, and the shadow size.
#19
06/26/2009 (5:35 pm)
@Andy - If you do a search for GFXTextureFilterAnisotropic, you'll find in terrCellMaterial.cpp that I'm using it by default on normal maps and detail maps for terrain. Try changing that to GFXTextureFilterLinear and see what it does for you.

Also, the terrain isn't generating a base-texture-only shader for cells that are far enough away to not render detail textures. That's an optimization I hadn't gotten to yet, but it should help a lot on larger terrains.
#20
04/06/2010 (9:42 pm)
I'm now noticing this problem as well. In T3D, I'm looking at the profiler, and it looks like SwapBuffers is taking anywhere from 50% to 75% of the frame time.

I only tested on the empty room mission, which means I shouldn't be GPU or fillrate bound at all.

On this same system, when I profile a mission in TGEA in windowed mode, I see SwapBuffers take about 1%-2%. In T3D, in BL mode, on the empty room mission, SwapBuffers is taking about 55% of frame time. It seems to me that, regardless of the system, these two figures should be about the same.

Why is SwapBuffers taking up such a huge % of frame time?