Game Development Community

Need help with performance

by Hewster · in Torque Game Engine Advanced · 09/16/2006 (8:39 am) · 34 replies

Hi,

Have just finished porting a map from TLK1.4 to current TSE ms4

in TLK I get around 104 fps, in TSE I'm seeing just 0.5 to 0.8fps !

Note this is done with un-modified engine code too.

Geomarty is identical, textures are identical, terrain is identical... WTF is going on :(
Material wise in TSE I've simply created basic materials, with no shader passes, like so:

new Material(building_a)
{
	baseTex[0]		= "building_a";
};


TGE:
www.the-wildwest.co.uk/forum_stuff/torque/town_tge_performance01.jpg
TSE:
www.the-wildwest.co.uk/forum_stuff/torque/town_tse_performance01.jpg

Anyone got any suggestions as to what is going on ?

Thanks in advance

Hewster
Page «Previous 1 2
#1
09/16/2006 (9:29 am)
Try removing the terrain just temporarily. There seems to be performance issues with it and dynamic lights.

Edit: Also, you didnt list your machine specs so its kind of a guess that the above might help.
#2
09/16/2006 (9:45 am)
Hi Stefan, thanks for the reply,

Will give that a try (its legacy terrain btw)
also there are no lights what so ever, only the sun light.

My basic machine specs are:

AMD 3200XP
1GB DDR Ram
Geforce 6800GT (128Mb) (91.31 drivers)
Windows XP pro
#3
09/16/2006 (9:49 am)
OK that's definatly not normal.

Can you update the thread when you've tried removing the terrain? Thanks.
#4
09/16/2006 (10:22 am)
Ok, tried removing the terrain, fps goes up to around 1 - 1.5

Then I tried re-texturing all the models (six in total) with a single 256x256 texture...
and now see 120fps... so it seems to be an issue with the amount of texture memory
I'm using.

Brian / Ben Is this to be expected ?

If so Its a bit of a blow for me, was really hoping to use TSE for my game,
but if it cannot handle the same amount of textures as TGE then well :(

Hewster
#5
09/16/2006 (10:42 am)
Sounds like a bug. Are any of your textures non-standard size? Maybe a non-power 2 issue? I have a 6800 also and I usually get 100-200 fps depending on the scene.
#6
09/16/2006 (11:31 am)
Hi JC, yep all textures are under 512x512 and all the power of 2.

I'm gona try slowly increasing the amount of textures in the scene and see
what results I get... will post my results.

BTW, does anyone know any other "metrics" ?

In the engine I'm used to (RtCW / Quake3) its easy to display
tris count, tex mem used ect, just wondering if TSE has similar
"debug" options.
#7
09/16/2006 (12:21 pm)
I use 512x512 textures and around 14 unique ones for one of our interiors, with bumpmaps and sometimes glow. Running on a 6800, Fps is between 190-333.

So it sounds like a bug.
#8
09/16/2006 (1:32 pm)
Thanks for the responces guys, its really apprciated :)

First off, whats the memory of your 6800's ? mines 128Mb (poor mans 6800 lol)

Now, I've done some tests with various amounts of textures within the map,
here are my results:

These readings are taken as follows:
Game starts in Windowed mode with player in the middle
of the map, walk around the map, and goto the start
position (as shown in the screenies in my first post),
then read FPS.

Next goto fullscreen (ALT + ENTER), walk around the
map, and return to start position: read FPS

Go back to windowed mode (ALT + ENTER), walk around
the map, and return to start: read FPS


10 textures of 512 x 512
1 texture of 256 x 256

Start windowed = 25fps
goto fullscreen = 115fps
back to windowed = 25fps

===================================

9 of 512 x 512
1 of 256 x 256

Start windowed = 20fps
goto fullscreen = 120fps
back to windowed = 110fps

===================================

8 of 512 x 512
1 of 256 x 256

Start windowed = 100fps
goto fullscreen = 120fps
back to windowed = 110fps

===================================

All tex with unique bump & spec lighting
8 of 512 x 512
1 of 256 x 256

Start windowed = 25fps
goto fullscreen = 90fps
back to windowed = 25fps


I havent tried, but will guess that 7 of 512 x 512 texures bumped is the limit
for my machine to ensure good, stable fps.


Would really like to hear from Ben or Brian, am prepared to package up my test
projects for others to try on their systems too... basically I really want TSE
to work for me... been waiting soooo long :)

Hewster
#9
09/16/2006 (9:21 pm)
256 megs here on the 6800. What format are you using on your textures? If your using PNG, TGA or something similar then it's going to eat through the video memory quickly. You might want to try saving them as a compressed DDS if you are.
#10
09/16/2006 (9:35 pm)
Turn on the profiler and see where your time is going. It could be something related to graphics, or could be just something dumb that's eating a ton of time.

profilerEnable(true);
// close console and wait a while
profilerDump();

The output from that should indicate what's up - you can paste it here for us to look at.

Do you have latest drivers? Are you running a release build of TSE? The profiler should give us something to go on in troubleshooting.
#11
09/16/2006 (11:51 pm)
@J.C does it matter what format the original textures are? I thought whatever format you
use would be converted internally to either a DX compressed texture, or "raw".
by "raw" I mean for instance a 512 x 512 32bit tex would use 512 x 512 x 32 bits of memory
(= 8399608 bits = 8192Kb = 8Mb) I'm using .jpg btw.

Perhaps Ben can tell us :)

@Ben, thanks for the profiler tip. My current build of TSE is based off current HEAD CVS,
and is a release compile. My GFX drivers are the latest from Nvidia.

Don't really know what I'm looking at with the dump :/ a quick "how too"
would be great... please :)

EDIT: AHHHH had to split post because the forum doesn't like more than 8192 characters
(spooky there is that number again lol) Can someone please do something about this ?
I've just spent nearly 10 minutes fucking about working out how much of the dump I can put
in a single post... NOT FUNNY !

Please see next post for my dump, which was made using the original map (1fps)

Cheers

Hewster
#12
09/16/2006 (11:57 pm)
Ordered by stack trace total time -
% Time  % NSTime  Invoke #  Name
  0.151 -99.849        0 ROOT
100.000   0.016    12319   MainLoop
 98.338   0.175    12319     TimeManagerProcessMain
 98.163   0.050    12319       ProcessTimeEvent
 96.912   0.019    12319         RenderFrame
 79.781   5.688    12319           CanvasRenderControls
 64.412   0.027    12319             GameRenderWorld
 64.384   0.077    12319               SceneGraphRender
 60.598  51.648    12319                 TraverseScene
  7.527   0.062    12319                   TerrainRender
  5.813   0.072    12319                     TerrainRenderEmit
  4.331   0.303   753177                       TerrainRenderStep
  3.712   0.247  2919413                         TerrainRenderStepChunk
  3.213   0.185  2919413                           TerrainRender_renderChunkNormal
  2.151   0.565  2919413                             TR_chunkFormer
  0.645   0.645  2919413                               TR_center
  0.266   0.172  2919413                               TR_e3fix
  0.094   0.094  2919413                                 TR_fixEdge
  0.263   0.164  2919413                               TR_e2fix
  0.099   0.099  2919413                                 TR_fixEdge
  0.225   0.225  2919413                               TR_corners
  0.099   0.088  2919413                               TR_e0fix
  0.011   0.011   282979                                 TR_fixEdge
  0.088   0.077  2919413                               TR_e1fix
  0.012   0.012   209508                                 TR_fixEdge
  0.877   0.119  2919413                             TR_chunkLatter
  0.610   0.610   366382                               TR_levelMinusOne
  0.133   0.133  2540408                               TR_subDivLvl1Plus
  0.015   0.015    12623                               TR_subDivLvl0
  0.243   0.222   172482                           MemoryRealloc
  0.012   0.012    87812                             MemoryFree
  0.005   0.005   175624                             Mutex_lockMutex
  0.004   0.004   175624                             Mutex_unlockMutex
  0.006   0.006   184801                           Mutex_lockMutex
  0.004   0.004   184801                           Mutex_unlockMutex
  0.316   0.163  2919413                         Terrain_batchEnd
  0.153   0.153  2919413                           Terrain_TexGen
  1.306   0.047    12319                       Terrain_batchEnd
  0.921   0.921    12319                         TRender_DIP
  0.336   0.256    12319                         Terrain_bufferCopy
  0.036   0.036    12319                           Terrain_bufferCopy_lockP
  0.035   0.035    12319                           Terrain_bufferCopy_lockV
  0.005   0.005    12319                           Terrain_bufferCopy_unlockP
  0.004   0.004    12319                           Terrain_bufferCopy_unlockV
  0.001   0.001    36957                         Mutex_lockMutex
  0.001   0.001    36957                         Mutex_unlockMutex
  0.068   0.068    12319                       AllocatedTexture_trimFreeList
  0.000   0.000      119                         AllocatedTexture_freeTex
  0.000   0.000      119                           MemoryFree
  0.000   0.000      119                           Mutex_lockMutex
  0.000   0.000      119                           Mutex_unlockMutex
  0.036   0.036   765496                       TerrTexture_getNextFrameTex
  1.556   1.337    12319                     TerrainRenderRecurse
  0.142   0.142  3577691                       TerrainRender_testSquareVisibility
  0.077   0.050   753177                       AllocatedTexture_alloc
  0.027   0.027      122                         TerrainTextureBuildBlendMap
  0.000   0.000     1098                           MemoryFree
  0.000   0.000     2565                           Mutex_lockMutex
  0.000   0.000     2565                           Mutex_unlockMutex
  0.000   0.000      122                         Mutex_lockMutex
  0.000   0.000      122                         Mutex_unlockMutex
  0.096   0.078    12319                     TerrainRenderSetup
  0.017   0.017    12319                       TerrainRender_buildClippingPlanes
  0.504   0.499    12319                   sgShadowProjector_sgRender
  0.003   0.003    21439                     MemoryFree
  0.002   0.002    25999                     Mutex_lockMutex
  0.001   0.001    25999                     Mutex_unlockMutex
  0.430   0.413    12319                   SkyRender
  0.007   0.007   271018                     Mutex_lockMutex
  0.005   0.005   271018                     Mutex_unlockMutex
  0.004   0.004   135509                     MemoryFree
  0.350   0.342    12319                   Glow
  0.003   0.003    36957                     MemoryFree
  0.003   0.003    73914                     Mutex_lockMutex
  0.002   0.002    73914                     Mutex_unlockMutex
  0.055   0.055    12319                   RIM_sort
  0.038   0.009    12319                   sgShadowProjector_sgCalculateBoundingBox
  0.024   0.024      282                     sgShadowProjector_sgCalculateBoundingBox_Part
  0.000   0.000      282                       MemoryFree
  0.000   0.000      564                       Mutex_lockMutex
  0.000   0.000      564                       Mutex_unlockMutex
  0.005   0.005      282                     sgShadowProjector_sgCalculateBoundingBox_find
  0.028   0.028   194994                   MemoryFree
  0.010   0.010   271358                   Mutex_lockMutex
  0.006   0.006   271358                   Mutex_unlockMutex
  0.002   0.002    12319                   GetLightingAmbientColor
  2.747   0.425    12319                 BuildSceneTree
  2.183   1.086    24638                   TSShapeInstanceRender
  1.012   0.695    12319                     UpdateSkin
  0.316   0.316    12319                       CreateVBIB
  0.029   0.029  1428372                     Mutex_lockMutex
  0.026   0.026   665226                     MemoryFree
  0.023   0.023  1428372                     Mutex_unlockMutex
  0.004   0.004    24638                     TSShapeInstanceMaterials
  0.002   0.002    24638                     TSShapeInstanceRenderBillboards
  0.089   0.087    12319                   ContainerFindObjects
  0.001   0.001    12319                     Mutex_lockMutex
  0.001   0.001    12319                     Mutex_unlockMutex
  0.020   0.020    12319                   ContainerCastRay
  0.019   0.010    24638                   GetLightingAmbientColor
  0.009   0.009     3283                     ContainerCastRay
  0.007   0.005     1760                   InteriorPrepRenderImage
  0.002   0.002     1760                     InteriorZoneTraversal
  0.000   0.000     1760                       InteriorTraverseZone
  0.000   0.000      955                     Mutex_unlockMutex
  0.000   0.000      955                     Mutex_lockMutex
  0.002   0.002    36957                   Mutex_lockMutex
  0.001   0.001    12319                   MemoryFree
  0.001   0.001    36957                   Mutex_unlockMutex
  0.932   0.932    12319                 BuildFogTexture
  0.018   0.018    12319                 RegisterLights
  0.005   0.005    24961                 MemoryFree
  0.002   0.002    12319                 UnregisterLights
  0.002   0.002    49599                 Mutex_lockMutex
  0.001   0.001    49599                 Mutex_unlockMutex
  0.001   0.001    12319                 SG_FindZone
  0.002   0.002    12319               Mutex_lockMutex
  0.000   0.000    12319               Mutex_unlockMutex
  7.869   7.228   553213             GFXDevice_drawTextN
  0.568   0.568 41749234               NewFontGetCharInfo
  0.032   0.032  1157647               Mutex_lockMutex
  0.022   0.022   604434               MemoryFree
  0.019   0.019  1157647               Mutex_unlockMutex
  1.469   1.029   580887             convertUTF8toUTF16
  0.441   0.441 26132149               oneUTF8toUTF32
#13
09/16/2006 (11:57 pm)
0.121   0.121    12319             ContainerCastRay
  0.053   0.053  1906917             Mutex_lockMutex
  0.052   0.042    12319             GFXBeginScene
  0.010   0.010    12319               GuiCanvas_updateReflections
  0.036   0.036  1906917             Mutex_unlockMutex
  0.036   0.024    19524             convertUTF16toUTF8
  0.006   0.006   318780               oneUTF16toUTF32
  0.005   0.005   318780               oneUTF32toUTF8
  0.033   0.033   913949             MemoryFree
  0.008   0.006    15478             MemoryRealloc
  0.001   0.001    21768               Mutex_lockMutex
  0.001   0.001    10884               MemoryFree
  0.001   0.001    21768               Mutex_unlockMutex
  0.004   0.004   294112             NewFontGetCharInfo
 16.994  16.994    12319           SwapBuffers
  0.111   0.100    12319           CanvasPreRender
  0.010   0.010    31261             NewFontGetCharInfo
  0.000   0.000        2               Mutex_lockMutex
  0.000   0.000        2               Mutex_unlockMutex
  0.002   0.001      591             convertUTF8toUTF16
  0.001   0.001    31261               oneUTF8toUTF32
  0.000   0.000       15             Mutex_lockMutex
  0.000   0.000       15             Mutex_unlockMutex
  0.007   0.007    12319           GFXEndScene
  0.900   0.141    12319         ClientProcess
  0.756   0.610    12319           AdvanceClientTime
  0.072   0.010     6001             AdvanceObjects
  0.063   0.002     6001               Player_ProcessTick
  0.060   0.038     6001                 Player_PhysicsSection
  0.022   0.006     6001                   Player_UpdatePos
  0.013   0.004     6001                     SceneObjectSetTransform
  0.003   0.001     6001                       SG_ZoneInsert
  0.002   0.002     6001                         SG_Rezone
  0.003   0.003     6001                       CheckBins
  0.000   0.000       20                         InsertBins
  0.000   0.000       20                         RemoveFromBins
  0.001   0.001     6001                       SceneObj_setRenderTransform
  0.001   0.001     6001                       SG_ZoneRemove
  0.003   0.003     6001                     ContainerFindObjects
  0.000   0.000       40                     MemoryFree
  0.000   0.000       13                     MemoryRealloc
  0.000   0.000        2                       MemoryFree
  0.000   0.000        4                       Mutex_lockMutex
  0.000   0.000        4                       Mutex_unlockMutex
  0.000   0.000       94                     Mutex_lockMutex
  0.000   0.000       94                     Mutex_unlockMutex
  0.000   0.000      189                   ContainerFindObjects
  0.000   0.000      189                     Mutex_lockMutex
  0.000   0.000      189                     Mutex_unlockMutex
  0.000   0.000      310                   MemoryFree
  0.000   0.000      430                   Mutex_lockMutex
  0.000   0.000      430                   Mutex_unlockMutex
  0.047   0.046    12318             UpdateActionThread
  0.001   0.001      303               convertUTF8toUTF16
  0.000   0.000    12120                 oneUTF8toUTF32
  0.000   0.000      101               ContainerCastRay
  0.000   0.000      101               Mutex_lockMutex
  0.000   0.000      101               Mutex_unlockMutex
  0.014   0.014    27169             SceneObj_setRenderTransform
  0.013   0.010    15226             SceneObjectSetTransform
  0.004   0.004    15226               SceneObj_setRenderTransform
  0.002   0.002     1081           ContainerCastRay
  0.000   0.000      294           MemoryFree
  0.000   0.000      630           Mutex_lockMutex
  0.000   0.000      630           Mutex_unlockMutex
  0.200   0.006    12319         ServerProcess
  0.194   0.025    12319           AdvanceServerTime
  0.168   0.024     6001             AdvanceObjects
  0.144   0.013     5997               Player_ProcessTick
  0.124   0.075     5997                 Player_PhysicsSection
  0.048   0.022     5997                   Player_UpdatePos
  0.022   0.009     5997                     SceneObjectSetTransform
  0.006   0.006     5997                       CheckBins
  0.000   0.000       20                         InsertBins
  0.000   0.000       20                         RemoveFromBins
  0.004   0.002     5997                       SG_ZoneInsert
  0.003   0.003     5997                         SG_Rezone
  0.002   0.002     5997                       SceneObj_setRenderTransform
  0.001   0.001     5997                       SG_ZoneRemove
  0.004   0.004     5997                     ContainerFindObjects
  0.001   0.001     6090                     Mutex_lockMutex
  0.000   0.000     6090                     Mutex_unlockMutex
  0.000   0.000       13                     MemoryRealloc
  0.000   0.000        5                       MemoryFree
  0.000   0.000       10                       Mutex_lockMutex
  0.000   0.000       10                       Mutex_unlockMutex
  0.000   0.000       40                     MemoryFree
  0.000   0.000      189                   ContainerFindObjects
  0.000   0.000      189                     Mutex_lockMutex
  0.000   0.000      189                     Mutex_unlockMutex
  0.000   0.000      310                   MemoryFree
  0.000   0.000      430                   Mutex_lockMutex
  0.000   0.000      430                   Mutex_unlockMutex
  0.007   0.007     5997                 UpdateActionThread
  0.043   0.039    12319         ClientNetProcess
  0.002   0.002     9839           SceneObj_setRenderTransform
  0.001   0.001     5825           MemoryFree
  0.000   0.000    12391           Mutex_lockMutex
  0.000   0.000    12391           Mutex_unlockMutex
  0.039   0.036    12319         ServerNetProcess
  0.001   0.001     7837           MemoryFree
  0.001   0.001    14933           Mutex_lockMutex
  0.000   0.000    14933           Mutex_unlockMutex
  0.000   0.000     1271           SG_FindZone
  0.020   0.017    12319         SimAdvanceTime
  0.002   0.002    19231           Mutex_lockMutex
  0.001   0.001    19231           Mutex_unlockMutex
  0.000   0.000     2688           MemoryFree
  1.526   1.520    12319     PlatformProcessMain
  0.006   0.005      275       ProcessInputEvent
  0.000   0.000     1497         MemoryFree
  0.000   0.000     3313         Mutex_lockMutex
  0.000   0.000       45         convertUTF16toUTF8
  0.000   0.000      791           oneUTF16toUTF32
  0.000   0.000      791           oneUTF32toUTF8
  0.000   0.000     3313         Mutex_unlockMutex
  0.000   0.000        7         MemoryRealloc
  0.000   0.000        2           MemoryFree
  0.000   0.000        4           Mutex_lockMutex
  0.000   0.000        4           Mutex_unlockMutex
  0.000   0.000        2         convertUTF8toUTF16
  0.000   0.000       36           oneUTF8toUTF32
  0.000   0.000        5       MemoryRealloc
  0.000   0.000        1         MemoryFree
  0.000   0.000        2         Mutex_lockMutex
  0.000   0.000        2         Mutex_unlockMutex
  0.000   0.000       17       Mutex_lockMutex
  0.000   0.000       17       Mutex_unlockMutex
  0.000   0.000        3       MemoryFree
  0.103   0.103    12319     NetProcessMain
  0.009   0.009    12319     TelconsoleProcessMain
  0.004   0.004    12319     TelDebuggerProcessMain
  0.004   0.004    12319     JournalMain
  0.000   0.000        1   MemoryFree
  0.000   0.000        1   Mutex_lockMutex
  0.000   0.000        1   Mutex_unlockMutex
#14
09/17/2006 (12:48 am)
The profiler is reporting that something in scenegraph traversal is taking 50% of your time - waiting on the GPU is only taking a relatively small 16% of the time.

So the first priority should be to find out where that time is going. :)

As you've probably noticed the profiler cues off of PROFILER_START(foo) and PROFILER_END() blocks. You can add blocks as needed to get more info on what's going on. I'd recommend adding them around every renderObject() call in the engine - that should help you nail down where the time's going. Or you can use your intuition. :)

Remember that every PROFILER_START that's encountered has to be matched by a PROFILER_END no matter what execution path is taken. So watch out for things like:

PROFILER_START(Foo);
if(something)
   return;
PROFILER_END();

This will break the profiler if the function takes that return branch. The correct version is:

PROFILER_START(Foo);
if(something)
{
   PROFILE_END();
   return;
}
PROFILER_END();

That way no matter what the PROFILER_START macro will be matched with exactly one PROFILER_END call.

BTW at 32 bit color a 512px image is only one megabyte - you got your bits and bytes mismatched I think. :)
#15
09/17/2006 (12:49 am)
You might also want to consider using:

[urlhttp://www.garagegames.com/index.php?sec=mg&mod=resource&page=view&qid=10842[/url]

The scoped profiler resource, means you don't have to worry about that matching stuff I mentioned.
#16
09/17/2006 (1:13 am)
Thanks Ben, will try to nail this down :)

btw yeah.. err... DOH! ((512x512) / 8) * 32 does indeed = 1Mb :) :blush:

Hewster
#17
09/17/2006 (4:21 am)
Quote:
@J.C does it matter what format the original textures are? I thought whatever format you
use would be converted internally to either a DX compressed texture, or "raw".

That's correct. Dunno what J.C was talking about. He probably assumed jpg's, bmp's and png's were directly uploaded as files into video memory.

So Hewster, how's this going? I'm interested in seeing what your bottleneck was.

60.598  51.648    12319                 TraverseScene

That's one bugger :)
#18
09/17/2006 (5:08 am)
Well its definatly happening in RenderInteriorMgr::render(), but not sure how to
debug this further... think I may create several basic diff's (just a cube) each with
a unique texture, and see if I can reproduce this.

That should prove if its actually an issue with my diffs, or indeed as I first thought
an issue with texture memory.

Will keep you updated.
#19
09/17/2006 (11:56 am)
What are the values of changeCount, changeVB, and changePB at the end of RenderInteriorMgr::render?

If the time is getting eaten there... well, could be a texture memory issue but I doubt it - that's more likely to show up in SwapBuffers, although I could be wrong. (It's not a scenario I've spent a ton of time in a debugger/profiler with - so my intuition could be way off.)

You could also try putting profiler blocks around different sections of the code in the render call and seeing which section time is going to. That will help nail down where the slowdown is.
#20
09/17/2006 (4:54 pm)
ChangeCount, changeVB, and changePB all reach 23,
I ran a few hundred frames to make sure, and it seems they always
reach that value.

Been doing a little bit more profiling, and have narrowed down
the main culprit to:

GFX->drawPrimitive( passRI->primBuffIndex );

in

RenderInteriorMgr::render()

I will continue profiling to try and narrow this down some more :)

Will upload my "game" later so if someone would like to test on
their system they can.
Page «Previous 1 2