Game Development Community

Render Performance.

by Matthew Hoesterey · in Torque X 2D · 06/21/2010 (10:56 pm) · 19 replies

Heya Guys. I was wondering if anyone had any tips on improving my rendering performance. I'm starting to run into performance issues. (I run great at 2 players but at 4 it "feels" a bit choppy. I want to get some ms back :) )


Running my game through a profiler I found that 72.46% of my processor time is being used in DrawFrame. 16.46% of that is being used in the FindObject method (this seemed a bit high). I cut my boards up into small pieces. Do you guys think this could be causing the problem?

Also I noticed that the Position property

public virtual Vector2 Position
{
    get { return _position; }
    set
    {
        _position = value;
        _postTick.pos = _position;
        IsSpatialDirty = true;
        UpdateSpatialData();
    }
}

in T2DSceneObject.cs automatically sets IsSpatialDirty = true and calls UpdateSpatialData() every time it is set.
As a result I found that the UpdateSpatialData method is sucking out a lot of processor power.

It seems UpdateSpatialData could be optimized to only update the rotation or position or whatever changed. What do you guys think?

I'm seeing that most of the processor time is taken in updating the rotation (and I never really need to rotate most of the objects).


Thanks guys.

#1
06/22/2010 (4:57 am)
"It feels a bit choppy"

What frame rate are you getting?


"I found that the UpdateSpatialData method is sucking out a lot of processor power."

How much? It certainly sounds sensible to only update what needs updating though, if it can be done without breaking anything else.
#2
06/22/2010 (7:37 am)
Heya Duncan. I don't have a framerate counter currently; I need to figure out how to add one. :)

I can't remember exactly off the top of my head, but UpdateSpatialData uses about 2% of my processor power if I remember correctly. Thing is, at times it gets called 3x as much as it should. If I translate, rotate, and scale an object it will get called 3 times instead of once. :( I figure I could just add a "Translate" method that allowed you to do all 3, and then UpdateSpatialData would only be called once. Ideally I could also strip out the rotation update when it is not needed, but that may break a bunch of stuff. It's something I can try.
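A minimal sketch of the "set all three, update once" idea, assuming a dirty flag that is flushed once per tick. BatchedSceneObject and FlushSpatialData are hypothetical names for illustration, not part of T2DSceneObject, and the engine may rely on spatial data being current immediately after a set, so deferring the update could break other code.

```csharp
using System;

// Hypothetical stand-in for a scene object; this is NOT T2DSceneObject.
public class BatchedSceneObject
{
    private bool _spatialDirty;

    public float X { get; private set; }
    public float Y { get; private set; }
    public float Rotation { get; private set; }
    public float ScaleX { get; private set; }
    public float ScaleY { get; private set; }

    // Counts how often the expensive update actually ran.
    public int SpatialUpdateCount { get; private set; }

    public void SetPosition(float x, float y)
    {
        X = x; Y = y;
        _spatialDirty = true;   // mark only, do not recompute yet
    }

    public void SetRotation(float degrees)
    {
        Rotation = degrees;
        _spatialDirty = true;
    }

    public void SetScale(float sx, float sy)
    {
        ScaleX = sx; ScaleY = sy;
        _spatialDirty = true;
    }

    // Call once per tick, after all changes to this object are done.
    public void FlushSpatialData()
    {
        if (!_spatialDirty) return;
        _spatialDirty = false;
        SpatialUpdateCount++;
        // The expensive bounds / bin recalculation would go here.
    }
}
```

Calling SetPosition, SetRotation and SetScale followed by one FlushSpatialData runs the expensive step once instead of three times.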

I run into two issues with 4 characters on screen. First, it just seems to lose some frames intermittently. Second, with 4 characters on screen I think I'm getting a GC a few seconds into the level that hangs things up for half a second. Not sure what's causing that exactly. I don't think it's the added textures, as I always load all 4 character textures even if they are not on screen.

I can do TONS of optimization on my end of code but figure I can get a bigger bang for my buck out of making lower level tweaks. Update is only using about 20% of the processor power and 5% of that is used to UpdateAnimations and another 5% to update ParticleEmitters.

To save texture space I also cut my boards up into tiny little pieces and then reassembled them. I think this may be causing a problem, as my Umhills board, which is cut up smallest, is using the most time in FindObject on render. I might be able to trade some memory for processor power there, though my game is using 370-390 MB of RAM on Windows so I don't have much to give up.

The bright side is that I only have some sounds to add on one of my characters and then all my content is in. That and some additional AI will be the only things that suck away any more processor power. Luckily the expensive stuff (path-finding and projectile detection) is already in.

Thanks for any help. :)

#3
06/22/2010 (8:43 am)
2% doesn't really seem like it should be a priority initially - even if you reduce it by 50% you only save 1% overall. Whereas if you can reduce the findobjects by just 12.5% you already save 2% overall.
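The arithmetic behind those numbers can be sketched as below. This assumes both percentages are shares of total processor time, which is how the reply treats them; ProfileMath is a hypothetical helper name.

```csharp
using System;

public static class ProfileMath
{
    // Percent of total time saved when a method that takes
    // sharePercent of the total is made reductionPercent cheaper.
    public static double OverallSavingPercent(double sharePercent, double reductionPercent)
    {
        return sharePercent * reductionPercent / 100.0;
    }
}
```

OverallSavingPercent(2.0, 50.0) gives 1, and OverallSavingPercent(16.46, 12.5) gives roughly 2.06, which is why the larger method is the better first target.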

On the other hand, it does depend on how feasible it is to do that - I'd imagine reducing the object/tile count would help though, as it would mean fewer objects for findobjects to find.

It also sounds like there might be GC issues, so try to see if you can find out where those are happening.
#4
06/22/2010 (10:04 am)
Thanks for the info Duncan. I'll optimize what I can and will see what else I can find with dotTrace. Apparently it can profile GC, so hopefully I can find what's causing the hang.
#5
06/22/2010 (6:51 pm)
Heya. So I added a frame counter. My game runs at 33 fps when it's doing well and as low as 20 when it's not.
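For anyone else who needs one, a minimal sketch of a frame counter. FrameCounter is a hypothetical class, not something shipped with the engine; it only assumes you can pass in the seconds elapsed since the previous frame.

```csharp
using System;

public class FrameCounter
{
    private int _frames;
    private double _elapsed;

    // Average frames per second over the last completed interval.
    public double FramesPerSecond { get; private set; }

    // Call once per drawn frame with the seconds since the previous frame.
    public void OnFrameDrawn(double elapsedSeconds)
    {
        _frames++;
        _elapsed += elapsedSeconds;

        // Refresh the reading roughly once per second.
        if (_elapsed >= 1.0)
        {
            FramesPerSecond = _frames / _elapsed;
            _frames = 0;
            _elapsed = 0.0;
        }
    }
}
```

In an XNA game the elapsed value would typically come from gameTime.ElapsedGameTime.TotalSeconds inside Draw.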

I found the GC issue and fixed it. I had changed when my characters spawn, and the unused anim data I swapped out for my dye map material was getting GC'd after the level started. I just forced a garbage collect right after load and everything looks good.
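A minimal sketch of forcing the collection at a safe point, assuming it is called once after loading finishes and before gameplay starts. LoadHelpers and CollectAfterLoad are hypothetical names; GC.Collect and GC.GetTotalMemory are the standard .NET calls.

```csharp
using System;

public static class LoadHelpers
{
    // Hypothetical helper: call once, right after level loading finishes.
    // Returns the managed heap size in bytes after the collection.
    public static long CollectAfterLoad()
    {
        GC.Collect();                      // pay the GC cost now, not mid-fight
        return GC.GetTotalMemory(false);   // handy for logging
    }
}
```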


My update calls are now optimized to only use about 10% of the processor power and DrawFrame is using the rest. (I can optimize update a lot further, probably getting it down to 7-8%.)

The framerate usually goes to hell when my camera is zoomed out and more objects are on screen.


Here is the strange thing. I thought it was my environments but in actuality the characters are for some reason taking forever to draw. Each character adds around 5% to the total render time. :(.

So for example at 2 characters my DrawFrame method takes 70% of my processor power, but at 4 it takes 80%. This leads me to believe my 4 character sprites are taking a total of 20% of my processor power to draw :(.

The weird part though is that the _FindInBin method of the renderer is increasing by 2.24% just for adding these two characters (and another .5% in pre-render). My characters are made up of around 50 anims each and I wonder if the engine is searching through them all every frame to see what it needs to render.


I'm going to keep optimizing, though I'm beginning to think Torque does not like my working with lots of animation data :(.
#6
06/23/2010 (4:59 pm)
I found that on windows my level is taking

27.08 ms per DrawFrame call with the environment and 4 characters

22.49 ms per DrawFrame call with 2 characters and an environment.

23.35 ms per DrawFrame call with just 4 characters and no environment (totally deleted everything environmental that renders)

18.92 ms per DrawFrame call with 2 characters and no environment

So on Windows rendering my 2 character sprites takes longer than my environment with 70+ objects. :(


I also checked my GC on the Xbox. Once the level starts my GC is running consistently at once per sec and taking 60 ms, which seems ok.


It seems that for some reason it's taking longer for Torque to draw my characters than an entire environment. I'm going to investigate further but I'm thinking it may not handle lots of animations well. I'm not sure what it could be on my side logic-wise, as I don't know how I would be affecting the draw call.

I'll dig in further.
#7
06/24/2010 (6:27 am)
"on windows rendering my 2 character sprites takes longer than my environment with 70+ objects. :( "

How do you know that for sure? You didn't give any figures for rendering the environment with no characters. At the very least there will be general DrawFrame overheads to take into account, which will occur regardless of how much you are rendering. You might also try testing it with nothing at all being drawn.

I'm not saying your characters aren't taking longer to draw, just that you should confirm it first :)
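As a rough check, the four posted timings can be differenced. This is only a sketch: it assumes costs add up linearly, and it cannot separate the fixed DrawFrame overhead, since no run without any characters was measured. DrawFrameTimings is a hypothetical name.

```csharp
using System;

public static class DrawFrameTimings
{
    // Measured ms per DrawFrame call on Windows, copied from post #6.
    public const double Env4 = 27.08;    // environment + 4 characters
    public const double Env2 = 22.49;    // environment + 2 characters
    public const double NoEnv4 = 23.35;  // 4 characters, no environment
    public const double NoEnv2 = 18.92;  // 2 characters, no environment

    // Cost attributed to the environment at a fixed character count.
    public static double EnvironmentCost(bool fourCharacters)
    {
        return fourCharacters ? Env4 - NoEnv4 : Env2 - NoEnv2;
    }

    // Cost attributed to going from 2 to 4 characters.
    public static double TwoExtraCharactersCost(bool withEnvironment)
    {
        return withEnvironment ? Env4 - Env2 : NoEnv4 - NoEnv2;
    }
}
```

The environment comes out at about 3.6 to 3.7 ms and two extra characters at about 4.4 to 4.6 ms. What the remaining 18.92 ms splits into (two characters plus fixed overhead) is not recoverable from these four numbers.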
#8
06/24/2010 (7:44 am)
I figured that because drawing 4 characters with no environment was slower than drawing 2 characters with an environment, it made sense that 2 characters took longer than the environment. But I guess that isn't 100% scientific, so I'll do a test with just the environment.


I actually just pulled in my engineer friend on this one, as to be honest this stuff is a bit beyond my programming skill level.

He's going to help me narrow down the performance issue. It's very possible I'm doing something stupid, but I'll share my findings with you guys in any case.

Also we may write some code to bypass the part of the renderer that finds objects in the camera view and excludes others, as that seems to be a major performance hit at first glance. (With my game being a fighting game, all objects in my level are almost always on screen, unlike a normal platformer.)

If we end up doing any big changes like that I'll offer the code to any Source owners who want that type of renderer.

As I said though, I could just be doing something stupid. All my code experience other than this project is in making tools (or scripting for particle effects lawl), so this is my first time doing real-time performance optimization. My buddy is teaching me a bunch of tricks though. :)

#9
06/24/2010 (9:03 am)
Yep, optimizing the searching for stuff to render might be a good idea if you have a lot of objects, but they are always onscreen. Just rendering everything without doing a search might net you a few extra fps. As you say it doesn't apply to all types of game, but for some games it makes more sense.
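A minimal sketch of that trade-off, using hypothetical types (Rect, SimpleObject, VisibleSet) rather than the engine's scene graph: a flag skips the per-object visibility test and hands back the whole list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical axis-aligned rectangle; not an engine type.
public struct Rect
{
    public float Left, Top, Right, Bottom;

    public Rect(float left, float top, float right, float bottom)
    {
        Left = left; Top = top; Right = right; Bottom = bottom;
    }

    public bool Intersects(Rect other)
    {
        return Left < other.Right && Right > other.Left
            && Top < other.Bottom && Bottom > other.Top;
    }
}

public class SimpleObject
{
    public Rect Bounds;
}

public static class VisibleSet
{
    // With skipCulling set, every object is assumed to be on screen
    // and the search is bypassed entirely.
    public static List<SimpleObject> Collect(List<SimpleObject> all, Rect camera, bool skipCulling)
    {
        if (skipCulling) return all;

        var visible = new List<SimpleObject>();
        foreach (SimpleObject o in all)
        {
            if (o.Bounds.Intersects(camera)) visible.Add(o);
        }
        return visible;
    }
}
```

Skipping the test only pays off when nearly everything is on screen; otherwise the extra draw calls can cost more than the search saved.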
#10
06/24/2010 (6:52 pm)
Found something you guys may like to know. I haven't figured out why this is happening yet.

The _RenderGroup method in the BaseRenderManager.cs class gets called every frame for every object that will render. So obviously this should be well below 1 ms (and usually is).

However, the first time an animation plays or an object, fx, etc. appears, this method jumps as high as 56 ms for the new object! I think this jump only happens when a new material or animation is played, cloned from a template, or shown for the first time. For example, the first time the player plays the fireball anim I will get a jump. I will also get a jump the first time the fireball shows up. After that I don't get a ms jump for those anims or objects.

So with 4 characters playing "new" anims and fx for the first time, this method alone could be causing some issues. I found my game performance gets better over time; I have a feeling this is why.


#11
06/25/2010 (4:54 am)
Materials (including animations) are normally given a 'setup' pass (see preloadmaterials() in MaterialManager) when a scene using them is loaded. This helps to "reduce startup hitching".

I know that you are loading your materials asynchronously, separately from the levels, and I am guessing your levels do not reference all these materials. In which case they are not all getting the 'preload' treatment. If this is the case then try adding a preloadmaterials setup pass immediately after you have loaded all the materials (or if you prefer you could make the scene file/s reference them so they get processed automatically).
#12
06/25/2010 (6:50 am)
Thanks Duncan,

My levels should be referencing all my materials, as all my objects are created through cloning templates created in TXbuilder. Perhaps dumping them into a shared scene graph is messing something up?

In any case I'll try forcing a preload materials setup after I dump the objects into a shared scene graph.

Thanks again!
#13
06/25/2010 (4:07 pm)
Question for ya. Does the IndexBuffer have anything to do with preloading?

So I went through the code and dumped the preloaded materials to a list. They look to be pre-loading :(. I've also verified that _LoadEffect(); does not get called once the level is loaded.

I've tracked the performance issue down to a single line in _RenderGroup though:

if (ri.IndexBuffer == null)
{
   d3d.DrawPrimitives(ri.PrimitiveType, ri.BaseVertex, ri.PrimitiveCount);
}

Sometimes the DrawPrimitives call above takes 3-25 ms to render an object. This only happens the first time the object is shown on screen or the first time an anim plays, but it is obviously decimating my performance. :)

Thanks for the info. :)

#14
06/26/2010 (5:00 am)
Perhaps there is a hit for the initial upload of the data to the GPU? I'm not an expert on it though I'm afraid. You might try in the XNA forums.
#15
06/26/2010 (5:52 am)
Thanks Duncan. I'll check over there. I'm going to find out if I can optimize this.

I did manage to get my framerate back up through other means and found a possible inefficiency in the renderer that cost 2 ms per frame. I'm sending it to pino.

My game runs again at 33 fps on xbox though it does drop to 27ish occasionally at the beginning of a level. I'm going to continue optimizing and share anything good that I find.

Thanks again for the help.
#16
06/28/2010 (7:59 pm)
I've managed to squeeze enough performance out of optimizations to be up and running again. I have yet to completely remove the find object method in the renderer or to fix the hitch with DrawPrimitives, but I run at 30+ fps on the Xbox.

I'll do another optimization pass when I get closer to ship. Thanks for the help. :)
#17
10/28/2010 (6:15 pm)
Hey Matthew,

Did you end up playing around with bypassing the part of the render that looks for objects off screen and just loading everything?

Did it actually make an impact?
#18
10/28/2010 (6:29 pm)
I didn't. I did eliminate the code that searched for particles that should render even though they are off screen, and that helped (but it causes a bug where particles won't always draw).

I'd suggest waiting for the 4.0 version of the engine. A lot is being done by the community to optimize everything. It should be much faster I hope. ;)
#19
10/29/2010 (2:59 am)
I'm part of the 4.0 beta team. It's coming along nicely. I'm RevTyler by the way :)

I'm definitely looking forward to getting the builder up and running. I'm jumping around my project looking for things to do that don't involve TXB :)

As far as the pre-render goes, I'm finding that on the Xbox the layer sorting is taking a ton of time. It's the next biggest chunk right after the particle searching. It goes through every object every render call to order them. Seems like a waste.
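One common way around re-sorting every render call is to sort only when the set changes. A minimal sketch with a hypothetical LayerSortedList that stores plain layer numbers; the engine's render lists are more involved, and any object that changes layer would also need to mark the list dirty.

```csharp
using System;
using System.Collections.Generic;

public class LayerSortedList
{
    private readonly List<int> _layers = new List<int>();
    private bool _dirty;

    // Counts how many times the sort actually ran.
    public int SortCount { get; private set; }

    public void Add(int layer)
    {
        _layers.Add(layer);
        _dirty = true;   // order may be stale now
    }

    // Sorted by layer; the sort runs only if something changed since last call.
    public IList<int> GetSorted()
    {
        if (_dirty)
        {
            _layers.Sort();
            _dirty = false;
            SortCount++;
        }
        return _layers;
    }
}
```

Calling GetSorted every frame then costs a single flag check unless an object was added in between.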