Optimization tips for a large number of sprites?
by Bob Dobbs · in Torque X 2D · 12/02/2010 (2:25 pm) · 41 replies
Hey community, care to lend a helping hand? I'm about at the end of my tether trying to optimize my project.
I thought I had fixed this issue a long time ago, but it seems to be rearing its ugly head again. I'm almost at the giving-up-and-binning-it stage T_T
The main problem seems to be having too many sprites on screen. Basically I'm attempting to spawn 100+ enemies. I've searched the forums and tried the "FarDistance" trick, setting UseLayerSorting to false, and tinkering with the core as other posts have suggested.
I'm just wondering: can Torque, or XNA for that matter, have more than 100-300 moving animated sprites in a scene at one time? (Maybe I'm asking or expecting too much.)
Here's the lowdown:
Each of my enemies is made up of 3 sprite layers: the head, the torso and the legs.
Each of these parts has 30 frames of animation as a 326x334 animated sprite, and there are 5 variations on each of those animations: walk, lunge, attack and death.
I attempt to spawn 100 enemies, so quick calculation time: 3 * 100 = 300 sprites. They appear on screen via the spawner at intervals of one second in a large spawn area, then are activated with a simple AI player-chase component.
With just 100 "enemies" (made up of 3 sprites, remember) there are no problems on PC; on Xbox about 40 is my limit before the game's FPS begins to lag and the scrolling gets jerky.
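A rough back-of-the-envelope check on the numbers above may help when deciding whether frame size is part of the problem. This is only a sketch: it assumes the frames are held as uncompressed 32-bit colour with no mipmaps and that only the four animations named in the post are loaded, neither of which is confirmed in the thread. `SpriteBudget` is a made-up helper, not part of Torque X or XNA.

```csharp
using System;

// Hypothetical helper for estimating texture memory; not engine code.
public static class SpriteBudget
{
    // Bytes for one frame, assuming uncompressed pixels and no mipmaps.
    public static long FrameBytes(int width, int height, int bytesPerPixel)
    {
        return (long)width * height * bytesPerPixel;
    }

    // Bytes for every frame of every animation of every body part.
    public static long TotalBytes(int width, int height, int bytesPerPixel,
                                  int framesPerAnimation, int animations, int parts)
    {
        return FrameBytes(width, height, bytesPerPixel)
               * framesPerAnimation * animations * parts;
    }
}
```

Under those assumptions, SpriteBudget.FrameBytes(326, 334, 4) is 435,536 bytes per frame, and SpriteBudget.TotalBytes(326, 334, 4, 30, 4, 3) is 156,792,960 bytes (roughly 150 MB) for one enemy type. If the real content is compressed or shared between parts the figure will be lower, so treat it as an upper bound to measure against.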
Following the golden rules of optimization and bug fixing, I thought OK, break it down...
Too many sprites?
Too much animation data?
Sprites too big?
Too much collision data?
Frequency of AI updates?
So I set about trying the suggestions on the forums with layer sorting and FarDistance: a little bit of improvement.
Next I thought, how about setting the sprites' visibility, animation data, velocity and unneeded collisions to null or false while out of camera view, i.e. almost "disabling" the 3 sprites that make up an enemy while they are outwith the camera view. See www.torquepowered.com/community/forums/viewthread/122951
with a bit of
sceneObject.AnimationPaused = true;
sceneObjectHead.Visible = false;
sceneObjectTorso.Visible = false;
sceneObjectLegs.Visible = false;
sceneObject.Collision.CollidesWith -= TorqueObjectDatabase.Instance.GetObjectType("objPlayer"); //only other collision is fellow enemies and walls
_active = false; //the AI chase
_speed = 0; //the velocity
Again a slight improvement but no great shakes.
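The "disable while off camera" idea can be sketched independently of the engine as a per-frame pass that flips an enemy's state only when it crosses the edge of the view. Everything here is hypothetical: `ViewRect`, `Enemy` and `OffscreenCulling` are stand-ins for the camera's scene rectangle and the three mounted scene objects, and the margin is an assumed extra so enemies wake up slightly before they scroll into view.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the camera's visible area in world units.
public struct ViewRect
{
    public float Left, Top, Right, Bottom;

    public ViewRect(float left, float top, float right, float bottom)
    {
        Left = left; Top = top; Right = right; Bottom = bottom;
    }

    public bool Contains(float x, float y)
    {
        return x >= Left && x <= Right && y >= Top && y <= Bottom;
    }
}

// Hypothetical enemy record; Active would drive AI, animation,
// visibility and collision in the real project.
public class Enemy
{
    public float X, Y;
    public bool Active;
}

public static class OffscreenCulling
{
    // Activates enemies inside the camera view grown by 'margin' on each
    // side, deactivates the rest, and returns how many are active.
    public static int Update(List<Enemy> enemies, ViewRect camera, float margin)
    {
        ViewRect grown = new ViewRect(camera.Left - margin, camera.Top - margin,
                                      camera.Right + margin, camera.Bottom + margin);
        int active = 0;
        for (int i = 0; i < enemies.Count; i++)
        {
            Enemy e = enemies[i];
            bool inside = grown.Contains(e.X, e.Y);
            if (e.Active != inside)
            {
                // State changed: this is where the expensive toggles
                // (pause animation, hide parts, drop collisions) would go.
                e.Active = inside;
            }
            if (inside) active++;
        }
        return active;
    }
}
```

Doing the toggles only on a state change keeps the per-frame cost to one rectangle test per enemy. Note that this does nothing about the cost of the engine's own scene query and sort, which later replies in the thread point to as the larger bottleneck.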
I set about dismantling my spawner so that it wouldn't generate the enemies as 3 layers of mounted sprites, just a single unanimated sprite. I got up to about 200 before lag started happening.
Here's a video
Next step is to reduce the animation data from 30fps to 10fps AND reduce the sprite size by half.
I'm worried about doing the above as it really degrades the graphic quality of my game. I was setting out to make something that looked really polished, so it's a bit disheartening to do so.
Ideally I would want a vast number of enemies "present" and then activate them while they are on camera. If I could hit the 200 mark I would be happy (mind, that's 600 sprites for my enemies because they are made up of threes).
Does anyone have any tips for debugging or improving performance?
Or, just by looking at my sizes and amount of animation data, am I expecting too much? Looking at other posts, lots of people seem to run into the 300 mark before lag kicks in. Is this the limit?
I've been tackling this for a week and a bit now. Previously I had generated 200+ sprites in a very small play area, but now that I'm building my level to finalise the game I'm hitting this hurdle with lag, and it's heartbreaking.
Any help or debug suggestions no matter how small would be really appreciated, thanks in advance.
#22
01/25/2011 (5:52 pm)
One idea I'm going to experiment with when I get home: store a Texture2D and change the render method to use a SpriteBatch. The stored texture would be placed outside of the static sprite so many sprites could share it. For the test, I'll just create a SpriteBatch in the game class and use it in the render method for the static sprite.
#23
01/25/2011 (9:50 pm)
Tested it out. With UseLayerSorting set to false, I could get to 300-350 sprites with a good framerate. Still not good enough in my opinion. There must be a bottleneck in the rendering somewhere that is causing this. I overrode the Draw call in the game class to look like this:
protected override void Draw(Microsoft.Xna.Framework.GameTime gameTime)
{
    spriteBatch.Begin();
    base.Draw(gameTime);
    spriteBatch.End();
}
I then extended T2DStaticSprite and overrode the Render call to look like this:
public override void Render(SceneRenderState srs)
{
    Game.Instance.spriteBatch.Draw(Game.Instance.texture, Position, null, Color.White,
        0, Size / 2, Size, SpriteEffects.None, 0.0f);
}
With this code, I got up to about 300 sprites at about 35-40 fps, as I said.
However, if I changed the TESTSPRITE class to inherit from the DrawableComponent class, and added some more work, I could easily get above 30,000 sprites. So the problem is the process that draws it to the screen. Not exactly sure where the bottleneck is occurring, but I'll look into it more tomorrow.
#24
01/26/2011 (7:05 am)
Hey guys,
I guess that there is no way around this.
On the Xbox, with layer sorting off, the engine runs at a steady 62.5fps up to 130 sprites and drops to 40fps at 200 sprites (which is more than enough... Halo is locked at 30fps). With layer sorting enabled the rate drops quickly: it starts at 62.5fps and holds that up to 60 sprites; at 110 sprites it is already at 30fps, and it drops to 15fps on reaching 200 sprites.
The problem lies in the Sort method of the embedded framework's List<T> class: I've commented out all TX code leaving just the line
_containerQueryResults.Sort(T2DLayerSortDictionary.LayerSort);
and that's the one slowing down the whole thing. My guess is that the engine more generally suffers from this limitation of the embedded framework, as it makes heavy use of the List<T> class.
There are workarounds to avoid using List<T>, but they entail a huge refactoring of the whole engine... let me tinker with this for a while...
#25
01/26/2011 (9:01 am)
I'm going to gut the render code tonight and only put in the minimum needed to render. For layers, I'm just going to use a for loop and say: if layer == x, render, where x is the incremented layer counter.
#26
01/26/2011 (11:22 am)
@Pino: So what are the arguments against having a second List, or table, for rendering? You could override the Layer property setter to add or remove the object from the proper list. From my understanding, Torque X is based on this strategy of using the right list, or bin, or cookie or whatever, for the job, so this kind of list sounds optimal to me. In other words, Layer1 would have a list, Layer2 would have a list, and these would all presumably be hashed in a larger table.
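The per-layer list idea can be sketched in plain C# as below. This is an illustration of the suggestion only, not Torque X code: `LayeredSprite` and `LayerBins` are made-up names, the draw order assumes higher layer numbers sit further back, and the sketch does not account for any per-frame camera query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sprite record. Layer should only be changed through
// LayerBins.SetLayer so the bins stay consistent.
public class LayeredSprite
{
    public readonly string Name;
    public int Layer;

    public LayeredSprite(string name, int layer) { Name = name; Layer = layer; }
}

// Hypothetical container: one list per layer, so drawing in layer order
// needs no sort at all.
public class LayerBins
{
    private readonly List<LayeredSprite>[] _bins;

    public LayerBins(int layerCount)
    {
        _bins = new List<LayeredSprite>[layerCount];
        for (int i = 0; i < layerCount; i++) _bins[i] = new List<LayeredSprite>();
    }

    public void Add(LayeredSprite sprite) { _bins[sprite.Layer].Add(sprite); }

    // Moves a sprite between bins; the Remove is linear within one bin.
    public void SetLayer(LayeredSprite sprite, int newLayer)
    {
        if (newLayer == sprite.Layer) return;
        _bins[sprite.Layer].Remove(sprite);
        sprite.Layer = newLayer;
        _bins[newLayer].Add(sprite);
    }

    // Visits sprites from the highest layer down to layer 0
    // (assumed here to mean back to front).
    public void ForEachInDrawOrder(Action<LayeredSprite> draw)
    {
        for (int layer = _bins.Length - 1; layer >= 0; layer--)
        {
            List<LayeredSprite> bin = _bins[layer];
            for (int i = 0; i < bin.Count; i++) draw(bin[i]);
        }
    }
}
```

The trade-off is that the cost moves from every frame to the (presumably rarer) moment a sprite changes layer. Whether that fits an engine that rebuilds its render list from a spatial query each frame is a separate question.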
#27
01/26/2011 (12:07 pm)
@Will: I'll be brief as I'm on the WP7. The rendering list is built each frame against the camera view, so there isn't a maintained one. The bins return a list, so we can't just change the main rendering loop, it's a more extensive fix that must be studied :( I've measured the List<T> performance on the embedded framework and it's awfully slow also when adding objects. We need to rewrite quite a lot of code to solve this one :(
#28
01/26/2011 (2:23 pm)
Could we replace the lists with arrays of a set size? I know this is less extendable, but wouldn't it be faster?
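For reference, a fixed-size replacement for a per-frame list might look something like the sketch below. `FixedBuffer<T>` is a hypothetical type, not something from Torque X or XNA, and whether it actually beats List<T> on the Xbox's framework is exactly what would need measuring.

```csharp
using System;

// Hypothetical fixed-capacity buffer: allocated once, reused every frame,
// so filling it creates no garbage and never triggers a resize.
public class FixedBuffer<T>
{
    private readonly T[] _items;
    private int _count;

    public FixedBuffer(int capacity) { _items = new T[capacity]; }

    public int Count { get { return _count; } }
    public int Capacity { get { return _items.Length; } }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= _count) throw new IndexOutOfRangeException();
            return _items[index];
        }
    }

    // Returns false instead of growing when the buffer is full.
    public bool TryAdd(T item)
    {
        if (_count == _items.Length) return false;
        _items[_count++] = item;
        return true;
    }

    // Resets the count and clears used slots so stale references
    // do not keep objects alive.
    public void Clear()
    {
        Array.Clear(_items, 0, _count);
        _count = 0;
    }
}
```

The caller has to pick a capacity up front and decide what to do when TryAdd returns false (skip the sprite, log it, and so on), which is the "less extendable" cost mentioned above.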
#29
01/26/2011 (4:25 pm)
@Michael: I'm exploring the array option (well... I just got home and it's past midnight, so it's a job for tomorrow), but I'm not really sure that the amount of work entailed is worth the result. First off I'm going to measure the actual gain (if any) using a test app (pure XNA to avoid false readings), then I'll ask the other colleagues involved in the 4.0 porting, because I'm short on time and need more eyes on this before starting such a big task (I can easily miss some detail because I have too many things on hand).
Bottom line... I hate the embedded framework! BTW, 4.0 is way more sensitive than 3.1 about GC cycles: if the developer isn't careful enough in cleaning up their logic, a game that worked fine in 3.1 will crash running on 4.0 :(
#30
01/26/2011 (7:31 pm)
OK, so I've messed around a bit and found where a lot of the slowdown is occurring. The culprit: FindObjects() in T2DSceneGraph is very slow. I axed that out of my rendering process and my framerate got a boost: I was able to render 1,000 sprites above 30fps. Of course, I gutted most of the other stuff out too. I'm going back to see if I can add more of the other stuff back without hurting performance too much.
#31
01/26/2011 (11:57 pm)
Also, for those people who NEED sorting on, like me: is there any way we could sort only when a new object is added to the database? I'm trying to mess around with that, but I'm running into a bunch of problems where certain objects aren't showing up properly. It seems like sorting only when objects are registered would significantly improve performance: when objects are ready for rendering they would already be in the correct order to layer properly, so a sort wouldn't need to be called every time render is called. Any way to do this?
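One way to picture "sort only when an object is added" is a list that is kept in layer order by inserting each new entry at the right position. The sketch below is a generic illustration with made-up names (`SortedRenderList`), not a change to TorqueObjectDatabase; it assumes higher layers draw first and that an object whose layer changes is removed and re-added.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical render list kept in layer order at insertion time,
// so no sort is needed at draw time.
public class SortedRenderList
{
    private readonly List<int> _layers = new List<int>();      // sort keys
    private readonly List<string> _names = new List<string>(); // payload

    public int Count { get { return _names.Count; } }

    public string NameAt(int index) { return _names[index]; }

    // Keeps layers in descending order; entries on the same layer
    // stay in the order they were added.
    public void Add(string name, int layer)
    {
        int lo = 0, hi = _layers.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_layers[mid] >= layer) lo = mid + 1; // skip past equal keys
            else hi = mid;
        }
        _layers.Insert(lo, layer);
        _names.Insert(lo, name);
    }
}
```

The binary search is cheap, but List<T>.Insert still shifts elements, so each add is linear in the worst case. Adding "a" on layer 1, "b" on 3, "c" on 1 and "d" on 2 leaves the list as b, d, a, c. A list like this holds every registered object, so it would still need filtering against the camera view before drawing.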
#32
01/27/2011 (4:28 am)
@Michael: I wish it were that simple ;) The bins are in charge of referencing the whole system. They are decently quick at their job, and removing or altering that system means rewriting half the engine. As for rendering, it is correct to render only the objects intersecting the camera view, so creating a rendering list of some sort is correct as well: you can't really render all registered objects, that's out of the question. I'm looking into reversing the logic, so that the bins aren't responsible for deciding what gets rendered, and avoiding a List<T> as well, but there are many problems to face in reorganizing that logic.
#33
01/27/2011 (9:48 am)
When we register objects with the TorqueObjectDatabase, I was thinking of storing objects that can render in a separate array. When I draw, I then refer to this array. I can also sort them when I want to. In my game the camera does not move, so I never need to worry about objects rendering offscreen. I'm going to try this and see what happens when I get home tonight.
#34
01/27/2011 (10:28 am)
Well Michael, if you don't move the camera and just leave it fixed, then there are a lot of viable workarounds, but they cannot be implemented as part of the CEV; they could only be your personal mods of the engine. We do need a more extensive and flexible solution to this issue.
#35
01/27/2011 (11:26 am)
Understood. My game is so close to completion and this is now my largest problem. I'll continue to work on those more viable options as well.
#36
01/30/2011 (7:15 pm)
Hey guys, I'm just catching up after a brief busy spell starting at a new job, and I've been trying the suggestions given here.
@Michael - Some interesting stuff. Those overrides were just for static sprites, yeah? Are you primarily working only with static sprites? I tried something similar on the T2DAnimatedSprite component, but with little or no joy; same as you, I can creep up to around 340-ish and then it starts to stutter. You say you managed to get up to 1000 with edits to the T2DSceneGraph. Could you elaborate a little?
I found that if I removed anything other than the "find closest" parts, I would get nothing rendered at all. I would really appreciate hearing how you did this; I'm willing to try anything to get up to those figures.
Like you I need some kind of layer sorting, so setting it to false gives me a little boost but is not so pretty to look at... At the moment on Xbox I can get up to 200 "enemies" made up of 3 layers, so that's potentially 600 sprites (they're pretty big sprites at that too), but the FPS is just dragging along at 20-20FPS... =(
Feel free to drop me a mail (hoddie.campbell AT gmail.com) if you would like access to my SVN. Likewise, if you could detail how you axed out the relevant parts of T2DSceneGraph, I would love to hear it!
@Jacob - Thanks for the link. I had been through your post previously in many a head-scratching session; I was as surprised as you to find no SpriteBatch references in the engine code. I'm currently trying to replicate what I've built in Torque X in plain XNA to see if I can get a boost, but it's a long, painful slog as Torque was handling my layers and mounting very nicely. Like you say, it would be great if we could get over this hump. Likewise, feel free to drop me a mail if you want a peek at what I've done so far.
I'm between a rock and a hard place wondering if I should be using Torque X at all. In XNA the performance seems quicker, but it's fiddly work without the ease of TX. A pity, as having a "hoarded" amount of enemies is essential to my game design.
@Pino - As ever, thanks for your continued eye on this; it's always warmly welcomed. As you were saying, a more flexible approach needs to be used for the CEV as a whole, though do you have any ideas how I could do a "personal dirty hack" refactoring of the List<T>? Or is this a massive task? I had been reading up about XNA4 multithreading and wondered whether using SetProcessorAffinity on "render" oriented tasks would be feasible... I know you're busy with bigger things, but any refactoring or ideas you had in mind from my SVN would be great to see.
Anyhoo, onwards and upwards, I'll get to that 200+ enemies dream someday I hope! *sigh* Now back to finishing my lunch and back to work.
#37
01/31/2011 (4:35 am)
TX's handling of scene objects is not inherently thread safe so threaded rendering is likely to require a substantial amount of work. Multithreading support is certainly something to look at further down the road though I think (for example, it would be great if physics ran in a separate thread). But, it's likely to be a major piece of work.
#38
01/31/2011 (6:38 am)
@Hoddie: in your game the camera isn't fixed and the playing area is quite a bit wider than the screen, so there are no "quick & dirty" solutions there. I had a few very, very busy days; now things are slowing down again, so soon I'll be back to you with a fix ;)
As Duncan pointed out, multithreading is a major piece of work: nothing that can be improvised and nothing that could safely work with TX's current structure.
#39
02/01/2011 (3:56 am)
Cheers Pino ! As ever gratefully appreciated ! I'm up to my eyes in learning Python at the mo with the new job, tho noticed the ContentBlock & AsyncSceneLoader related stuff has its own Processor affinity and looked around that and as you say seems like a big job to do, anyhoo look forward to hearing from u again soon. Cheerz!
#40
02/02/2011 (4:31 pm)
Hey guys,
I've just committed a customized algorithm that orders the list way faster (actually without sorting it). Now 200 sprites on screen hold 40fps on the Xbox, against the 13fps registered with the original code. It stays at 63fps until past 110 sprites, so I guess it's a step in the right direction.
Please check it out and post some feedback ;)
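The committed code is not shown in the thread, so the following is not Pino's algorithm. It is only a sketch of one common way to get layer order without a comparison sort: a counting sort keyed on the layer number. `LayerCountingSort` is a made-up name, and the sketch assumes layers are small integers in the range 0 to layerCount - 1 with higher layers drawn first.

```csharp
using System;

// Hypothetical counting sort by layer: linear time, no comparison
// delegate, and no allocation if the arrays are reused between frames.
public static class LayerCountingSort
{
    // layers[i]  : layer of object i (0 .. layerCount - 1)
    // scratch    : reusable array with at least layerCount entries
    // order      : output; object indices, highest layer first,
    //              original order preserved within a layer
    public static void Order(int[] layers, int layerCount, int[] scratch, int[] order)
    {
        Array.Clear(scratch, 0, layerCount);

        // Count how many objects sit on each layer.
        for (int i = 0; i < layers.Length; i++) scratch[layers[i]]++;

        // Turn counts into start offsets, highest layer first.
        int offset = 0;
        for (int layer = layerCount - 1; layer >= 0; layer--)
        {
            int count = scratch[layer];
            scratch[layer] = offset;
            offset += count;
        }

        // Drop each object index into its layer's next free slot.
        for (int i = 0; i < layers.Length; i++)
        {
            order[scratch[layers[i]]++] = i;
        }
    }
}
```

For layers { 1, 3, 1, 2, 0 } with layerCount 4, the resulting order is { 1, 3, 0, 2, 4 }. The work is proportional to the number of objects plus the number of layers, which is why this kind of approach tends to scale better than calling a general-purpose sort with a comparison callback every frame.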
Torque Owner Jacob Lynch
www.garagegames.com/community/forums/viewthread/98813
@Hoddie: In my experience, PoolWithComponents is great for avoiding allocating memory on the Xbox. I use it on almost all my objects. It is especially useful on my weapon bullets. Once I get a decent sized pool of bullets then I see new memory allocations drop off drastically. I'm not sure how much of an effect it would have on the framerate though.
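For anyone unfamiliar with pooling, the general idea looks roughly like the sketch below. This is not how PoolWithComponents is implemented in Torque X; `SimplePool<T>` and `Bullet` are hypothetical names used only to show why allocations drop off once the pool has warmed up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pooled object.
public class Bullet
{
    public float X, Y;
}

// Hypothetical pool: objects are handed back instead of being dropped,
// so after warm-up Rent() reuses instances rather than allocating.
public class SimplePool<T> where T : class, new()
{
    private readonly Stack<T> _free = new Stack<T>();

    // Number of objects actually allocated so far.
    public int Created { get; private set; }

    public T Rent()
    {
        if (_free.Count > 0) return _free.Pop();
        Created++;
        return new T();
    }

    public void Return(T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        _free.Push(item);
    }
}
```

Renting a bullet, returning it and renting again gives back the same instance with Created still at 1. The caller is responsible for resetting any state on a reused object, and as noted above this helps garbage collection more than it helps raw draw speed.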
I'd really, really love it if we could increase the number of sprites we can use and still see a decent framerate. Let me know if there's anything I can do to help. I am also wondering if whatever this is affects the particle effects as well, because I notice they can drop the framerate pretty quickly.
Thanks for looking into this Pino!