Game Development Community

Why the large performance difference between XNA and TorqueX 2D for sprites?

by Jacob Lynch · in Torque X 2D · 08/04/2009 (10:00 pm) · 6 replies

If I go into XNA 3.1 and create the simplest program (load an image, begin a spritebatch, draw a scaled image, end the spritebatch) I get about 2500 FPS on the Xbox 360 (1280x720). I then go into TorqueX 2D and create a similar program (StarterGame2D, one image on the screen (GG logo)), and I get about 300 FPS at 1280x720. So why the difference?

I tested with spritebatch some more just using XNA, and I can get about 4000 sprites drawn to the screen and still maintain around 100 FPS on the Xbox. I have experimented a bit, and I can't seem to add more than around 100 sprites or so to a Torque game without the FPS dropping below 100. Am I doing something wrong? I really like all the features Torque offers, but I don't understand why they come at such a price. Any information would be appreciated.

I have only looked through the engine source briefly, but I couldn't find any references to SpriteBatch at all. It seems like Torque draws a texture to a quad on a RenderTarget after applying post-processing effects? I haven't had a chance to understand it all yet. Just wondering if anyone else had any comments on this.

Thanks!

#1
08/05/2009 (2:58 am)
Torque X doesn't use a SpriteBatch to draw; this is to enable shaders on the images. I didn't imagine it would drop so low with a hundred scene objects, though.

Did you disable collisions on those objects in Torque X?
#2
08/05/2009 (4:24 pm)
I would imagine the performance difference comes from all the stuff that Torque X does out of the box that XNA does not do.

As Pablo mentioned, turning collisions off will probably help. I would also try setting up a template with object pooling enabled and then spawn the sprites from the template.
#3
08/05/2009 (6:53 pm)
Hey guys, thanks for the responses. I did some more careful testing just now. Previously I had been testing adding sprites to a game I was working on, with a few extra things going on in the background. This time I took a brand new project, copied and pasted the GG logo (scaled down) and looked at the FPS. This is with all the components removed and collision detection disabled, using the display size (1280x720) for the back buffer.
Objects       FPS
50            120
100           112
200           100
300           69
I stopped there with that test. That is just displaying sprites on a black background with no components.
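To put these numbers in per-sprite terms, it can help to convert FPS into milliseconds per frame, since frame time (unlike FPS) adds up as work is added. This is a small Python sketch that uses only the measurements from the table above and assumes those FPS readings are uncapped averages; the helper names are just for illustration.

```python
# Measurements from the table above: (number of objects, frames per second).
MEASUREMENTS = [(50, 120), (100, 112), (200, 100), (300, 69)]


def frame_time_ms(fps):
    """Convert frames per second into milliseconds spent per frame."""
    return 1000.0 / fps


def marginal_cost_ms(a, b):
    """Average extra milliseconds per frame for each object added between samples a and b."""
    (count_a, fps_a), (count_b, fps_b) = a, b
    return (frame_time_ms(fps_b) - frame_time_ms(fps_a)) / (count_b - count_a)


if __name__ == "__main__":
    for count, fps in MEASUREMENTS:
        print(f"{count:4d} objects: {frame_time_ms(fps):6.2f} ms/frame")
    for a, b in zip(MEASUREMENTS, MEASUREMENTS[1:]):
        cost_us = marginal_cost_ms(a, b) * 1000.0
        print(f"{a[0]:4d} -> {b[0]:4d} objects: {cost_us:5.1f} us per extra object")
```

With these numbers each extra object costs about 12 microseconds from 50 to 100 objects, about 11 from 100 to 200, and about 45 from 200 to 300. For comparison, 4000 SpriteBatch sprites at about 100 FPS is about 10 ms per frame, or at most about 2.5 microseconds per sprite.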

Next I tried the other test suggested. I made the sprite (pooled, with no components, collisions off) a template, and I cloned the template.
Objects       FPS
50            120
100           114
200           100
300           68
Side note: The first time through this second set of tests I forgot to make the sprite a template. After I made it a template, and retested there were no changes to the FPS. I messed around with the 4 scripting options a bit (pool, pool w/ components, template, persistent), but nothing made a difference to the FPS.

So I'd be happy to try more tests if anyone has other ideas. I'd really love to see better performance out of TorqueX. I was under the impression that SpriteBatch provided a way to use a shader on everything drawn in a SpriteBatch run (all Draw calls after Begin and before End). So was SpriteBatch avoided because they wanted to run the post-processing (shaders, etc.) on the entire scene at once?

Also, I tried messing around in the engine code a bit. It seems I can improve the frame rate to almost match XNA, but doing so makes the one object with a movement component virtually unresponsive to the analog sticks. This was in the original tests I did last night, with just one object with a movement component. It was a lot like what happens when I turn "SimulateFences" off on my PC... great frame rate, but the objects don't move. It seems like there should be a way to separate the two?

Is this a threading issue maybe? I'm not all that knowledgeable about Torque/XNA yet, so I'm kind of just guessing. I thought XNA ran two threads, one for updating (logic) and one for drawing (graphics). Is that way off? I also was under the impression that Torque was single-threaded from some forum posts I read, but those are dated from years ago, maybe that has improved.

Anyway, just trying to see if there's anything we can do to draw more sprites to the screen using TorqueX. Thanks!
#4
08/05/2009 (9:24 pm)
Hey Jacob,

I really don't think you can make these kinds of comparisons; it's kind of like comparing apples to oranges.

I would expect the Torque X framework to be slower than XNA, even if both are barebones scenarios. Anyway, there is one good reason I can think of why SpriteBatch is much faster than Torque X 2D's rendering of T2DSceneObjects, even without taking the rest of the engine framework into consideration.

SpriteBatch will switch render states only once (or twice if you choose the SaveState option), no matter how many sprites you render. TX2D switches render states per sprite, which can cause a significant performance issue.
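The difference can be sketched by counting how often render states get applied. This Python sketch uses a hypothetical CallCounter in place of a real graphics device (it is not an XNA or Torque X API) and keeps one draw per sprite in both paths, so it isolates the render-state part of the argument; merging draws is a separate effect.

```python
from dataclasses import dataclass

# Hypothetical render-state bundles; real engines set blend, filter, cull and depth modes.
SPRITE_STATE = ("alpha_blend", "linear_filter", "no_cull", "no_depth")
SAVED_STATE = ("opaque", "point_filter", "cull_ccw", "depth_on")


@dataclass
class CallCounter:
    """Stand-in for a graphics device that only counts calls (not a real XNA API)."""
    state_applications: int = 0
    draw_calls: int = 0

    def apply_state(self, state):
        self.state_applications += 1

    def draw_quad(self):
        self.draw_calls += 1


def render_per_object(device, sprites):
    """Each scene object sets up its own render states before drawing."""
    for _ in sprites:
        device.apply_state(SPRITE_STATE)
        device.draw_quad()


def render_with_begin_end(device, sprites, save_state=False):
    """States are applied once at 'Begin' and, optionally, restored once at 'End'."""
    device.apply_state(SPRITE_STATE)
    for _ in sprites:
        device.draw_quad()
    if save_state:
        device.apply_state(SAVED_STATE)


if __name__ == "__main__":
    for name, render in [("per object", render_per_object), ("begin/end", render_with_begin_end)]:
        device = CallCounter()
        render(device, range(300))
        print(f"{name:10s}: {device.state_applications} state applications, {device.draw_calls} draws")
```

For 300 sprites the per-object path applies states 300 times, while the Begin/End path applies them once (twice with save_state=True), matching the "once or twice" description above.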

Considering TX2D's engine framework, there is still a lot of work done per frame: walking the scene graph to see which objects are visible, updating all of the scene objects, buffering possible input, applying scene render states, and updating collision/physics. These all still happen, even in a barebones example. You might get a fairer comparison by comparing rendered textured quads to SpriteBatch, because that is much closer to what is going on here.
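As a rough illustration of why that per-frame work scales with object count even when nothing much is happening, here is a Python sketch of one simplified frame: every object is updated, then tested against the view rectangle before drawing. A flat list stands in for TX2D's actual scene graph, and SceneObject, is_visible and run_frame are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Minimal hypothetical scene object: an axis-aligned box with a velocity."""
    x: float
    y: float
    width: float
    height: float
    vx: float = 0.0
    vy: float = 0.0

    def update(self, dt):
        self.x += self.vx * dt
        self.y += self.vy * dt


def is_visible(obj, view):
    """Axis-aligned overlap test against the view rectangle (left, top, right, bottom)."""
    left, top, right, bottom = view
    return (obj.x < right and obj.x + obj.width > left
            and obj.y < bottom and obj.y + obj.height > top)


def run_frame(objects, view, dt):
    """One simplified frame: update every object, then keep only the visible ones for drawing."""
    for obj in objects:
        obj.update(dt)
    return [obj for obj in objects if is_visible(obj, view)]


if __name__ == "__main__":
    view = (0.0, 0.0, 1280.0, 720.0)
    objects = [SceneObject(100, 100, 64, 64, vx=30), SceneObject(2000, 100, 64, 64)]
    visible = run_frame(objects, view, dt=1 / 60)
    print(f"{len(visible)} of {len(objects)} objects visible")
```

Both loops touch every object every frame, whether or not it ends up on screen; a real engine would typically use spatial partitioning to avoid testing everything.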
#5
08/06/2009 (9:58 pm)
How come TX2D switches render states per sprite (and what does it mean to switch render states)? Is that something that I could skip out on if I wanted to lose certain features it provides?

I understand that a lot of work is going into each frame on Torque, but in a barebones scenario I would argue that all of the work isn't going to amount to much: determining visible objects, updating scene objects, updating collision/physics should all take very little time in my example. I think the main cause of slowdown would be due to the first problem you mentioned... switching render states.

When I was first experimenting with XNA (without Torque), I quickly learned that even with a SpriteBatch, drawing different textures one after another, so that the texture is swapped a lot, causes severe frame rate problems. Is switching the render state doing something similar?
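The texture-swapping case can be sketched by counting how often the bound texture changes, before and after sorting the sprites by texture. As far as I know, XNA's SpriteSortMode.Texture applies a comparable idea inside SpriteBatch. The Python below is only a model with made-up texture names, and sorting like this changes draw order, so it only works when overlapping sprites don't depend on order.

```python
def count_texture_switches(sprites):
    """Count how often the bound texture changes while drawing sprites in the given order."""
    switches = 0
    bound = None
    for texture, _position in sprites:
        if texture != bound:
            switches += 1
            bound = texture
    return switches


def sort_by_texture(sprites):
    """Group sprites that share a texture; sorted() is stable, so order within a texture is kept."""
    return sorted(sprites, key=lambda sprite: sprite[0])


if __name__ == "__main__":
    # Hypothetical scene: two textures interleaved, as in a tiled map drawn row by row.
    sprites = [("grass", i) if i % 2 == 0 else ("rock", i) for i in range(100)]
    print("submitted order:", count_texture_switches(sprites), "switches")
    print("sorted order:   ", count_texture_switches(sort_by_texture(sprites)), "switches")
```

Alternating two textures over 100 sprites causes 100 texture binds; grouping them first reduces that to 2.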
#6
08/07/2009 (8:33 am)
Hey Jacob:

This may be incorrect, but I believe this is how it works. I have provided references below that support my concept.

It is sort of like the example you are giving. When you send work to the GPU, it performs better with large amounts of work sent infrequently than with small amounts of work sent frequently. That is why you gain a lot of performance when you send larger draw calls to the GPU less often. The GPU works fastest when it doesn't have to switch any data, whether that is vertex buffers, index buffers, textures, or render states. The less switching you have to do, the more performance you gain.

Most people call these large units of work batches, and that is exactly why the class is named SpriteBatch: it batches multiple similar draw calls (i.e., same geometry, shader, texture) into one draw call. The GPU can then render this very quickly because it doesn't require any switching. As soon as you change one of these variables so that each sprite has unique data, you can no longer reap the performance benefits of SpriteBatch. That explains why both your example and my example cause performance issues.
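The batching idea can be sketched as grouping sprites by the state they share and concatenating their quads into one vertex list per group, where each list would be one draw call. This Python sketch only models positions; Sprite, quad_corners and build_batches are hypothetical names, and a real batcher also handles texture coordinates, colors, index data and draw order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sprite:
    """Hypothetical sprite description: which texture/shader it uses and where it goes."""
    texture: str
    shader: str
    x: float
    y: float
    width: float
    height: float


def quad_corners(s):
    """Corner positions of a sprite's quad: top-left, top-right, bottom-right, bottom-left (y down)."""
    return [(s.x, s.y), (s.x + s.width, s.y),
            (s.x + s.width, s.y + s.height), (s.x, s.y + s.height)]


def build_batches(sprites):
    """Merge sprites that share (texture, shader) into one vertex list per pair.

    Each resulting list could be submitted as a single draw call, because nothing
    has to be switched between the sprites inside it.
    """
    batches = defaultdict(list)
    for sprite in sprites:
        batches[(sprite.texture, sprite.shader)].extend(quad_corners(sprite))
    return dict(batches)


if __name__ == "__main__":
    sprites = [Sprite("logo", "basic", 10 * i, 0, 32, 32) for i in range(100)]
    sprites.append(Sprite("logo", "glow", 0, 100, 32, 32))
    for key, vertices in build_batches(sprites).items():
        print(f"{key}: {len(vertices) // 4} sprites in one draw call")
```

A single sprite with a different shader is enough to force a second batch, which is the "unique data" case described above.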

Technically, if you have static geometry, you can gain even more performance by uploading the data to GPU memory, which is only costly the first time it is done. After that, the CPU no longer has to send the vertex data to the GPU each frame, and you gain a lot of performance by losing that overhead.
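A back-of-the-envelope calculation shows how much traffic that saves. This Python sketch assumes a 20-byte vertex (position and texture coordinates as floats plus a packed color), four vertices per quad, 300 quads (the largest test in this thread) and one minute at 60 FPS; the layout and numbers are illustrative, not Torque X's actual format.

```python
import struct

# Hypothetical vertex layout: position (x, y) and texture coordinates (u, v) as
# 32-bit floats, plus an RGBA color packed into four bytes.
VERTEX_FORMAT = "<4f4B"
VERTEX_SIZE = struct.calcsize(VERTEX_FORMAT)  # 16 + 4 = 20 bytes
VERTICES_PER_QUAD = 4


def bytes_uploaded(quads, frames, static):
    """Bytes the CPU sends to the GPU over `frames` frames.

    Static geometry is uploaded once and reused; dynamic geometry is resent every frame.
    """
    per_upload = quads * VERTICES_PER_QUAD * VERTEX_SIZE
    return per_upload if static else per_upload * frames


if __name__ == "__main__":
    frames = 60 * 60  # one minute at 60 FPS
    for static in (False, True):
        label = "static (upload once)" if static else "dynamic (every frame)"
        print(f"{label:22s}: {bytes_uploaded(300, frames, static):,} bytes")
```

Under these assumptions, resending every frame moves 86,400,000 bytes per minute versus 24,000 bytes for a single upload; terrain meshes are far larger, which is why the saving matters so much there.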

This is a bit of a shameless plug :), but I used the same approach with OpenGL's equivalent (VBOs) in my thesis last year, Feature preserving continuous level-of-detail for terrain rendering. This is one of the first problems you run into with terrain rendering, because of how much data you want to push through each frame.

Also, here are two useful links where Shawn Hargreaves talks about these same issues (and probably explains them a lot better than I did here!):

SpriteBatch and SpriteSortMode - Talks about batching in general and how SpriteBatch uses this functionality to optimize its performance.

SpriteBatch and renderstates - Third sentence mentions switching render states causes performance loss.