Game Development Community

Performance with High Character Count

by Thomas Phillips · in Torque Game Engine Advanced · 12/31/2006 (6:12 pm) · 25 replies

For those that have tested missions with high character count (> 30), what sort of performance are you seeing with many animated characters within the frustum?

From a blank view (nothing but atlas terrain) I was getting around 150fps. I put in about 45 animated characters and it dropped to about 10fps. Is this normal? (I wasn't expecting such a dramatic reduction in framerate.)

Tom
#1
12/31/2006 (6:34 pm)
Can you play any game with 45 animated players (polygons > 3000) on screen and get higher than 20FPS? ;)
#2
12/31/2006 (7:02 pm)
In order to have that many characters on the screen and get an acceptable frame rate, you'll have to have some good LOD on them. If you're pushing that many polygons on screen, though, it would be normal to get a low frame rate unless you have a quad-SLI system or some such.
#3
12/31/2006 (10:52 pm)
I suppose it's a lot of polygons, but I guess I was expecting better performance. It was approximately 165,000 polygons (half of which should have been culled), a Radeon X1300 pro, 3GHz Pentium 4, and 1.5GB RAM. That's 83,000 visible polygons and two textures on what I would consider reasonable hardware, yielding 10fps. (Maybe it's not reasonable... :-O )

(I just profiled it and found the two big time hogs are TSSkinMesh::updateSkin() and the Player_PhysicsSection in Player::processTick(). My initial hunch would be inefficient code in the loop transforming vertices, but I would need to trace through it to be sure. The physics part looks pretty abstracted and would take some digging.)

Tom
#4
01/01/2007 (3:16 am)
What is taking time is obviously not the polygonal rendering, but the animation of the player and the physics. I'm not sure about the former, but the latter shouldn't be that high.

Are you running a low-end system?
#5
01/01/2007 (3:53 am)
And that's one reason why you have LoD for players (or anything highly animated)--even with batch rendering, it's a lot of processing for complex animations.

I'm curious though, why do you think :

Quote:
165,000 polygons (half of which should have been culled),

--culled by what?
#6
01/01/2007 (4:15 am)
Player physics aren't cheap. If you need lots of players on-screen I suggest simplifying the physics a bit. ;)
#7
01/01/2007 (9:33 am)
@Stephan

Direct3D should be doing backface culling; polygons wound counterclockwise relative to the camera should be culled, unless backface culling is disabled. Your question made me curious (nudge, nudge, wink, wink), so I started looking through the code for render states. And... backface culling seems to be disabled in numerous places. Perhaps that's reasonable for certain transparent objects, or objects with two-sided shaders (like vegetation). Since opaque polygons are the most common, though, wouldn't it make more sense to default to enabled backface culling and provide a switch somewhere to disable it? (Switching it on and off per mesh would be ideal, but I don't know if that's feasible in DirectX or OpenGL.)
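For reference, the winding test behind backface culling can be sketched in a few lines. This is only an illustration, not engine code: Vec2, CullMode, signedArea2 and isCulled are hypothetical names, and the sketch assumes the triangle has already been projected into a 2D space with y pointing up (with y pointing down, as in typical screen coordinates, the sign flips).

```cpp
// Hypothetical 2D point in a projected space where y points up.
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c).
// With y up: positive means counterclockwise, negative means clockwise.
float signedArea2(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Hypothetical cull setting, loosely mirroring the none/CW/CCW choice
// that graphics APIs expose.
enum class CullMode { None, Clockwise, CounterClockwise };

// Returns true if the triangle would be discarded under the given mode.
// Degenerate (zero-area) triangles are reported as not culled here.
bool isCulled(Vec2 a, Vec2 b, Vec2 c, CullMode mode)
{
    const float area = signedArea2(a, b, c);
    switch (mode) {
        case CullMode::CounterClockwise: return area > 0.0f;
        case CullMode::Clockwise:        return area < 0.0f;
        case CullMode::None:             return false;
    }
    return false;
}
```

With CullMode::None (the two-sided case) every triangle is kept, which is why leaving culling disabled on closed opaque meshes roughly doubles the triangles that reach rasterization.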

@Ben

The physics are just a perfectly flat atlas terrain and simple bounding boxes around the characters. I do see your point, though, that any kind of physics simulation is likely to be inefficient as the number of potentially interacting objects increases. I'll look into collision alternatives for high character count scenes.

Tom
#8
01/01/2007 (10:36 am)
The code block below, from TSSkinMesh::updateSkin(), looks expensive. Is there any reason these data cannot be cached?

// set up bone transforms
   S32 i;
   for (i=0; i<nodeIndex.size(); i++)
   {
      S32 node = nodeIndex[i];
      gBoneTransforms[i].mul(TSShapeInstance::ObjectInstance::smTransforms[node],initialTransforms[i]);
   }

   // multiply verts and normals by boneTransforms
   S32 prevIndex = -1;
   for (i=0; i<vertexIndex.size(); i++)
   {
      Point3F v0,n0,t0;
      S32 vIndex = vertexIndex[i];
      MatrixF & deltaTransform = gBoneTransforms[boneIndex[i]];
      deltaTransform.mulP(initialVerts[vIndex],&v0);
      deltaTransform.mulV(initialNorms[vIndex],&n0);
      deltaTransform.mulV(initialTangents[vIndex],&t0);
      v0 *= weight[i];
      n0 *= weight[i];
      t0 *= weight[i];
<snip>

Tom
#9
01/01/2007 (4:31 pm)
@Tom: It's not the collision query that's expensive so much as the per-tick logic the player does to update its state.

That data can be cached, but doing so carries a relatively high memory cost for a complex scene, and, in any case, the data has to be re-calculated every frame - so it's only a win if that's being called more than once per frame per shape.
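To make the per-frame dependency concrete, here is a minimal, self-contained sketch of the same weighted blend that the updateSkin() excerpt performs. It is not the engine's code: Vec3, Mat34, Influence, makeTranslation and skinVertices are hypothetical stand-ins, and only positions are skinned (normals and tangents would follow the same pattern without the translation term).

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: 3x3 rotation/scale plus a translation column.
struct Mat34 {
    float m[12];

    // Transform a point (rotation/scale, then translation).
    Vec3 mulP(const Vec3& p) const
    {
        return Vec3{ m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
                     m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
                     m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11] };
    }
};

// Pure translation; pass zeros to get the identity transform.
Mat34 makeTranslation(float tx, float ty, float tz)
{
    return Mat34{{ 1.0f, 0.0f, 0.0f, tx,
                   0.0f, 1.0f, 0.0f, ty,
                   0.0f, 0.0f, 1.0f, tz }};
}

// One (vertex, bone, weight) entry; a vertex may appear in several entries.
struct Influence { std::size_t vertex; std::size_t bone; float weight; };

// Blend each rest-pose vertex by its weighted bone transforms.
// The result depends on boneTransforms, which change every animation frame,
// so the output is only reusable while the pose stays the same.
std::vector<Vec3> skinVertices(const std::vector<Vec3>& restVerts,
                               const std::vector<Mat34>& boneTransforms,
                               const std::vector<Influence>& influences)
{
    std::vector<Vec3> out(restVerts.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (const Influence& inf : influences) {
        const Vec3 p = boneTransforms[inf.bone].mulP(restVerts[inf.vertex]);
        out[inf.vertex].x += p.x * inf.weight;
        out[inf.vertex].y += p.y * inf.weight;
        out[inf.vertex].z += p.z * inf.weight;
    }
    return out;
}
```

The cost is one matrix-vector product per influence entry per frame, which is why vertex count and influences per vertex dominate the profile.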
#10
01/01/2007 (4:42 pm)
What you'd ideally want to do for LOTS of characters is write some kind of multi-character render method, so that it basically draws all characters in batches. There is a paper on doing exactly this somewhere on t'web; I think it's on nvidia's site. Might have been one of their demo thingies.

But I'd go with what Ben said: have a look at the physics first. There is a lot of stuff in there that can be binned (literally).

Oh, plus, shouldn't those bone transforms be done in a vertex shader??

LOD'ing the bones would make sense too.
#11
01/01/2007 (6:26 pm)
We originally chose not to GPUize the skinning because a) it lets us parallelize a bit of work and b) it works with any number of bone influences.

I'd be interested in seeing a GPU skinning implementation in TGEA and getting the chance to do a bit of comparison!
#12
01/01/2007 (7:47 pm)
Hmmm, regarding backface culling. Someone else might have a proper answer for this, but if I'm not mistaken (haven't done a merge in a while) backface culling should be occurring already unless doublesided = true in the material definition. I had a model a while back who had some polygons near the wrist which were being backface culled, and you could see through his body as a result. In order to stop it from being backface culled I had to add doublesided = true; to the material block.
#13
01/02/2007 (1:26 am)
Ben, I don't think the number of bones is a factor anymore, is it? I mean, using GPU skinning these days must be a win (when we up the number of bones/vert).

GPU Skinning would definitely be an interesting project.. ANYONE?
#14
01/02/2007 (8:21 am)
@Thomas - Take a look at this bug and this other one. It might help with backface culling issues you're seeing. Backface culling should be controlled per-material in most cases.
#15
01/03/2007 (6:00 pm)
@Tom

Thanks -- I have now patched those in.

@Phil

Indeed, GPU skinning would be nifty. I'm certainly not a Shadermeister, but it certainly tempts me. I went ahead and dug up some relevant resources:
Character Animation with Direct3D Vertex Shaders
Animation with Cg: Vertex Skinning
DirectX 8.0: Enhancing Character Animation with Matrix Palette Skinning and Vertex Shaders
Optimized CPU-based Skinning for 3D Games
Introduction to Shader Programming: Fundamentals of Vertex Shaders

Simply adding in a skinning shader wouldn't be enough, I assume. It would also have to play well in the procedural shader chain, and only when appropriate. (Autodetect and load bones?) Looks like hacking in material.cpp, shaderFeature.cpp (lots of interesting "magic glue" in there I wasn't aware of), et al. ... and that makes me wonder whether HLSL (vice assembly) would be appropriate in this context... In general, not a light task.

(For those who were unaware, I believe LeChuck contributed to shaderFeature.cpp -- one mention of "argh" and two mentions of "arrgh.")

Tom
#16
01/03/2007 (6:02 pm)
@Phil: It's not bone count, it's # of bones per vert. (You also have to do some mesh conditioning and such to get it working well, although that's definitely a surmountable obstacle.)

@Thomas: Don't use shader assembly, there's not much point.
#17
01/04/2007 (5:07 pm)
Ben: yeah, but surely Torque doesn't have so many bones per vert anyway? I've never seen a really complex bone setup in any Torque game.

I would be quite happy with the old 4 bones/vert setup (I think four was the limit in the original matrix palette skinning thingy nvidia wrote?).

Alas.. no time.
#18
01/05/2007 (5:19 am)
The major issue in moving the skinning to a vertex shader is splitting the character geometry. You can't upload all bones into the vertex shader at once without wasting all the constant slots (you could optimize the constant usage by sending quaternions instead of matrices, but you'll lose the ability to animate translation and scaling), so you need to split the mesh into small batches grouped by bones so you only send a few bones at a time.
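The batching step described above can be sketched as a simple greedy partition. This is an illustration under stated assumptions, not the DTS loader's logic: Triangle, Batch and splitByBonePalette are hypothetical names, each triangle is assumed to carry the list of bones influencing its three vertices, and maxBones stands for however many bone matrices fit in the shader's constant registers.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical input: the bone indices that influence a triangle's vertices.
struct Triangle {
    std::vector<int> bones;
};

// One draw batch: the bones to upload and the triangles drawn with them.
struct Batch {
    std::set<int> bones;            // bone palette for this batch
    std::vector<std::size_t> tris;  // indices into the input triangle list
};

// Greedy first-fit: put each triangle into the first batch whose palette
// can absorb its bones without exceeding maxBones, else start a new batch.
// A triangle that by itself references more than maxBones bones ends up
// alone in an over-limit batch; a real tool would have to reject or
// re-weight such geometry.
std::vector<Batch> splitByBonePalette(const std::vector<Triangle>& tris,
                                      std::size_t maxBones)
{
    std::vector<Batch> batches;
    for (std::size_t t = 0; t < tris.size(); ++t) {
        bool placed = false;
        for (Batch& b : batches) {
            std::set<int> merged = b.bones;
            merged.insert(tris[t].bones.begin(), tris[t].bones.end());
            if (merged.size() <= maxBones) {
                b.bones.swap(merged);
                b.tris.push_back(t);
                placed = true;
                break;
            }
        }
        if (!placed) {
            Batch b;
            b.bones.insert(tris[t].bones.begin(), tris[t].bones.end());
            b.tris.push_back(t);
            batches.push_back(b);
        }
    }
    return batches;
}
```

First-fit is not optimal (it can produce more batches than necessary), but it shows the trade-off: a smaller palette means more draw calls per character, a larger one means more constants uploaded per call.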

It's not that it's hard to do, it's that you'll need to muck around with deep pieces of badly commented code in the character animation and DTS loading code.

Anyway, if you want that many characters at once, you should start optimizing the player physics first, and only touch the skinning code as a last resort. If you add LOD to your characters, the skinning will become cheaper right away (fewer polygons to skin on the lower LOD levels). When you have that many characters on screen only a few will be close enough to be using the highest LOD.
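A rough sketch of how LOD cuts the skinning workload, assuming a purely distance-based selection. The engine's own detail selection may work differently (for example, by projected size on screen); pickDetailLevel, totalSkinnedVerts and all thresholds and vertex counts here are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the detail level for a character at the given distance.
// Level 0 is the most detailed. maxDistances must be sorted ascending;
// level i is used while distance <= maxDistances[i], and anything beyond
// the last threshold gets the coarsest level (index maxDistances.size()).
std::size_t pickDetailLevel(float distance,
                            const std::vector<float>& maxDistances)
{
    for (std::size_t i = 0; i < maxDistances.size(); ++i) {
        if (distance <= maxDistances[i]) {
            return i;
        }
    }
    return maxDistances.size();
}

// Total vertices that must be skinned this frame.
// vertsPerLevel needs one more entry than maxDistances.
std::size_t totalSkinnedVerts(const std::vector<float>& characterDistances,
                              const std::vector<float>& maxDistances,
                              const std::vector<std::size_t>& vertsPerLevel)
{
    std::size_t total = 0;
    for (float d : characterDistances) {
        total += vertsPerLevel[pickDetailLevel(d, maxDistances)];
    }
    return total;
}
```

For example, with thresholds of 10 and 30 units and levels of 3000, 1000 and 300 vertices, four characters at distances 5, 20, 50 and 50 need 4,600 skinned vertices per frame instead of 12,000 at full detail.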

Anyway, from my experience, the player collision is sorta expensive. I have an idea to make it a bit cheaper, by splitting the vertical and horizontal collisions and using a simple raycast for vertical collision, but I have yet to profile such a thing.
#19
01/05/2007 (5:41 pm)
Given these assumptions:
- 3000 vertices in the mesh.
- Each vertex has a position (v), normal (n), and tangent (t), each being a Point3F.
- A Point3F is 12 bytes (three 4-byte floats), so one {v, n, t} set is 36 bytes.
- 120 frames of animation.
- There exists one {v, n, t} per vertex, per frame. (This is based on bone influences and is precalculated and cached.)

Then:
- The mesh in any frame of animation can be retrieved with no calculations.
- Cached values occupy 12,960,000 bytes total.
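The arithmetic behind that total can be checked with a short sketch. Point3F here is a local stand-in for the engine's type (assumed to be three 4-byte floats), and CachedVertex and cacheBytes are hypothetical names.

```cpp
#include <cstddef>

// Stand-in for the engine's Point3F: three 4-byte floats, 12 bytes.
struct Point3F { float x, y, z; };

// One cached {v, n, t} set per vertex per frame: 3 * 12 = 36 bytes.
struct CachedVertex { Point3F v, n, t; };

static_assert(sizeof(Point3F) == 12, "sketch assumes 4-byte floats, no padding");
static_assert(sizeof(CachedVertex) == 36, "sketch assumes no padding");

// Bytes needed to cache every vertex of every animation frame for one model.
constexpr std::size_t cacheBytes(std::size_t vertexCount, std::size_t frameCount)
{
    return vertexCount * frameCount * sizeof(CachedVertex);
}

// 3000 vertices * 120 frames * 36 bytes = 12,960,000 bytes per model.
static_assert(cacheBytes(3000, 120) == 12960000, "matches the total above");
```

The cost scales linearly with both vertex count and frame count, and it is paid once per distinct model, not per on-screen instance.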

This would allow an arbitrary number of bones (only used for precalculation), but does not account for animation blending. (Could blending be done through interpolation of frames?)

Are there any flaws with this approach?

Tom
#20
01/05/2007 (7:21 pm)
DTS already supports that. The downside is a HUGE memory footprint. 12MB per model? I guess if you have one model that's ok. But it seems like a super-costly way to solve the problem...