Game Development Community

polygons per second

by Kory Imaginism · in Torque 3D Professional · 04/26/2009 (11:12 pm) · 38 replies

An artist on my team asked me how many polygons per second T3D renders, and I didn't have the answer. Can somebody "shed some light" on this question?

thanks
#21
04/27/2009 (11:02 am)
Quote:
So if I perform an affine transform on a mesh, which by definition is a connected graph, it must touch every vertex in the mesh. We aren't talking about culling, clipping, or otherwise optimizing geometry. A bazillion vertices means render, or at least process for depth testing, a bazillion vertices.
Vertex valence can be 0 or more; 0 means an unconnected vertex (which happens quite a lot for attachment points, errors, etc.). In the 0 case the hardware won't transform it; in the connected case, how many times it's transformed depends on the topology of the index data and the post-vertex-transform cache.

And btw the transform is never affine (except in the degenerate cases, obviously): it's a projective transform plus a homogeneous divide. Not that it matters until you get to clipping and the real fun stuff like interpolation :D
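To make the distinction concrete, here's a toy sketch in plain Python (no GPU involved; the fov/near/far numbers are just illustrative) of the projective transform and homogeneous divide. The bottom matrix row is what makes it projective rather than affine:

```python
# A minimal sketch of a perspective projection and the homogeneous divide.
# The matrix is a standard OpenGL-style projection; numbers are illustrative.
import math

def perspective(fov_y_deg, aspect, near, far):
    """Build a 4x4 perspective projection matrix (column vectors, row-major storage)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # this row is what makes it projective, not affine
    ]

def transform(m, v):
    """Multiply a 4x4 matrix by a homogeneous point (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def project(m, point):
    """Projective transform followed by the homogeneous divide."""
    x, y, z, w = transform(m, (*point, 1.0))
    return (x / w, y / w, z / w)  # normalized device coordinates

proj = perspective(90.0, 1.0, 0.1, 100.0)
# Two points at the same x but different depths: after the divide, the
# farther one lands closer to the center of the screen (foreshortening).
print(project(proj, (1.0, 0.0, -2.0)))
print(project(proj, (1.0, 0.0, -10.0)))
```

The divide by w is the step an affine transform never needs, since an affine matrix leaves w untouched.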
#22
04/27/2009 (12:22 pm)
The affine transformation I was referring to was the transform between model and world space prior to the projective transform.

Instanced geometry->Affine Transformation->Geometry in world coords

If the geometry is animated this needs to happen every frame doesn't it?



I am also not familiar with the term connected vertex. Does this refer to the line segment forming an edge between two vertices?
#23
04/27/2009 (12:55 pm)
It'll always happen every frame; the caching is for reusing the same shaded vertex within a single draw call...
#24
04/27/2009 (1:11 pm)
@michael

Another thing I hadn't thought of.

Could you use this in a situation where there are several point lights in the scene? It would seem in that case the results of any shading would be different.
#25
04/27/2009 (1:29 pm)
Lighting is generally done in the pixel shader (with a loop or multiple passes), not the vertex shader. If it's multiple passes, it's multiple draw calls; if it's a loop in the pixel shader, it's not related to the vertex shader anyway...
#26
04/27/2009 (1:32 pm)
@Joshua
Data is almost never stored; it's pure throughput through the pipeline, so whether it changes or not it's resent every frame (this obviously excludes stream out, R2VB and pixel/vertex aliasing systems).
Anything like a post-vertex-transform cache tends to be tiny and is just there to help reduce a few pipeline bubbles; for example, a post-vertex-transform cache of 24 or 32 vertex entries is pretty large.
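For anyone curious how much even a small cache helps, here's a toy Python simulation (assumption: a simple FIFO replacement policy, which real hardware may not use) counting vertex-shader invocations for the index stream of a tessellated grid:

```python
# A toy simulation of a post-vertex-transform cache: count how many
# vertex-shader invocations an index stream costs for a given cache size.
from collections import deque

def count_transforms(indices, cache_size):
    cache = deque(maxlen=cache_size)  # FIFO: oldest entries fall out automatically
    transforms = 0
    for i in indices:
        if i not in cache:  # cache miss: the vertex must be (re)transformed
            transforms += 1
            cache.append(i)
    return transforms

def grid_indices(n):
    """Triangle-list indices for an n x n grid of quads: heavy vertex reuse."""
    idx = []
    for y in range(n):
        for x in range(n):
            a, b = y * (n + 1) + x, y * (n + 1) + x + 1
            c, d = (y + 1) * (n + 1) + x, (y + 1) * (n + 1) + x + 1
            idx += [a, b, c, b, d, c]
    return idx

stream = grid_indices(10)  # 121 unique vertices, 600 indices
print(len(stream), "indices")
print(count_transforms(stream, 32), "transforms")  # decent cache: near-ideal reuse
print(count_transforms(stream, 4), "transforms")   # tiny cache: lots of re-transforming
```

With the 32-entry cache every unique vertex ends up transformed only once for this stream order; shrink the cache and the same 600 indices cost far more shader invocations.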

Lighting depends on many things: vertex or pixel? Multi-pass or single-pass? Forward or deferred? And half a dozen other things...

Also, the instance-to-world transform isn't always affine either. A common (though old school) non-affine example is a projection onto a plane, used to fake a shadow on a planar surface. Just a non-affine world transform and voilà, basic fake shadow :)
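For the curious, that planar fake-shadow trick can be sketched in a few lines of plain Python using the standard dot(plane, light) construction (the light position and test point below are made up). Note the bottom row of the resulting matrix is not (0, 0, 0, 1), which is exactly what makes it non-affine:

```python
def shadow_matrix(plane, light):
    """M = dot(plane, light) * I - outer(light, plane): squashes geometry
    onto the plane as seen from the homogeneous light position."""
    d = sum(p * l for p, l in zip(plane, light))
    return [[d * (i == j) - light[i] * plane[j] for j in range(4)]
            for i in range(4)]

def apply(m, v):
    """Transform a homogeneous point and do the divide by w."""
    x, y, z, w = (sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
    return (x / w, y / w, z / w)

ground = (0.0, 1.0, 0.0, 0.0)   # the plane y = 0
light = (0.0, 5.0, 0.0, 1.0)    # point light 5 units above the origin
m = shadow_matrix(ground, light)

print(m[3])                            # bottom row is NOT (0, 0, 0, 1): non-affine
print(apply(m, (1.0, 1.0, 0.0, 1.0)))  # lands on y = 0, pushed away from the light
```

The divide by w is what slides points farther from the plane proportionally farther along the light ray, which no affine matrix can do.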
#27
04/27/2009 (1:32 pm)
I wasn't restricting this topic to vertex shaders. I have been talking about performance over the entire pipeline, what causes bottlenecks, and whether or not geometric complexity could cause them.
#28
04/27/2009 (1:37 pm)
Affine transformations:

1. Rotation
2. Scaling
3. Translation
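A quick plain-Python sketch (the angles and offsets are arbitrary) showing that composing those three always leaves the bottom row of the 4x4 matrix as (0, 0, 0, 1), i.e. w is untouched and no divide is ever needed:

```python
# Composing rotation, scaling, and translation stays affine:
# the bottom row of the product is always (0, 0, 0, 1).
import math

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scaling(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

def rotation_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Model-to-world: scale, then rotate, then translate (applied right-to-left).
world = matmul(translation(10, 0, 0), matmul(rotation_z(90), scaling(2)))
print(world[3])  # (0, 0, 0, 1): still affine, w untouched
```

Contrast that with the projective transform discussed above, where the bottom row carries the -1 that sets up the homogeneous divide.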

Quote:It'll always happen every frame, the caching is when reusing the same shaded vertex in a single draw call...

This is what I was responding to with this:

Quote:Could you use this in a situation where there are several point lights in the scene? It would seem in that case the results of any shading would be different.

My point being, if you had several point lights in a scene, caching a shaded vertex does nothing for you, because unless the lights or geometry are very close together (read: on top of each other) the shading results will be noticeably different.

Edit - By the way, I specify point lights because, although direction vectors carry no position information, the vector from the vertex to the light will change when positions change IF it is a point light source.

And yes.. I just quoted myself.
#29
04/27/2009 (1:46 pm)
I kind of assumed everybody writing here understood the graphics pipeline to a certain level, so I didn't define certain terms.

Sorry about that.

I'm not sure forum posts going over the complete graphics pipeline and the types and lifetimes of the various caches, pipeline bubbles, shader units and fixed-function hardware are really going to explain anything more clearly. Maybe something to write as a T3D doc...
#30
04/27/2009 (1:53 pm)
Quote: I kind of assumed everybody writing here understood the graphics pipeline to a certain level, so I didn't define certain terms.

I assume you are referring to the term connected vertex?

I'd appreciate the info if you could define it. Again, my assumption is you're talking about vertices sharing edges as a connection. If that's the case, could you further elaborate on how that impacts the projective transform? I'm wondering about that as well.

#31
04/27/2009 (2:05 pm)
I meant more the caches. I assumed everybody reading knew how they work on GPUs (they're purely there to help reduce bubbles down the pipeline); they generally don't even last beyond triangle batches, let alone across frames.

Hardware is index driven, so it doesn't have an explicit 'edge'. A triangle primitive is 3 indices (ignoring index compression techniques), which can of course point to 1 to 3 distinct vertices; the bookkeeping that makes sure everything gets output correctly is what's known as a connected vertex. The distinction doesn't seem important until you move to geometry amplification, which can break those connections, meaning the same vertex isn't the same in different primitives...
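A tiny plain-Python illustration of that index-driven view, using the usual quad-as-two-triangles layout (positions here are 2D and made up, just for brevity):

```python
# Index-driven geometry: a quad as 4 vertices and 6 indices forming
# 2 triangles. "Connectivity" lives entirely in the index buffer;
# the vertex buffer itself has no notion of edges.
vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]  # positions only
indices = [0, 1, 2, 0, 2, 3]                 # two triangles sharing edge 0-2

triangles = [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
for tri in triangles:
    print(tri, "->", len(set(tri)), "distinct vertices")

# Vertices 0 and 2 appear in both triangles: they are "connected" through
# the index data, so a post-transform cache can reuse their shaded results.
shared = set(triangles[0]) & set(triangles[1])
print("shared:", sorted(shared))
```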
#32
04/27/2009 (5:20 pm)
Can anyone tell me if it might be possible to make a Crysis-like game with T3D? I would have to say yes. The videos that have been shown for a while now really remind me of the Unreal 3 engine. I would be willing to bet that by the time we have a solid, stable release of this new engine, it could do just about everything Unreal can. Sure, you can't really compete with a, what, $1 million game engine? But still, having a slice of that for $505 was well worth it to me.

Still, if I had a million I wouldn't buy the Unreal Engine; I feel it's overhyped as it is, and you have to show you've shipped a product too. If GG said we had to ship a product with TGEA before we could buy T3D, I know I would be out. And lastly, why do they care if you spend a mill and don't get a completed game done? Say it folds, why would Epic care? They made their money on the deal...

just my cheap 2 cents worth... I'll stop rambling now... :-P

Will
#33
04/27/2009 (6:21 pm)
Keep in mind that I have not seen the engine code but if it is as described by GG you can pretty much make any game you want with the engine, with detail level restricted only by your hardware.

You need to add hooks for all of the stuff you won't get standard. Out of the box the engine looks like it can make a lot of things easy to do. Things you want to spruce up will have to be added by you or a 3rd party.

#34
04/28/2009 (2:05 pm)
[rant mode on] :)

Actually, based on a lot of different tests I've done with PerfHUD, most of the time the bottleneck is fillrate or memory bandwidth (CPU->GPU flush). I presume that's because modern techniques tend to move work into image space. We can assume with a large degree of certainty that DX10-capable hardware and high-end DX9 GPUs have quite mature vertex pipelines, and statistically speaking the number of vertices (in a typical current-gen scenario) will rarely be the bottleneck. In a couple of different environments I have observed that state changes are more of a burden than the raw poly count pushed through the vertex units. When you look at it, that kinda makes sense: since modern cards have a unified shader architecture, a vertex takes the same path from transform to fragment, so internally it won't take that much of a performance hit. State changes are still a murky area, since their cost fluctuates wildly with differing bus widths and memory bandwidth (it basically still boils down to that).

[rant mode off]
#35
04/28/2009 (5:09 pm)
I don't disagree with anything you just said. Moving information from main memory to the graphics card (client to server in OpenGL) can cause quite a large hit. That's why the API includes buffer objects. State changes / context switches are expensive in any form of computing.

In the context of our discussion and the OP question I'd say the above post is good information.

I still have to wonder, in my scenarios above taking geometric complexity to extremes, whether, given all of the above (in what I will call the rant-mode post), the rule "more complexity == longer render times" holds in the general case?

I guess I could boil my point down to this: More geometry leads to more of everything else. (edit - I suppose this would not be the case post-projective transform)

It would seem counter-intuitive to me for a character model with a significantly greater number of vertices to take the same amount of time as the lesser one. From my understanding of game design and graphics engine requirements, vertex count is one of the key metrics. Is this outdated?

Quote:I presume that is because of the tendencies of modern techniques to move to image space most often.

I'd love to learn more about this. I am not familiar with image-space techniques. Are most modern techniques done after the projective transform? I was under the impression that shader stages sat between world space and image space. I'm being sincere, btw, in case you think that's sarcasm.

Edit - I just looked it up. I guess image space techniques refers to rendering to texture and using those results to create a new effect. I guess given that I could see how a large portion of time could be spent manipulating render targets, which will be after the projective transform. Thanks.
#36
04/29/2009 (5:04 am)
@Joshua

Consider this example (note: mostly making up numbers, it'd depend on your system):

Say you draw a simple indexed quad. It'll be 4 vertices, 6 indices, 2 triangles, and, let's say, 400k pixels. Now let's say your vertex shader simply transforms the vertices and the pixel shader simply outputs red.

In such a case you'd probably get something like 2000 fps or whatever.

Now imagine you tessellate that quad, increasing the number of vertices, but still the same size on screen and the same dumb red color.

For a while the FPS won't drop too fast (if at all), then you'll take a hit from moving to 32-bit indices instead of 16-bit ones (because you have more than 65K vertices), and eventually the system will crawl because of the huge number of vertices to transform.
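Rough back-of-the-envelope arithmetic for that 16-bit to 32-bit jump (the grid sizes here are made up, like the fps numbers above):

```python
# Tessellating a quad into an n x n grid gives (n+1)^2 vertices; once that
# passes 65,536 (the most a 16-bit index can address), indices must become
# 32-bit, doubling the index buffer size overnight.
def grid_stats(n):
    verts = (n + 1) ** 2
    idx = n * n * 6  # 2 triangles x 3 indices per grid cell
    bytes_per_index = 2 if verts <= 65536 else 4
    return verts, idx, idx * bytes_per_index

for n in (10, 255, 256, 1000):
    verts, idx, size = grid_stats(n)
    print(f"{n}x{n}: {verts} verts, {idx} indices, {size} bytes of indices")
```

Note the discontinuity between the 255x255 and 256x256 grids: the vertex count barely changes, but the index buffer size doubles.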

So geometry will be the bottleneck.

Now imagine you keep your simple quad with 4 vertices and such, but apply a texture to it (without even changing the vertex data; you just compute the texcoords in the pixel shader). You increase the texture resolution, or you add more and more textures, and do some math in the pixel shader. The framerate will obviously drop until you get 0.001 fps or you run out of memory.

So the pixels/fillrate will be the bottleneck.

Now if you just reduce the size of your quad (by 50%), the framerate will go up again, possibly more than doubling, simply because you draw fewer pixels, but it'll still be the same kind of bottleneck due to all those textures and such in your pixel shader.

At this point if you were to increase the number of vertices like before, at first it probably wouldn't change the framerate at all, because the pixels are the bottleneck. Maybe you can push 1k vertices, maybe 10k, and it won't really change anything. But eventually you'll reach a point where it's balanced.

Anyway not sure what I said made sense, but it's just to show that there is no absolute truth as to where the limitations are:
- it can be the amount of geometry
- it can be the amount of pixels
- it can be the amount of state changes
etc, etc, etc

So the main goal is to have something balanced that fits how the hardware deals with your data. For example, these days hardware focuses much more on pixel shading and such, because that's where the heavy computation happens. On a slightly unrelated note, for a few years companies have been trying to add hardware tessellation to GPUs, which would add complexity at the vertex level rather than the pixel level, but those attempts usually failed (it's back in D3D11 with domain and hull shaders).

As for image space techniques, a typical way is to use multiple render targets and have your main pixel shader output color, normal and position information to 3 separate textures. This way in a final pass you can render a large quad that's the size of the screen and apply various techniques based on the position and normals. Common examples are edge detections using the normals, which can be used to have outlines with celshading, or use the depth from the position information to apply depth of field effects.

Obviously, using 3 render targets (with large data for each pixel) is much more fillrate intensive than typical rendering, since you'd write (it can be compressed, but still) 16 bytes (position) + 12 bytes (normal) on top of the normal color data.
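Quick illustrative arithmetic (made-up resolution and framerate; real G-buffer layouts vary) on what those extra targets cost in write bandwidth:

```python
# Back-of-the-envelope fillrate cost of the G-buffer layout described
# above: 16-byte position + 12-byte normal on top of a 4-byte color
# target, at 1280x1024 and 60 fps. Purely illustrative numbers.
width, height, fps = 1280, 1024, 60
pixels = width * height

color_only = pixels * 4           # plain single-target color write
gbuffer = pixels * (4 + 16 + 12)  # color + position + normal MRTs

print(f"color only: {color_only / 2**20:.1f} MiB/frame")
print(f"G-buffer:   {gbuffer / 2**20:.1f} MiB/frame, "
      f"{gbuffer * fps / 2**30:.2f} GiB/s just for writes")
```

An 8x jump in per-frame write traffic before any texture reads or lighting passes, which is one reason fillrate/bandwidth tends to be the bottleneck in these image-space approaches.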

And as a final little bit of info, if you check product pages on nVIDIA/ATI's sites, you'll see that the fill rate/memory bandwidth specifications are skyrocketing, simply because that's where the bottleneck generally is these days: improving pixel shading (bump/tangent/parallax/etc. mapping or anything else) generally gives a better end result than increasing geometry...

edit: hm that was longer than I expected, guess it'll pass time until the T3D beta release heheh
#37
04/29/2009 (8:08 am)
@michael

More good info, thanks for that.
#38
04/29/2009 (4:51 pm)
@Michael
Great elaboration... you've taken my "stingy" explanation to the next level. As you succinctly put it in the post above, shading units on today's GPUs are quite powerful, but bandwidth is still the problematic area, i.e. in the general scenario shader cores can process more data than can be transmitted.