Game Development Community

Improving the GlowBuffer

by Tom Spilman · in Torque Game Engine Advanced · 08/18/2005 (6:29 pm) · 8 replies

While working through issues with my 6600 GT, I've come across some performance problems with the current glow buffer implementation. I know that the GG folks are not optimizing things at this time, but I'm pointing them out in hopes that some enterprising community member gets to them before GG or I do. =)

There are currently 14 render target changes per frame if the glow buffer is enabled. A few years ago nVidia recommended that you "keep them in single digits", and even now 14 seems excessive for the simple functionality of the current GlowBuffer.

I see simple fixes like...

void GlowBuffer::clear()
{
   GFX->startActiveRenderSurface( mSurface[2] );
   GFX->clear( GFXClearTarget, ColorI(0,0,0,0), 1.0f, 0 );
   GFX->endActiveRenderSurface();
}

// Later in SceneState::renderCurrentImages()...
glowBuffer->clear();
glowBuffer->setAsRenderTarget();

As you can see, that is 3 SetRenderTarget (SRT) calls right there, and no real work got done other than a clear. It seems that we should have a GlowBuffer::start() that does the clear and the set, and a GlowBuffer::end() that does the copy.
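The counting behind that suggestion can be sketched with a small mock. This is only an illustration: CountingDevice, bind() and kGlowSurface are hypothetical names, not part of the real GFX layer, and it assumes setAsRenderTarget() binds the same surface that clear() used.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the GFX device. It only tracks which target is
// bound and counts how often the binding changes. Target 0 is the back buffer.
struct CountingDevice {
    int current = 0;
    int changes = 0;
    std::vector<int> saved;  // targets remembered by startActiveRenderSurface

    void bind(int target) { current = target; ++changes; }

    void startActiveRenderSurface(int target) {
        saved.push_back(current);
        bind(target);
    }

    void endActiveRenderSurface() {
        assert(!saved.empty());
        const int previous = saved.back();
        saved.pop_back();
        bind(previous);  // restoring the old target is itself a change
    }

    void clear() {}  // clears whatever is bound; no target change
};

const int kGlowSurface = 2;  // arbitrary id standing in for mSurface[2]

// Pattern from the post: a self-contained clear(), then a separate bind.
int changesWithSeparateClear() {
    CountingDevice dev;
    dev.startActiveRenderSurface(kGlowSurface);  // change 1
    dev.clear();
    dev.endActiveRenderSurface();                // change 2
    dev.bind(kGlowSurface);                      // change 3 (setAsRenderTarget)
    return dev.changes;
}

// Suggested pattern: a start() that binds once and clears while bound.
int changesWithCombinedStart() {
    CountingDevice dev;
    dev.bind(kGlowSurface);  // change 1
    dev.clear();
    return dev.changes;
}
```

In this mock the separate clear costs three changes and the combined start costs one. The matching end() would still need a change of its own to leave the glow surface.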

Next the actual GlowBuffer::blur() call does this:

      // PASS 1
      //-------------------------------
      setupPixelOffsets( Point4F( 3.5, 2.5, 1.5, 0.5 ), true );

      GFX->setTexture( 0, mSurface[0] );
      GFX->setTexture( 1, mSurface[0] );
      GFX->setTexture( 2, mSurface[0] );
      GFX->setTexture( 3, mSurface[0] );
      
      GFX->startActiveRenderSurface( mSurface[1] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
      GFX->endActiveRenderSurface();
     
      
      // PASS 2
      //-------------------------------
      setupPixelOffsets( Point4F( -3.5, -2.5, -1.5, -0.5 ), true );

      GFX->setTexture( 0, mSurface[1] );
      GFX->setTexture( 1, mSurface[1] );
      GFX->setTexture( 2, mSurface[1] );
      GFX->setTexture( 3, mSurface[1] );
      
      GFX->startActiveRenderSurface( mSurface[0] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
      GFX->endActiveRenderSurface();

      // ....  PASS 3 and 4

endActiveRenderSurface() restores the previous render target from the stack, so sandwiching the renders between these start/end calls results in 4 extra changes per frame.

Finally, the DX samples and the Tron 2.0 article talk about doing the blur in two passes: one vertical and one horizontal. I haven't looked at the shader used, but that would save you 2 more changes.
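For readers unfamiliar with the two-pass idea, here is a CPU-side sketch of why it works. It is not the glow shader, whose kernel is not shown in this thread; the 3-tap kernel, the test image, and all function names are assumptions made for illustration. It checks that a horizontal pass followed by a vertical pass matches a single pass with the full 2D kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<float>>;

// Clamp-to-edge lookup, loosely like a clamped texture fetch.
float fetch(const Image& img, int y, int x) {
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    return img[std::min(std::max(y, 0), h - 1)][std::min(std::max(x, 0), w - 1)];
}

// One 1D pass. (dy, dx) = (0, 1) blurs horizontally, (1, 0) vertically.
Image blur1D(const Image& src, const std::vector<float>& k, int dy, int dx) {
    const int r = static_cast<int>(k.size()) / 2;
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<float>(w, 0.0f));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int i = -r; i <= r; ++i)
                dst[y][x] += k[i + r] * fetch(src, y + i * dy, x + i * dx);
    return dst;
}

// Reference: a single pass with the full 2D kernel k[i] * k[j].
Image blur2D(const Image& src, const std::vector<float>& k) {
    const int r = static_cast<int>(k.size()) / 2;
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<float>(w, 0.0f));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int i = -r; i <= r; ++i)
                for (int j = -r; j <= r; ++j)
                    dst[y][x] += k[i + r] * k[j + r] * fetch(src, y + i, x + j);
    return dst;
}

// Largest per-pixel difference between the two approaches on a test image.
float separableError() {
    Image img(5, std::vector<float>(6, 0.0f));
    for (int y = 0; y < 5; ++y)
        for (int x = 0; x < 6; ++x)
            img[y][x] = static_cast<float>((y * 7 + x * 3) % 5);
    const std::vector<float> k = {0.25f, 0.5f, 0.25f};

    const Image twoPass = blur1D(blur1D(img, k, 0, 1), k, 1, 0);
    const Image onePass = blur2D(img, k);

    float worst = 0.0f;
    for (int y = 0; y < 5; ++y)
        for (int x = 0; x < 6; ++x)
            worst = std::max(worst, std::fabs(twoPass[y][x] - onePass[y][x]));
    return worst;
}
```

With a 3-tap kernel the single pass reads 9 samples per pixel, while the two passes read 3 each. On the GPU the trade-off is one extra render target change for the intermediate surface.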

About the author

Tom is a programmer and co-owner of Sickhead Games, LLC.


#1
08/22/2005 (6:13 pm)
Setting render targets is still one of the slower things you can ask a card to do. However, you can do a LOT more of them with modern cards. Still, you are correct: there are way more changes going on here than necessary.

I forgot that the stack kicks in here and restores the previous target; that's really bad in this case. It might be best to add a flag to start/endActiveRenderSurface that ignores the stack.

I actually implemented a 2 pass blur setup, but had to use the four pass to support 1.1 hardware. When I ran some tests, the 4 pass 1.1 setup was faster than the 2 pass 2.0 setup. Because of that, I kept it simple with the 4 pass.
#2
08/22/2005 (6:46 pm)
I was considering this weekend what needed to change in start/endActiveRenderSurface. First, it might be better to rename these to push/popActiveRenderSurface so that it is more apparent to the user what's happening under the hood. The pop/endActiveRenderSurface call should probably take a parameter for how many surfaces to pop off the stack. That way you could do this:

      GFX->pushActiveRenderSurface( mSurface[1] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );

      GFX->pushActiveRenderSurface( mSurface[0] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );

      GFX->pushActiveRenderSurface( mSurface[1] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );

      GFX->pushActiveRenderSurface( mSurface[0] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );

      GFX->popActiveRenderSurface( 4 ); // or -1 to pop all and restore backbuffer

I'm surprised that the 4 pass shader was faster than the 2 pass considering there are 4 more SetRenderTarget changes involved.
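A rough sketch of how a counted pop could behave, using a mock stack rather than the real GFX device. TargetStack, its methods, and the surface ids are hypothetical; the point is only to compare change counts between bracketing every pass and popping once at the end.

```cpp
#include <cassert>
#include <vector>

// Hypothetical render-target stack. Target 0 stands for the back buffer.
class TargetStack {
public:
    // Remember the bound target, then bind the new one.
    void push(int target) {
        saved_.push_back(current_);
        bind(target);
    }

    // Drop `count` saved targets and rebind only the oldest one dropped.
    // A negative count unwinds the whole stack.
    void pop(int count = 1) {
        const int depth = static_cast<int>(saved_.size());
        if (count < 0) count = depth;
        assert(count <= depth);
        if (count == 0) return;
        const int restore = saved_[depth - count];
        saved_.resize(depth - count);
        bind(restore);
    }

    int current() const { return current_; }
    int changes() const { return changes_; }

private:
    void bind(int target) { current_ = target; ++changes_; }

    int current_ = 0;
    int changes_ = 0;
    std::vector<int> saved_;
};

// Arbitrary ids standing in for mSurface[1] and mSurface[0].
const int kPassTargets[4] = {11, 10, 11, 10};

// Each pass bracketed by its own push/pop, as start/end behaves today.
int changesBracketed() {
    TargetStack rt;
    for (int target : kPassTargets) {
        rt.push(target);
        rt.pop();
    }
    return rt.changes();
}

// Four pushes followed by one counted pop.
int changesSinglePop() {
    TargetStack rt;
    for (int target : kPassTargets) rt.push(target);
    rt.pop(4);
    return rt.changes();
}

// After the counted pop, the original target should be bound again.
int targetAfterSinglePop() {
    TargetStack rt;
    for (int target : kPassTargets) rt.push(target);
    rt.pop(4);
    return rt.current();
}
```

In this mock the bracketed version performs 8 changes and the single-pop version performs 5, with the back buffer bound again at the end.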
#3
08/23/2005 (5:40 pm)
The 4 pass vs 2 pass test case was before I added the RT stack, so it was only 2 more RT changes.

How about push/popActiveRenderSurface() as a set of functions separate from start/endActiveRenderSurface()?

Then you could do:

      GFX->pushActiveRenderSurface();

      GFX->startActiveRenderSurface( mSurface[1] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
      GFX->endActiveRenderSurface();

      GFX->startActiveRenderSurface( mSurface[0] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
      GFX->endActiveRenderSurface();

      GFX->startActiveRenderSurface( mSurface[1] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
      GFX->endActiveRenderSurface();

      GFX->startActiveRenderSurface( mSurface[0] );
      GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
      GFX->endActiveRenderSurface();

      GFX->popActiveRenderSurface(); // restores the target saved by the push above


That is assuming you even want to preserve the contents of the current render target in this case.
#4
08/23/2005 (6:02 pm)
It seems to me that with this sort of API, endActiveRenderSurface() wouldn't have anything to do, since you don't want it calling SRT. It would be more like:

GFX->pushActiveRenderSurface();
GFX->setActiveRenderSurface( mSurface[1] );
GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
GFX->setActiveRenderSurface( mSurface[0] );
GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
GFX->setActiveRenderSurface( mSurface[1] );
GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
GFX->setActiveRenderSurface( mSurface[0] );
GFX->drawPrimitive( GFXTriangleFan, 0, 2 );
GFX->popActiveRenderSurface();

That is, unless there is some extra work that endActiveRenderSurface() needs to do for non-Win32 D3D APIs.
#5
08/24/2005 (6:19 pm)
endActiveRenderSurface is there to end MRTs (multiple render targets). You'd need it if you were rendering to 2+ targets and then wanted to render to just one.

I guess setActiveRenderSurface could set the RT to NULL for a particular index to end it like D3D does.
#6
08/24/2005 (11:09 pm)
I haven't used MRT on a project yet, but using NULL to remove a target makes sense, and it is the same technique used in other parts of the GFX layer, like setTexture():

GFX->pushActiveRenderSurfaces(); // note the plural
GFX->setActiveRenderSurface( mSurface1 ); // default is RT 0
GFX->setActiveRenderSurface( mSurface2, 1 );
// do something useful with MRT
GFX->setActiveRenderSurface( mSurface3, 0 );
GFX->setActiveRenderSurface( NULL, 1 );
// do something else useful with a single RT
GFX->popActiveRenderSurfaces(); // restores whatever was pushed above.
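The slot semantics proposed above can be sketched as a mock. Nothing here is the real GFX implementation: TargetSlots, kMaxTargets (assumed to be 4), kNoSurface (standing in for NULL), and the integer surface ids are all assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kMaxTargets = 4;  // assumed number of simultaneous targets
constexpr int kNoSurface = -1;  // plays the role of NULL in the post

// Hypothetical per-slot render target state. Surface id 0 is the back buffer.
class TargetSlots {
public:
    TargetSlots() {
        slots_.fill(kNoSurface);
        slots_[0] = 0;  // back buffer bound to slot 0 by default
    }

    // Save every slot so the whole MRT setup can be restored later.
    void pushActiveRenderSurfaces() { saved_.push_back(slots_); }

    // Bind a surface to one slot; kNoSurface disables that slot.
    void setActiveRenderSurface(int surface, int index = 0) {
        assert(index >= 0 && index < kMaxTargets);
        slots_[index] = surface;
    }

    // Restore all slots to what they were at the matching push.
    void popActiveRenderSurfaces() {
        assert(!saved_.empty());
        slots_ = saved_.back();
        saved_.pop_back();
    }

    int activeCount() const {
        int count = 0;
        for (int surface : slots_)
            if (surface != kNoSurface) ++count;
        return count;
    }

    int surfaceAt(int index) const { return slots_[index]; }

private:
    std::array<int, kMaxTargets> slots_;
    std::vector<std::array<int, kMaxTargets>> saved_;
};

struct SequenceResult {
    int duringMrt;      // active targets while two slots are bound
    int afterDisable;   // active targets after slot 1 is disabled
    int restoredSlot0;  // surface in slot 0 after the pop
};

// Mirrors the call sequence in the post, with ids 1..3 for mSurface1..3.
SequenceResult runSequence() {
    TargetSlots gfx;
    gfx.pushActiveRenderSurfaces();
    gfx.setActiveRenderSurface(1);
    gfx.setActiveRenderSurface(2, 1);
    const int duringMrt = gfx.activeCount();
    gfx.setActiveRenderSurface(3, 0);
    gfx.setActiveRenderSurface(kNoSurface, 1);
    const int afterDisable = gfx.activeCount();
    gfx.popActiveRenderSurfaces();
    return SequenceResult{duringMrt, afterDisable, gfx.surfaceAt(0)};
}
```

Saving all slots in one push keeps the restore symmetric: a single pop puts back both the single-target and the multi-target case without the caller tracking which slots it touched.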
#7
08/24/2005 (11:16 pm)
Lol... I need more knowledge (this confuses me)!
#8
08/29/2005 (1:27 pm)
@Tom - OK, I think we've got it then, that looks good to me.