Game Development Community

A better Interior::setupActivePolyList()

by asmaloney (Andy) · in Torque Game Engine · 01/14/2007 (3:56 pm) · 41 replies

In my continuing quest for a Stronghold which runs at a reasonable speed on my PowerBook G4, I've improved Interior::setupActivePolyList() which was near the top of my profile as a heavy function. The changes should be useful for everyone so I'm posting them here instead of in the Mac forum.

Interior::setupActivePolyList() is a fairly long function. To simplify profiling and to allow the compiler a better chance to optimize at the function level, I added a new function [doFogActive()] to separate out the cases when we have fog to deal with.

The main change though is the way activePoints is handled. This is used to mark points we've already visited so we don't have to do calculations on them again. In the original version, it was an array of U8s and was allocated as part of the frame using FrameAllocator::alloc(). The two problems with this are (1) activePoints is only used in this function, so there's no need to add it as part of the frame memory and (2) indexing into an array of U8s just for a 0-or-1 comparison is quite expensive because of the alignment of the data. So this version replaces activePoints with a bit vector and checks bits instead.
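To make the bit-vector idea concrete, here is a minimal standalone sketch. VisitedBits and countUniquePoints() are hypothetical names made up for illustration and are not TGE's BitVector; packing into 32-bit words is just one reasonable layout, assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an engine bit vector: one bit per point,
// packed into 32-bit words (assumed layout, not TGE's implementation).
class VisitedBits
{
public:
    explicit VisitedBits(std::size_t numBits)
        : mWords((numBits + 31) / 32, 0u), mSize(numBits) {}

    bool test(std::size_t i) const
    {
        assert(i < mSize);
        return ((mWords[i >> 5] >> (i & 31)) & 1u) != 0;
    }

    void set(std::size_t i)
    {
        assert(i < mSize);
        mWords[i >> 5] |= (std::uint32_t(1) << (i & 31));
    }

private:
    std::vector<std::uint32_t> mWords;
    std::size_t                mSize;
};

// Walks a list of point indices and does the per-point work only the
// first time each index is seen. Returns the number of distinct points.
std::size_t countUniquePoints(const std::vector<std::uint32_t>& windings,
                              std::size_t numPoints)
{
    VisitedBits visited(numPoints);
    std::size_t unique = 0;

    for (std::uint32_t index : windings)
    {
        if (!visited.test(index))
        {
            visited.set(index);
            ++unique;   // the expensive per-point calculation would go here
        }
    }
    return unique;
}
```

Calling countUniquePoints() with a winding list that repeats indices does the per-point work once per distinct index, which is the same visit-once pattern the fog loops rely on.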

The initial version accounted for 8% of my run time on my profile journal. These changes reduced it to 6%.

If you try it, please let me know what kind of results you get [I'm interested in how it affects the Windows build too]. If you have suggestions for further improvement - I'm sure there are other things to do here - please post!

Aside: One of the things I see in profiling is that some classes/structs are not aligned. I think this is the cause of a lot of inefficient loads and stores, but some class/structs seem to be sensitive to their data layout, so I'm going to have to look at this more carefully.
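As a rough illustration of why data layout matters, here is a small sketch with made-up structs (none of these are TGE types). It shows that member order can change how much padding the compiler inserts, and how alignment can be requested explicitly with C++11 alignas. The actual sizes are ABI-dependent, so the code only reports them rather than assuming specific numbers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical structs for illustration only.

// Small members on either side of a float usually force padding
// before 'value' and again at the end of the struct.
struct Padded
{
    char  flag;
    float value;
    char  tag;
};

// Same members, largest first: the two chars can share one padded slot.
struct Reordered
{
    float value;
    char  flag;
    char  tag;
};

// Explicitly request 16-byte alignment, e.g. for SIMD-friendly loads.
struct alignas(16) Aligned16
{
    float x, y, z, w;
};

static_assert(alignof(Aligned16) == 16, "alignas(16) should be honored");

// Bytes lost to padding, given the sum of the member sizes.
template <typename T>
std::size_t paddingBytes(std::size_t sumOfMemberSizes)
{
    return sizeof(T) - sumOfMemberSizes;
}
```

Comparing paddingBytes<Padded>() with paddingBytes<Reordered>(), each passed sizeof(float) + 2 * sizeof(char), shows what a given compiler does with each layout. Whether reordering is safe for a particular engine struct is a separate question, since some code depends on the existing layout.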

-----

In interior/interior.h around line 708 add the function header for doFogActive():
void traverseZone(const RectD* inRects, const U32 numInputRects, U32 currZone, Vector<U32>& zoneStack);
[b]   void	doFogActive( bool environmentActive,
								SceneState* state,
								U32 outputCount, U16* output,
								U16* planeSides,
								const PlaneF &distPlane,
								const F32 distOffset,
								const Point3F &worldP,
								const Point3F& osZVec,
								const F32 worldZ );
[/b]
   void setupActivePolyList(ZoneVisDeterminer&, SceneState*,
                            const Point3F&, const Point3F& rViewVector,
                            const Point3F&,
                            const F32 worldz, const Point3F& scale);

continued...

[Edit: include no longer needed]
#1
01/14/2007 (3:57 pm)
In interior/interior.cc, replace the function Interior::setupActivePolyList() with this:
void Interior::setupActivePolyList(ZoneVisDeterminer& zoneDeterminer,
                                   SceneState*        state,
                                   const Point3F&     rPoint,
                                   const Point3F&     osCamVector,
                                   const Point3F&     osZVec,
                                   const F32          worldZ,
                                   const Point3F&     scale)
{
   PROFILE_START(InteriorSetupActivePolyList);

   // Here's the deal.  We loop through each of the zones, and create a merged master
   //  list of polygons that are the union of all the zones render sets.  While we're
   //  doing this, we'll be setting up each zone's list of planes.  I've got these
   //  processes separated out for now, but they could be merged.  After we have the
   //  master list of polys, and the list of active planes, we'll copy the list
   //  (culling the backfaces) into the ActivePolyList.

   // There's some trickiness here.  We use the high bit of this U16 to test against the
   //  flip bit in the surfaces planeindex.
   U16* planeSides = (U16*)FrameAllocator::alloc(sizeof(U16) * mPlanes.size());

   // We'll never need more than twice the number of zones for merging...
   ItrMergeStruct* mergeArray = (ItrMergeStruct*)FrameAllocator::alloc((mZones.size() * 2) * sizeof(ItrMergeStruct));
   U32 numMergeStructs = 0;

   PROFILE_START(ISAPL_Merge);
   for (U32 i = 0; i < mZones.size(); i++)
   {
      if (zoneDeterminer.isZoneVisible(i) == false)
         continue;

      // Setup the plane directionals
      for (U32 j = mZones[ i ].planeStart; j < mZones[ i ].planeStart + mZones[ i ].planeCount; j++) {
         if (getPlane(mZonePlanes[j]).distToPlane(rPoint) >= 0.0f)
            planeSides[mZonePlanes[j]] = 0x8000;
         else
            planeSides[mZonePlanes[j]] = 0x0000;
      }

      // Create a merge struct for this zone
      ItrMergeStruct& rMerge = mergeArray[numMergeStructs++];
      rMerge.array = &mZoneSurfaces[mZones[ i ].surfaceStart];
      rMerge.size  = mZones[ i ].surfaceCount;
   }
   AssertFatal(numMergeStructs > 0, "Error, no rendered zones, big problem.");

   // Merge the arrays into the final version
   U32 finalArray = 0xFFFFFFFF;
   {
      U32 begin  = 0;
      U32 end    = numMergeStructs;
      while ((end - begin) > 1)
      {
         U32 newEnd = end;

         U32 i;
         for (i = begin; (i + 1) < end; i += 2)
         {
            const ItrMergeStruct& rMerge0 = mergeArray[ i ];
            const ItrMergeStruct& rMerge1 = mergeArray[i+1];

            // Create the new structure to merge into
            ItrMergeStruct& rMergeOut = mergeArray[newEnd++];
            rMergeOut.array = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));

            mergeSurfaceVectors(rMerge0.array, rMerge0.size,
                                rMerge1.array, rMerge1.size,
                                rMergeOut.array,
                                &rMergeOut.size);
         }

         begin = i;
         end   = newEnd;
      }

      finalArray = begin;
   }
   AssertFatal(finalArray < mZones.size() * 2, "Error, final array out of bounds!");

   U16* output     = mergeArray[finalArray].array;
   U32 outputCount = mergeArray[finalArray].size;
   PROFILE_END();

   // Before we go and fog this object, we'll test the points of our bounding box.  If
   //  they are all unfogged, then we have no need to do any fogging.  If they are
   //  all fogged though, we cannot turn off rendering of the object, as it's possible
   //  that they extend into fog planes.
   PlaneF distPlane;

   // Setup the dist plane
   const Point3F &closest = getBoundingBox().getClosestPoint(rPoint);
   Point3F n = ( closest - rPoint );
   n.convolve( scale );

   const F32	distOffset = n.len();
   if (distOffset != 0)
   {
      distPlane.set(closest, n);
   }
   else
   {
      // Oops, we're inside the bounding box.  distnormal is the view vector in object space
      distPlane.set(closest, osCamVector);
   }
   distPlane.x /= scale.x;
   distPlane.y /= scale.y;
   distPlane.z /= scale.z;

   const Point3F &worldP = state->getCameraPosition();

   F32 maxFog = -1;
   const Point3F fp[2] = { getBoundingBox().min, getBoundingBox().max };

   for (U32 i = 0; i < 8; i++)
   {
      Point3F test;

      if (i & 0x1) test.x = fp[0].x;
      else         test.x = fp[1].x;
      if (i & 0x2) test.y = fp[0].y;
      else         test.y = fp[1].y;
      if (i & 0x4) test.z = fp[0].z;
      else         test.z = fp[1].z;

      F32 hazeVal  = state->getHazeAndFog(mFabs(distPlane.distToPlane(test)) + distOffset,
                                          (mDot(test, osZVec) + worldZ) - worldP.z);
      if (hazeVal > maxFog)
         maxFog = hazeVal;
   }

   PROFILE_START(ISAPL_Setup);

	bool environmentActive = (dglDoesSupportARBMultitexture() &&
                             smRenderEnvironmentMaps &&
                             mValidEnvironMaps != 0);

	if (maxFog >= 1.0/255.0f)
	{
		// Sigh.  Gotta do it
		sgFogActive = true;
		
		doFogActive( environmentActive,
						state,
						outputCount, output,
						planeSides,
						distPlane,
						distOffset,
						worldP,
						osZVec,
						worldZ );
		
		PROFILE_END();
		PROFILE_END();
		return;
	}

	// Unfogged.  We can turn off this part of the setup...
	sgFogActive = false;

   // Point setup
   sgActivePolyList  = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
   sgEnvironPolyList = NULL;
   sgActivePolyListSize  = 0;
   sgEnvironPolyListSize = 0;
   sgFogPolyList = NULL;
   sgFogTexCoords = NULL;
   sgFogPolyListSize     = 0;

	// No Fog
	if (environmentActive)
	{
     sgEnvironPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
	 
	 // Environ
	 for (U32 i = 0; i < outputCount; i++)
	 {
		const U16		oIndex = output[ i ];
		const Surface&	rSurface = mSurfaces[oIndex];
		
		// Not back faced?  Add it to the list
		if (((planeSides[getPlaneIndex(rSurface.planeIndex)] ^ rSurface.planeIndex) & 0x8000) == 0)
		   continue;
		   
		sgActivePolyList[sgActivePolyListSize++] = oIndex;

		if (mEnvironMaps[mSurfaces[output[ i ]].textureIndex] != NULL)
		   sgEnvironPolyList[sgEnvironPolyListSize++] = oIndex;
	 }
	}
	else
	{
	 for (U32 i = 0; i < outputCount; i++)
	 {
		const U16		oIndex = output[ i ];
		const Surface&	rSurface = mSurfaces[oIndex];
		
		// Not back faced?  Add it to the list
		if (((planeSides[getPlaneIndex(rSurface.planeIndex)] ^ rSurface.planeIndex) & 0x8000) == 0)
		   continue;
		   
		sgActivePolyList[sgActivePolyListSize++] = oIndex;
	 }
	}

   PROFILE_END();
   PROFILE_END();
}

continued...
#2
01/14/2007 (3:58 pm)
In interior/interior.cc, add this function:
void	Interior::doFogActive( bool environmentActive,
								SceneState* state,
								U32 outputCount, U16* output,
								U16* planeSides,
								const PlaneF &distPlane,
								const F32 distOffset,
								const Point3F &worldP,
								const Point3F& osZVec,
								const F32 worldZ )
{
	// Point setup
	BitVector	activePoints( mPoints.size() );
	activePoints.clear();

	sgActivePolyList  = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
	sgEnvironPolyList = NULL;
	sgActivePolyListSize = 0;
	sgEnvironPolyListSize = 0;
	sgFogPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
	sgFogTexCoords = (Point2F*)FrameAllocator::alloc(mPoints.size() * sizeof(Point2F));
	sgFogPolyListSize = 0;

	if (useFogCoord())
	{
		if (environmentActive)
		{
			// Environ, fc fog
			sgEnvironPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));

			for (U32 i = 0; i < outputCount; i++)
			{
				const U16		oIndex = output[i];
				const Surface& rSurface = mSurfaces[oIndex];

				// Not back faced?  Add it to the list
				if (((planeSides[getPlaneIndex(rSurface.planeIndex)] ^ rSurface.planeIndex) & 0x8000) == 0)
					continue;

				sgActivePolyList[sgActivePolyListSize++] = oIndex;

				if (mEnvironMaps[rSurface.textureIndex] != NULL)
					sgEnvironPolyList[sgEnvironPolyListSize++] = oIndex;

				const U32	count = rSurface.windingStart + rSurface.windingCount;
				for (U32 j = rSurface.windingStart; j < count; j++)
				{
					const U32 index = mWindings[j];

					if ( !activePoints.test( index ) )
					{
						activePoints.set( index );

						const Point3F	&point = mPoints[index].point;
						const F32 z = worldZ + mDot(point, osZVec);

						mPoints[index].fogCoord = state->getHazeAndFog(mFabs(distPlane.distToPlane(point)) + distOffset,
															z - worldP.z);

						AssertFatal(mPoints[index].fogCoord >= 0.0f, "Error, neg fog coord!");
					}
				}
			}
		}
		else
		{
			// No environ, FC fog
			for (U32 i = 0; i < outputCount; i++)
			{
				const U16		oIndex = output[i];
				const Surface& rSurface = mSurfaces[oIndex];

				// Not back faced?  Add it to the list
				if (((planeSides[getPlaneIndex(rSurface.planeIndex)] ^ rSurface.planeIndex) & 0x8000) == 0)
					continue;

				sgActivePolyList[sgActivePolyListSize++] = oIndex;

				const U32	count = rSurface.windingStart + rSurface.windingCount;
				for (U32 j = rSurface.windingStart; j < count; j++)
				{
					const U32 index = mWindings[j];

					if ( !activePoints.test( index ) )
					{
						activePoints.set( index );

						const Point3F	&point = mPoints[index].point;
						const F32 z = worldZ + mDot(point, osZVec);

						mPoints[index].fogCoord = state->getHazeAndFog(mFabs(distPlane.distToPlane(point)) + distOffset,
																z - worldP.z);

						AssertFatal(mPoints[index].fogCoord >= 0.0f, "Error, neg fog coord!");
					}
				}
			}
		}

		return;
	}

	// Environment, textured fog
	if (environmentActive)
	{
		sgEnvironPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
		
		for (U32 i = 0; i < outputCount; i++)
		{
			const U16		oIndex = output[i];
			const Surface&	rSurface = mSurfaces[oIndex];

			// Not back faced?  Add it to the list
			if (((planeSides[getPlaneIndex(rSurface.planeIndex)] ^ rSurface.planeIndex) & 0x8000) == 0)
				continue;
			
			sgActivePolyList[sgActivePolyListSize++] = oIndex;

			if (mEnvironMaps[rSurface.textureIndex] != NULL)
				sgEnvironPolyList[sgEnvironPolyListSize++] = oIndex;

			// Fog the unfogged points...
			const U32	count = rSurface.windingStart + rSurface.windingCount;
			for (U32 j = rSurface.windingStart; j < count; j++)
			{
				const U32 pIndex = mWindings[j];

				if ( !activePoints.test( pIndex ) )
				{
					activePoints.set( pIndex );

					// fog.
					const Point3F	&point = mPoints[pIndex].point;
					const F32 dist = distPlane.distToPlane(point) + distOffset;
					const F32 z = worldZ + mDot(point, osZVec);

					Point2F	&fogTexCoord = sgFogTexCoords[pIndex];
					gClientSceneGraph->getFogCoordPair(dist, z, fogTexCoord.x, fogTexCoord.y);
				}
			}
		}
	}
	else
	{
		// no environment, textured fog
		for (U32 i = 0; i < outputCount; i++)
		{
			const U16	oIndex = output[i];

			const Surface& rSurface = mSurfaces[oIndex];

			// Not back faced?  Add it to the list
			if (((planeSides[getPlaneIndex(rSurface.planeIndex)] ^ rSurface.planeIndex) & 0x8000) == 0)
				continue;

			sgActivePolyList[sgActivePolyListSize++] = oIndex;

			// Fog the unfogged points...
			const U32	count = rSurface.windingStart + rSurface.windingCount;
			for (U32 j = rSurface.windingStart; j < count; j++)
			{
				const U32 pIndex = mWindings[j];

				if ( !activePoints.test( pIndex ) )
				{
					activePoints.set( pIndex );

					// fog.
					const Point3F	&point = mPoints[pIndex].point;
					const F32 dist = distPlane.distToPlane(point) + distOffset;
					const F32 z = worldZ + mDot(point, osZVec);

					Point2F	&fogTexCoord = sgFogTexCoords[pIndex];
					gClientSceneGraph->getFogCoordPair(dist, z, fogTexCoord.x, fogTexCoord.y);
				}
			}
		}
	}
}

[Edit: Ben's suggested change BitMatrix -> Bitvector]
#3
01/14/2007 (11:29 pm)
Thanks for the write-up here! I have a few questions based on reading over this code...

1. How much time is spent calculating envmap or fog values? The calculations involved are quite simplistic - it might be possible to vectorize these calculations, and if they are a big chunk of the work then it might be smart to consider it.

2. Any reason you chose BitMatrix? I believe there's a BitVector you can use, which should be a bit (no pun intended :P) faster than the BitMatrix, since it doesn't consider the second dimension.

3. This code looks like it might be memory bound. Would it be possible to get a win out of prefetching? Does Shark show any memory stalls? If so, where?

4. Depending on what you're targeting, have you considered generating a fixed vertex buffer for each zone in the interior, and simply drawing that (potentially doing fog/envmap calculations on the CPU) rather than doing all this dynamic stuff? We do this in TSE to good effect.
#4
01/15/2007 (6:22 am)
Ben: Thanks for the input!

I should have stated my goals a bit [haha - I can do it too!] better. I'm trying to make some general, non-radical changes to the stock TGE to make it run better on my PB G4. I'm using this in part to learn the internals of TGE, so I don't want to make any sweeping changes. I would also like others to benefit, so I want to keep the changes as simple as possible while still providing some improvements. I'm hoping that this will encourage others to jump in and learn, and that you [GG] might consider my changes [or something like them] for the main code base since they're easy to understand [no Duff's Device here], generally applicable, and non-threatening :-)

To answer your questions:

1) About a third - see 3. I had thought of vectorizing as well, which might be a bit easier to tackle now that the fog stuff is in a separate function.

2) Just my inexperience with TGE. :-) [I've changed the code for doFogActive() above...]

3) Yes it is stalling all over getFogCoordPair(). The compiler cannot optimize this because we're using a global, non-const object [gClientSceneGraph]. Manually inlining and moving invariants outside the loops would probably help here. I'll see what happens.

4) Honestly, I don't understand TGE enough to do this yet :-)
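For anyone following along with point 3, here is a small standalone sketch of what "moving invariants outside the loops" means. FogSettings, gFogSettings and both functions are hypothetical stand-ins invented for this example; they are not the real SceneGraph API, and the formula is a simplified height-coordinate calculation, not TGE's actual fog math.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical settings reached through a global, non-const pointer,
// standing in for something like gClientSceneGraph.
struct FogSettings
{
    float heightOffset;
    float heightRange;
};

static FogSettings  sSettings    = { 10.0f, 4.0f };
static FogSettings* gFogSettings = &sSettings;

inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Naive: every iteration reads through the global pointer and divides.
void heightCoordsNaive(const std::vector<Vec3>& pts, const Vec3& zVec,
                       float worldZ, std::vector<float>& out)
{
    out.resize(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i)
    {
        out[i] = (worldZ + dot(pts[i], zVec) - gFogSettings->heightOffset)
                 / gFogSettings->heightRange;
    }
}

// Hoisted: the loop-invariant parts are computed once into locals,
// so the loop body is a dot product, an add and a multiply.
void heightCoordsHoisted(const std::vector<Vec3>& pts, const Vec3& zVec,
                         float worldZ, std::vector<float>& out)
{
    const float base     = worldZ - gFogSettings->heightOffset;
    const float invRange = 1.0f / gFogSettings->heightRange;

    out.resize(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i)
        out[i] = (base + dot(pts[i], zVec)) * invRange;
}
```

Both functions should produce the same values up to floating-point rounding. Whether the hoisted version is actually faster depends on the compiler and target, so it needs to be measured.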
#5
01/15/2007 (10:16 am)
1&3) Hmm - well I'd try breaking that function out. It's pretty simple code - should be easy to break it out and test. If it's effective might be worthwhile making a FogCalculator class that lives in the stack and does the calcs - that might be a clean way to get the compiler win w/o uglifying your code.

2) Was there much of a win from that change? (We _are_ optimizing after all... always gotta measure changes to see if they do anything! :P)

4) It's a pretty big set of changes, and as you say, isn't really in line with your goals - makes sense to me to leave it to TSE. (Besides, that way we sell more TSE licenses... :P) And this code _is_ smarter for slow GPU/fast CPU situations.

Thanks for stepping up to the plate and hacking on this stuff... great work so far!
#6
01/15/2007 (3:41 pm)
Thanks Ben - appreciate the feedback!

1) I'll revisit this when I get a bit more time to look at it. I keep getting distracted every time I zone-in on this ;-)

2) Reduced time spent in doFogActive() from 6.0% to 5.9%

4) Just need TSE *cough* TGEA support on Mac OS X... :-)
#7
01/15/2007 (4:33 pm)
1) I think it sounds like a smart move... look forward to hearing what sort of win this is for you! :)

2) Hah - well, good to know there WAS a change!

4) We're closer than ever... :)
#8
01/16/2007 (12:30 pm)
I made the following three changes:

1) Added a fog calculator class for textured fog calculations based on Ben's suggestion [at least I think I'm doing what he suggested :-)]
2) Manually inlined and jiggered the textured fog calculation code within that class [uglified!]
3) Moved the planeSides checks out of the inner loops all the way back to the creation of the mergeArray to reduce the inner loop iterations

...and the result: 2.3%. From the original 8%, that makes a nice little optimization.

There are still some stalls in the fog calc and register spills, so it can be improved even more, but I'm going to leave it as is for now.
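Change 3 boils down to filtering the merged surface list once, in place, so the later loops never see back-faced surfaces. Here is a minimal standalone sketch of that pattern. The Surface struct and cullBackFaces() are simplified stand-ins written for this example, and masking with 0x7FFF to get the plane index is an assumption about how the high "flip" bit is stripped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in: the high bit of planeIndex is a "flipped" flag,
// the low 15 bits select the plane (assumed encoding).
struct Surface
{
    std::uint16_t planeIndex;
};

// planeSides[p] holds 0x8000 when the camera is on the front side of
// plane p and 0x0000 otherwise. A surface is kept when its flip bit
// differs from the camera-side bit, matching the XOR test used in the
// thread's code. Survivors are compacted to the front of the list.
std::size_t cullBackFaces(std::vector<std::uint16_t>&       surfaceIndices,
                          const std::vector<Surface>&       surfaces,
                          const std::vector<std::uint16_t>& planeSides)
{
    std::size_t pos = 0;

    for (std::size_t i = 0; i < surfaceIndices.size(); ++i)
    {
        const std::uint16_t sIndex     = surfaceIndices[i];
        const std::uint16_t planeIndex = surfaces[sIndex].planeIndex;
        const std::uint16_t plane      = planeIndex & 0x7FFF;

        // pos never runs ahead of i, so writing in place is safe.
        if ((planeSides[plane] ^ planeIndex) & 0x8000)
            surfaceIndices[pos++] = sIndex;
    }

    surfaceIndices.resize(pos);
    return pos;
}
```

Because the write position never passes the read position, no second buffer is needed, and every loop that runs afterwards can drop its own per-surface side test.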

Code to follow...

In interior/interior.h around line 708 add the function header for doFogActive():
void	doFogActive( const bool environmentActive,
						const SceneState* state,
						const U32 mergeArrayCount, const U16* mergeArray,
						const PlaneF &distPlane,
						const F32 distOffset,
						const Point3F &worldP,
						const Point3F &osZVec,
						const F32 worldZ );
Continued...
#9
01/16/2007 (12:31 pm)
In interior/interior.cc, replace the function Interior::setupActivePolyList() with this:
void Interior::setupActivePolyList(ZoneVisDeterminer& zoneDeterminer,
                                   SceneState*        state,
                                   const Point3F&     rPoint,
                                   const Point3F&     osCamVector,
                                   const Point3F&     osZVec,
                                   const F32          worldZ,
                                   const Point3F&     scale)
{
   PROFILE_START(InteriorSetupActivePolyList);

   // Here's the deal.  We loop through each of the zones, and create a merged master
   //  list of polygons that are the union of all the zones render sets.  While we're
   //  doing this, we'll be setting up each zone's list of planes.  I've got these
   //  processes separated out for now, but they could be merged.  After we have the
   //  master list of polys, and the list of active planes, we'll copy the list
   //  (culling the backfaces) into the ActivePolyList.

   // There's some trickiness here.  We use the high bit of this U16 to test against the
   //  flip bit in the surfaces planeindex.
   U16* planeSides = (U16*)FrameAllocator::alloc(sizeof(U16) * mPlanes.size());

   // We'll never need more than twice the number of zones for merging...
   ItrMergeStruct* mergeArray = (ItrMergeStruct*)FrameAllocator::alloc((mZones.size() * 2) * sizeof(ItrMergeStruct));
   U32 numMergeStructs = 0;

   PROFILE_START(ISAPL_Merge);
   for (U32 i = 0; i < mZones.size(); i++)
   {
      if (zoneDeterminer.isZoneVisible(i) == false)
         continue;

      // Setup the plane directionals
      for (U32 j = mZones[ i ].planeStart; j < mZones[ i ].planeStart + mZones[ i ].planeCount; j++) {
         if (getPlane(mZonePlanes[j]).distToPlane(rPoint) >= 0.0f)
            planeSides[mZonePlanes[j]] = 0x8000;
         else
            planeSides[mZonePlanes[j]] = 0x0000;
      }

      // Create a merge struct for this zone
      ItrMergeStruct& rMerge = mergeArray[numMergeStructs++];
      rMerge.size  = mZones[ i ].surfaceCount;
      rMerge.array = (U16*)FrameAllocator::alloc(rMerge.size * sizeof(U16));
      dMemcpy( rMerge.array, &mZoneSurfaces[mZones[ i ].surfaceStart], rMerge.size * sizeof(U16) );
   }
   AssertFatal(numMergeStructs > 0, "Error, no rendered zones, big problem.");

   // Merge the arrays into the final version
   U32 finalArray = 0xFFFFFFFF;
   {
      U32 begin  = 0;
      U32 end    = numMergeStructs;
      while ((end - begin) > 1)
      {
         U32 newEnd = end;

         U32 i;
         for (i = begin; (i + 1) < end; i += 2)
         {
            const ItrMergeStruct& rMerge0 = mergeArray[ i ];
            const ItrMergeStruct& rMerge1 = mergeArray[i+1];

            // Create the new structure to merge into
            ItrMergeStruct& rMergeOut = mergeArray[newEnd++];
            rMergeOut.array = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));

            mergeSurfaceVectors(rMerge0.array, rMerge0.size,
                                rMerge1.array, rMerge1.size,
                                rMergeOut.array,
                                &rMergeOut.size);
         }

         begin = i;
         end   = newEnd;
      }

      finalArray = begin;
   }
   AssertFatal(finalArray < mZones.size() * 2, "Error, final array out of bounds!");

   U16* output     = mergeArray[finalArray].array;
   U32 outputCount = mergeArray[finalArray].size;
   U32	pos = 0;
   
   // remove back faced polys from list
	for (U32 i = 0; i < outputCount; i++)
	{
		const U16	oIndex = output[ i ];
		const U16	rSurfacePlaneIndex = mSurfaces[oIndex].planeIndex;
			
		if ( (planeSides[getPlaneIndex(rSurfacePlaneIndex)] ^ rSurfacePlaneIndex) & 0x8000 )
			output[pos++] = oIndex;			
	}
	
	outputCount = pos;
	
   PROFILE_END();

   // Before we go and fog this object, we'll test the points of our bounding box.  If
   //  they are all unfogged, then we have no need to do any fogging.  If they are
   //  all fogged though, we cannot turn off rendering of the object, as it's possible
   //  that they extend into fog planes.
   PlaneF distPlane;

   // Setup the dist plane
   const Point3F &closest = getBoundingBox().getClosestPoint(rPoint);
   Point3F n = ( closest - rPoint );
   n.convolve( scale );

   const F32	distOffset = n.len();
   if (distOffset != 0)
   {
      distPlane.set(closest, n);
   }
   else
   {
      // Oops, we're inside the bounding box.  distnormal is the view vector in object space
      distPlane.set(closest, osCamVector);
   }
   distPlane.x /= scale.x;
   distPlane.y /= scale.y;
   distPlane.z /= scale.z;

   const Point3F &worldP = state->getCameraPosition();

   F32 maxFog = -1;
   const Point3F fp[2] = { getBoundingBox().min, getBoundingBox().max };

   for (U32 i = 0; i < 8; i++)
   {
      Point3F test;

      if (i & 0x1) test.x = fp[0].x;
      else         test.x = fp[1].x;
      if (i & 0x2) test.y = fp[0].y;
      else         test.y = fp[1].y;
      if (i & 0x4) test.z = fp[0].z;
      else         test.z = fp[1].z;

      F32 hazeVal  = state->getHazeAndFog(mFabs(distPlane.distToPlane(test)) + distOffset,
                                          (mDot(test, osZVec) + worldZ) - worldP.z);
      if (hazeVal > maxFog)
         maxFog = hazeVal;
   }

   PROFILE_START(ISAPL_Setup);

	bool environmentActive = (dglDoesSupportARBMultitexture() &&
                             smRenderEnvironmentMaps &&
                             mValidEnvironMaps != 0);

	if (maxFog >= 1.0/255.0f)
	{
		// Sigh.  Gotta do it
		sgFogActive = true;
		
		doFogActive( environmentActive,
						state,
						outputCount, output,
						distPlane,
						distOffset,
						worldP,
						osZVec,
						worldZ );
		
		PROFILE_END();
		PROFILE_END();
		return;
	}

	// Unfogged.  We can turn off this part of the setup...
	sgFogActive = false;

   // Point setup
   sgActivePolyList  = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
   sgEnvironPolyList = NULL;
   sgActivePolyListSize  = 0;
   sgEnvironPolyListSize = 0;
   sgFogPolyList = NULL;
   sgFogTexCoords = NULL;
   sgFogPolyListSize     = 0;

	// No Fog
	if (environmentActive)
	{
     sgEnvironPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
	 
	 // Environ
	 for (U32 i = 0; i < outputCount; i++)
	 {
		const U16		oIndex = output[ i ];
		   
		sgActivePolyList[sgActivePolyListSize++] = oIndex;

		const Surface&	rSurface = mSurfaces[oIndex];

		if (mEnvironMaps[rSurface.textureIndex] != NULL)
		   sgEnvironPolyList[sgEnvironPolyListSize++] = oIndex;
	 }
	}
	else
	{
	 for (U32 i = 0; i < outputCount; i++)
	 {
		sgActivePolyList[sgActivePolyListSize++] = output[ i ];
	 }
	}

   PROFILE_END();
   PROFILE_END();
}
Continued...
#10
01/16/2007 (12:33 pm)
In interior/interior.cc, add this:
// This class is a fancy inlined implementation of the fog calculations with all invariants moved
//	into the constructor.
class FogCalc
{
	public:
		FogCalc( const PlaneF &dPlane, F32 distOffset, const Point3F &zVec, F32 worldZ, F32 worldPz, const SceneState *state )
		:	fc_distOffset( distOffset ),
			fc_newWorldZ( worldZ - worldPz ),
			distPlane( dPlane ),
			osZVec( zVec ),
			sState( state )
		{
			// For Textured
			F32 heightOffset;

			gClientSceneGraph->getFogCoordData( tex_invVisibleDistance, heightOffset, tex_invHeightRange );

			tex_visibleDistanceMod = gClientSceneGraph->getVisibleDistanceMod() - distOffset - distPlane.d;
			tex_newWorldZ = worldZ - heightOffset;
		}
		
		inline F32	CalcFC( const Point3F &point ) const
		{
			return( sState->getHazeAndFog( mFabs( distPlane.distToPlane(point) ) + fc_distOffset,
														fc_newWorldZ + mDot(point, osZVec) ) );
		}
		
		inline const Point2F	CalcTextured( const Point3F &point ) const
		{
			return( Point2F(
						(tex_visibleDistanceMod - (point.x * distPlane.x + point.y * distPlane.y + point.z * distPlane.z)) * tex_invVisibleDistance,
						(tex_newWorldZ + point.x * osZVec.x + point.y * osZVec.y + point.z * osZVec.z) * tex_invHeightRange
					) );
		}
	
	private:
		const F32	fc_distOffset;
		const F32	fc_newWorldZ;
		
		F32	tex_invVisibleDistance;
		F32 tex_invHeightRange;
		F32	tex_visibleDistanceMod;
		F32	tex_newWorldZ;
		
		const PlaneF distPlane;
		const Point3F osZVec;
		const SceneState *sState;
};


void	Interior::doFogActive( const bool environmentActive,
								const SceneState* state,
								const U32 mergeArrayCount, const U16* mergeArray,
								const PlaneF &distPlane,
								const F32 distOffset,
								const Point3F &worldP,
								const Point3F &osZVec,
								const F32 worldZ )
{
	// Point setup
	BitVector	activePoints( mPoints.size() );
	activePoints.clear();

	sgActivePolyList  = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
	sgEnvironPolyList = NULL;
	sgActivePolyListSize = 0;
	sgEnvironPolyListSize = 0;
	sgFogPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
	sgFogTexCoords = (Point2F*)FrameAllocator::alloc(mPoints.size() * sizeof(Point2F));
	sgFogPolyListSize = 0;

	const FogCalc	fogCalc( distPlane, distOffset, osZVec, worldZ, worldP.z, state );

	if (useFogCoord())
	{
		// FC fog
		for (U32 i = 0; i < mergeArrayCount; ++i)
		{
			const U16		oIndex = mergeArray[ i ];

			sgActivePolyList[sgActivePolyListSize++] = oIndex;

			const Surface& rSurface = mSurfaces[oIndex];

			if (environmentActive && (mEnvironMaps[rSurface.textureIndex] != NULL))
			{
				if ( sgEnvironPolyList == NULL )
					sgEnvironPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
					
				sgEnvironPolyList[sgEnvironPolyListSize++] = oIndex;
			}

			// Fog the unfogged points...
			const U32	count = rSurface.windingStart + rSurface.windingCount;
			
			for (U32 j = rSurface.windingStart; j < count; ++j)
			{
				const U32 index = mWindings[j];

				if ( !activePoints.test( index ) )
				{
					activePoints.set( index );

					mPoints[index].fogCoord = fogCalc.CalcFC( mPoints[index].point );

					AssertFatal(mPoints[index].fogCoord >= 0.0f, "Error, neg fog coord!");
				}
			}
		}

		return;
	}
	
	// Textured fog
	for (U32 i = 0; i < mergeArrayCount; ++i)
	{
		const U16		oIndex = mergeArray[ i ];
		
		sgActivePolyList[sgActivePolyListSize++] = oIndex;

		const Surface&	rSurface = mSurfaces[oIndex];

		if (environmentActive && (mEnvironMaps[rSurface.textureIndex] != NULL))
		{
			if ( sgEnvironPolyList == NULL )
				sgEnvironPolyList = (U16*)FrameAllocator::alloc(mSurfaces.size() * sizeof(U16));
				
			sgEnvironPolyList[sgEnvironPolyListSize++] = oIndex;
		}
		
		// Fog the unfogged points...
		const U32	count = rSurface.windingStart + rSurface.windingCount;

		for (U32 j = rSurface.windingStart; j < count; ++j)
		{
			const U32 pIndex = mWindings[j];

			if ( !activePoints.test( pIndex ) )
			{
				activePoints.set( pIndex );
				
				sgFogTexCoords[pIndex] = fogCalc.CalcTextured( mPoints[pIndex].point );
			}
		}
	}
}
End

[Edit: fix sign problem - see below]
#11
01/16/2007 (12:40 pm)
That's a pretty huge difference...can't wait to get home and try this out!
#12
01/16/2007 (1:05 pm)
Nice optimization work.

For the method:
inline const Point2F CalcTextured( const Point3F &point ) const
      {
         return( Point2F(
                  (visibleDistanceMod + point.x * distPlane.x + point.y * distPlane.y + point.z * distPlane.z) * invVisibleDistance,
                  (newWorldZ + point.x * osZVec.x + point.y * osZVec.y + point.z * osZVec.z) * invHeightRange
               ) );
      }

What does the assembly look like? Is GCC actually copying the resultant Point2F around or storing it directly into place? You might consider doing:

inline void CalcTextured( const Point3F &point, Point2F &out ) const
      {
          out.x = (visibleDistanceMod + point.x * distPlane.x + point.y * distPlane.y + point.z * distPlane.z) * invVisibleDistance;
          out.y = (newWorldZ + point.x * osZVec.x + point.y * osZVec.y + point.z * osZVec.z) * invHeightRange;
      }

As this would avoid a copy. But it comes down to what the optimizer is doing. Avoiding memory read/writes in those inner loops is probably going to be the biggest win you can get.

That fog calculator class might be handy elsewhere, too. :) Nice bit of utility coding there.
#13
01/16/2007 (1:10 pm)
Oh yeah - also don't forget to use the fog calculator for all fog calcs in there, it'll make the code much more maintainable (and probably more performant on all code paths).

Great work!
#14
01/16/2007 (2:34 pm)
@Rubes: Let me know how it goes!

@Ben: Thanks! The CalcTextured() change - I tried that and it resulted in quite a slowdown - not quite sure why. There is still some copying going on, but the optimizer seems to be doing well here. I also tried adding a Point2F to the class and just returning a ref to it to avoid any copying, but I didn't gain anything from that.

As to the other fog calculations, I started to do CalcFC(), but ran out of steam. It would just require a couple of additional args to FogCalc's constructor [state and worldP.z] - and some rejiggery.

Oh, now that I look at the code again, I can make doFogActive() a bit simpler...

Oh heck - I'll look at it later - one more round I guess :-)
#15
01/16/2007 (3:52 pm)
Oops. Forgot one piece of code - we need access to some of SceneGraph's protected vars, so we grab them with this function.

In sceneGraph/sceneGraph.h, add this function:

...
   void buildFogTextureSpecial( SceneState *pState );

[b]   void getFogCoordData(F32 &invVisibleDistance, F32 &heightOffset, F32 &invHeightRange) const;[/b]
   void getFogCoordPair(F32 dist, F32 z, F32 &x, F32 &y) const;

...

inline void SceneGraph::getFogCoordData(F32 &invVisibleDistance, F32 &heightOffset, F32 &invHeightRange) const
{
   invVisibleDistance = mInvVisibleDistance;
   heightOffset = mHeightOffset;
   invHeightRange = mInvHeightRange;
}
#16
01/16/2007 (5:19 pm)
Instead of making this into The Longest Thread ever, I've gone back and edited the code with the FogCalc class [six posts up].

1) I added the method for calculating using fog coords called CalcFC()
2) I changed doFogActive() to be a lot cleaner and easier to maintain
3) Speed is the same

About fog coord: when I went to test this on my Mac, I found that in game.cc, FC is disabled on the Mac, so even though EXT_fog_coord is available, it is never used in the Mac build. Searching the forums, I found that there have been problems with it in the past. When I enabled it, recompiled, and ran Stronghold, everything seemed to work fine - at the same speed as the textured fog. When I tried to run the lighting demo, it asserted in DepthSortList::depthPartition(). Further investigation required :-)
#17
01/16/2007 (7:26 pm)
@Andy: I added this code and then used journaling with Shark to do some profiling, and compared it with my app prior to adding these edits.

First thing I noticed was that Interior::setupActivePolyList() was way down the list on Shark (0.5%). Next thing was that these changes didn't have any effect on this or anything else.

I'm still spending all my time in Blender::blend_vec().
#18
01/16/2007 (8:11 pm)
@Rubes: What was your journal of? In a complex interior situation these changes should make a difference. If you're zooming around outdoors with the C blender, then it'll get drowned out by the much costlier other activities you're engaging in.
#19
01/16/2007 (8:16 pm)
Sorry, should have been more specific.

It was, in fact, a complex interior situation. My game starts off with the player inside a large DIF structure (though with a few windows/doors to the outside). The journal is basically me spinning around twice, moving into an adjacent room, and spinning around a couple more times.

I suppose the spinning will have a large effect on the processes that are called, but I'm still inside a large, complex DIF the whole time.
#20
01/16/2007 (8:23 pm)
When do you start/end profiling? If you profile the load sequence or the first few frames of the scene, loading time will dominate the profile results.

A more accurate way to profile this code would probably be to compare a fixed view, before/after, or one with no exterior rendering at all, starting and stopping the profiler while in this fixed view. Spinning will make the blender dominate the profiler, minimizing the visibility of any performance gains from this win.