Game Development Community

TGEA 1.7.0 Beta 2 Bug - Atlas unique terrain clipmap recentering

by Rene Damm · in Torque Game Engine Advanced · 03/29/2008 (8:30 pm) · 83 replies

I repent. I absolutely should have checked this thing on a pristine Beta 2 before reporting. The problem is actually caused by stuff I have done (probably related to this). This was stupid. Sorry.
#1
04/01/2008 (10:12 am)
That bashing of myself was a little premature: it is a bug :)

To see the problem, load an Atlas unique terrain and turn on debug textures. Then, when you wander around, you will see how the clipmap will frequently just get stuck and not update. Sometimes it will catch up quickly, sometimes it will not do so for a long time, and sometimes I couldn't get it to update again.

(Sidenote: do this with a large texture. The problem might not show up for small ones.)

What is interesting is that whereas level 1 gets completely stuck, other levels may or may not update correctly at the same time.

Aside from the fact that there are other issues in the clipmap code that play into this (see my 50% fixes here), from what I understand so far this may be a problem with I/O in Atlas.

ClipMap::recenter does the main work here. It first instructs the clipmap cache to update to the new center and then iterates over the levels of the clipmap stack to do a rect update on each one. However, when the data for a specific level isn't available, it will skip that level's update. And that's exactly what seems to be a problem. In some cases, the data simply never becomes available.

To test this, I modified this piece of code to simply block until data is available. The engine will then frequently hang (often for lengthy periods of time and sometimes forever!!), but the clipmap center does indeed stay synchronized.

This all seems to indicate a problem where data that is requested by the engine never actually arrives (or does so with considerable lag) and that the clipmap will simply stop updating the affected clipstack levels as a result.
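
To make the failure mode concrete, here is a minimal sketch of the skip-if-not-ready logic as I understand it. This is not the actual ClipMap::recenter code; `Level`, `dataReady` and `recenterLevels` are made-up names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one level of the clipmap stack.
struct Level {
    int  centerX = 0;       // texel center this level currently shows
    int  centerY = 0;
    bool dataReady = false; // has the cache delivered data for the new center?
};

// Moves every level whose data is ready to the new center and skips the rest.
// Returns the number of levels left behind at a stale center.
std::size_t recenterLevels(std::vector<Level>& levels, int newX, int newY) {
    assert(!levels.empty());
    std::size_t stale = 0;
    for (Level& lvl : levels) {
        if (!lvl.dataReady) {
            // Data has not arrived: the level silently stays where it was.
            ++stale;
            continue;
        }
        lvl.centerX = newX;
        lvl.centerY = newY;
    }
    return stale;
}
```

If `dataReady` never becomes true for a level, that level stays stuck while the others keep moving, which matches what the debug textures show.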

So, will continue searching and hope to have a fix soon.

Would be really glad to maybe get a comment on all this Atlas stuff from someone on the GG staff.
#2
04/01/2008 (9:24 pm)
Hey Rene,
Do you know if this is a good fix or not?

www.garagegames.com/mg/forums/result.thread.php?qt=73251
#3
04/02/2008 (12:55 am)
Rene,
I'm headed to bed (gotta sleep sometime if I want to keep coding efficiently =P) but I am planning on sharing some of my findings on this from the last time I touched the Atlas code (I was dealing with similar issues).

From what I recall, the main issue that causes the behavior you are describing is when ClipMapBlenderCache::setInterestCenter() is using too small of a radius when calling mOpacitySources[i]->setInterestCenter() and mLightmapSources[i]->setInterestCenter().

I'll dig through the logs/changes I made a little more tomorrow. Do you have some test data that you could send me (email or yousendit.com or an ftp)?

Thanks!
#4
04/02/2008 (4:28 am)
Hi Matt,

hey, thanks a bunch for the replies. You really seem to be working your a*s off to get TGEA 1.7 in the best possible shape. Awesome!

The fix you mention above looks suspicious to me (even more so than my own for the clipmap code, haha). What I can say for sure, though, is that the code in question is irrelevant to the unique texturing Atlas stuff. If it's a bug, the effects should be pretty noticeable, too.

Hmm, not sure about the interest center thing. According to my tests, blended Atlas is working fine as far as recentering goes. I am still betting on I/O issues and blended terrain naturally has none of these for texture data.

Promise to send you a test file later today.

Hope you come back well rested for another round of bug squashing :)
#5
04/02/2008 (1:40 pm)
I haven't had time to try it yet but real quick...in ClipMapUniqueCache::setInterestCenter() try changing:

mUniqueSource->setInterestCenter(origin, mClipMapSize);

to

mUniqueSource->setInterestCenter(origin, mClipMapSize * 2);

As I mentioned above, I made a similar change in ClipMapBlenderCache and it cleared up this issue pretty well so it is worth a shot. Once I get done with a few more tasks today I will be making my own tests against it.
#6
04/02/2008 (1:44 pm)
I had already tried that change before, to no avail. But the fact that you apparently got this issue with blended terrain, too, and that this fixed it, gets me thinking! Hmm, will take a deeper look into the radius thing.
#7
04/02/2008 (3:56 pm)
Slowly crawling towards the light...

I can at least pinpoint the first problem with relative certainty. It isn't that ClipMap isn't properly updating (I have seen partial updates with a hung center, though, so there is more to this), but rather that ClipMap::recenter isn't called at all by AtlasInstance::renderObject for long stretches of time. This indicates that there is some problem with the code involved in ray casting against terrain geometry. Somehow, frequently neither casting up nor casting down will result in a hit even when it should.
#8
04/02/2008 (6:40 pm)
Okay, have to wrap up for today.

Here are my last two findings:

a) There seems to be a bug in Box3F::collideLine that causes it to return an invalid normal. For all but one code path, the local normal variable is never set and simply passed from its NAN state on to the caller.

I tripped on this when turning on Atlas ray collision debugging which uses the above code and thus returns invalid normal information from AtlasGeomChunkTracer::castLeafRay.

b) There is a bug that causes AtlasGeomChunkTracer to frequently miss on ray collision tests, i.e. produce a negative rather than a correct positive result. As a consequence, whenever this is the case, the clip map will not receive a recenter request even though it should.
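
For a), this is roughly what I would expect a segment/box test to do so that the normal is defined on every path that reports a hit. It is a plain slab test written from scratch as a sketch, not Torque's Box3F::collideLine; `Vec3` and `segmentHitsBox` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

struct Vec3 { float x, y, z; };

// Slab-based segment/box test. On a hit, t is the entry parameter in [0,1]
// and normal is always written (zero vector if the segment starts inside).
bool segmentHitsBox(const Vec3& bmin, const Vec3& bmax,
                    const Vec3& start, const Vec3& end,
                    float& t, Vec3& normal)
{
    assert(bmin.x <= bmax.x && bmin.y <= bmax.y && bmin.z <= bmax.z);

    const float s[3]  = { start.x, start.y, start.z };
    const float d[3]  = { end.x - start.x, end.y - start.y, end.z - start.z };
    const float lo[3] = { bmin.x, bmin.y, bmin.z };
    const float hi[3] = { bmax.x, bmax.y, bmax.z };

    float tEnter = 0.0f, tExit = 1.0f;
    float n[3] = { 0.0f, 0.0f, 0.0f }; // initialized up front, never left undefined

    for (int i = 0; i < 3; ++i) {
        if (std::fabs(d[i]) < 1e-8f) {
            // Parallel to this slab: a miss unless the start lies within it.
            if (s[i] < lo[i] || s[i] > hi[i]) return false;
            continue;
        }
        float t0 = (lo[i] - s[i]) / d[i];
        float t1 = (hi[i] - s[i]) / d[i];
        float sign = -1.0f;                              // entering through the min face
        if (t0 > t1) { std::swap(t0, t1); sign = 1.0f; } // entering through the max face
        if (t0 > tEnter) {
            // This axis is now the latest entry, so it determines the hit face.
            tEnter = t0;
            n[0] = n[1] = n[2] = 0.0f;
            n[i] = sign;
        }
        if (t1 < tExit) tExit = t1;
        if (tEnter > tExit) return false;
    }

    t = tEnter;
    normal = Vec3{ n[0], n[1], n[2] };
    return true;
}
```

The point is only that the normal gets a value before any `return true`, so a caller like the chunk tracer never sees garbage.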

---------
PS: Where is Ben Garney? As he's the one who wrote all this (as far as I know), shouldn't he be the one to squash the bugs?

Sorry, I know, that's none of my business, really.
#9
04/03/2008 (9:35 am)
And another tinsy winsy step.

Can trace back the problem to holes in the collision mesh that is produced for an Atlas terrain. Apparently there are triangles missing from it which will then cause AtlasGeomChunkTracer to not find a hit and subsequently make AtlasInstance skip the clip map update.

Man, I should be awarded the plastic cross of the mighty order of the bugger beggars for this...
#10
04/03/2008 (10:45 am)
Ben is *way* busy working on other things these days. We *may* be able to get a little bit of his time in the next couple of weeks but it is my hope that we will have this fixed before then =)

Rene, your efforts are definitely not going unnoticed! We really appreciate you stepping up and helping on this! Hopefully we will be able to return the favor at some point =)
#11
04/03/2008 (11:08 am)
Hey Matt, you can't imagine what you did with that post! I was just getting "a little bit" frustrated with it all. The more I dig into it, the more issues come up while I haven't even successfully fixed those it all started with. This is where you really have to have your blinders on and just focus on one thing.

Thanks for your encouragement, Matt.

Here is a summary of outstanding issues:

a) Atlas shifts mipmap levels for unique textures one level down thus entirely dropping the initial texture and using only blurred versions. (Can't say whether this also affects blended terrains but it all does look awfully blurry to me).

b) Atlas raycasting has issues, i.e. AtlasInstance::castRay is broken. The apparent cause seems to be invalid collision data potentially caused by bugs in the generators.

c) Atlas clip map recentering does not work; primarily due to b) but the complicated raycasting is unnecessary anyways as the clip map code isn't interested at all in a precise point on the terrain but just needs some proper X/Y coordinates.

d) Clip map updating itself is broken due to I/O issues. This becomes apparent with unique texturing where streaming is more relevant than with blending textures. Due to the I/O problems, the clip map will stop updating entirely or do only partial/sporadic updates.

I have that gut feeling in my stomach that there is more buried there...

Will tackle c) first now as it is the easiest one.
#12
04/03/2008 (11:38 am)
@Rene: Is your profile email active? I sent you an email last night, was wondering if you received it.
#13
04/03/2008 (12:21 pm)
@Stephen: Thank you for notifying me. Got filtered out somehow. Have sent you an email back.
#14
04/03/2008 (2:53 pm)
There is a memory leak in AtlasGeomChunk::copyToDiscreteMesh. The following code should be added to the method in order for AtlasDiscreteMesh to properly release the buffers allocated by copyToDiscreteMesh.

adm->mOwnsData = true;
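
In case it isn't obvious why one flag matters, here is a stripped-down sketch of the ownership pattern. It assumes the mesh destructor only frees its buffers when the flag is set; `MeshSketch`, `copyToMesh` and the `freedBuffers` counter are invented for this example and are not the Atlas classes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a mesh with an ownership flag.
struct MeshSketch {
    float*      verts = nullptr;
    std::size_t vertCount = 0;
    bool        ownsData = false; // only an owning mesh may free its buffers

    MeshSketch() = default;
    MeshSketch(const MeshSketch&) = delete;
    MeshSketch& operator=(const MeshSketch&) = delete;

    ~MeshSketch() {
        if (ownsData) {
            delete[] verts;
            ++freedBuffers;
        }
    }

    static int freedBuffers; // instrumentation for this sketch only
};
int MeshSketch::freedBuffers = 0;

// Copies vertex data into freshly allocated storage owned by the new mesh.
MeshSketch* copyToMesh(const float* src, std::size_t count) {
    assert(src != nullptr || count == 0);
    MeshSketch* m = new MeshSketch;
    m->verts = new float[count];
    for (std::size_t i = 0; i < count; ++i)
        m->verts[i] = src[i];
    m->vertCount = count;
    m->ownsData = true; // without this line the destructor would leak verts
    return m;
}
```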

PS: That "not needing full raycast" thing I said above is nonsense, of course. After all, this is Atlas and not some heightfield renderer.
#15
04/03/2008 (4:55 pm)
And again: have to wrap up. Am way overdue...

Status Update:

I have added another ray collision debug level that does the ray intersection tests on the actual mesh rather than the collision mesh. This indeed solves the problem of AtlasInstance not sending recenter requests and thus proves that there is a problem with the collision meshes (quite probably to be found in AtlasGeomChunk::generateCollision). The effect on clip map updates, however, is marginal. The real problem lies somewhere else.

Tomorrow I will start taking the I/O chain apart.
#16
04/04/2008 (10:03 pm)
Hmm... that took me far too long...

The clipmap recentering simply is a resource contention issue. Atlas' deserialization pipeline is WAY too complicated, so what happens is that while there is absolutely no problem streaming the necessary data into main memory, the deserializer will quickly fall behind and wind up amassing an ever larger pending queue. This will cause the clip map to increasingly starve for data until finally, once contention reaches its peak, all its data is outdated and no valid new data comes in.

The true culprit here (aside from the complicated load chain) is that while Atlas has code to cancel *load* requests, it has no code to cancel load operations once they have reached deserialization state, i.e. the deserializer will be busy with requests that are no longer of interest. This also causes Atlas to consume vast amounts of memory (together with a *MASSIVE* memory leak in the async IO code).

So, the next thing is to put in some code to maintain proper hygiene with the deserializer queue. Once I have all the other bugs out, I will turn to simplifying the load chain.
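
What I have in mind for the queue hygiene is something along these lines. Treat it as a sketch under my own assumptions: `PendingChunk`, `pruneStale` and the "wanted" set are hypothetical names, and the real Atlas queue looks different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>

// Hypothetical entry in the deserializer's pending queue.
struct PendingChunk {
    int chunkId;
};

// Drops every queued chunk that nobody is interested in anymore and keeps
// the remaining entries in their original order.
// Returns the number of entries removed.
std::size_t pruneStale(std::deque<PendingChunk>& queue,
                       const std::unordered_set<int>& wanted)
{
    const std::size_t before = queue.size();
    queue.erase(
        std::remove_if(queue.begin(), queue.end(),
                       [&wanted](const PendingChunk& p) {
                           return wanted.count(p.chunkId) == 0;
                       }),
        queue.end());
    assert(queue.size() <= before);
    return before - queue.size();
}
```

Running something like this each time the interest center moves should keep the deserializer from chewing on chunks that have already scrolled out of the clipmap.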

When recentering works, the next thing will be fixing the collision meshes. This should pretty much put an end to clip map update issues.

The good thing about taking so long to find this was that all kinds of other issues turned up, most of which I have fixed by now including the issues it all started with (detail levels dropped and unique texture sizes wrong). Will clean up first and then put all the stuff up here.
#17
04/05/2008 (2:47 am)
Awesome stuff Rene!

As you may have seen we went ahead and launched TGEA 1.7.0 today. I have been super busy this week nailing down all the loose ends for that release which is why I haven't been as responsive to your work as I would like.

I consider the Atlas code the next highest priority (it is the shakiest bit of TGEA 1.7.0) so once I am done with the post-launch fallout I will be turning my attention back this way.

Keep plugging away! It sounds like you are on the right track and all of your hard work will not go unrewarded.

Ben is hoping to make a little time in the near future to come offer some advice. I'll keep poking him with a stick till he does =P
#18
04/05/2008 (2:54 am)
Yep, just saw that 1.7 is out. My bad. Been just too slow :)

However, I am confident that by the time you come back on Monday, I will have fixed all the issues I have found so far (except maybe the collision mesh thing but terrain mesh intersection can be used for the time being).

So, have a nice weekend, Matt.

PS: Great job on 1.7!! Absolutely.
#19
04/05/2008 (2:56 am)
Ahh, and thanks for the mention in the release post :)
#20
04/05/2008 (5:01 pm)
Hey Rene, thanks for digging into this! Matt tossed me a link to the thread (I don't actively monitor the GG forums these days).

The deserialization issue you've identified is interesting. What sort of HD do you have? I wrote a lot of the deserialization code on a laptop with a slower drive, and although I tested it on a variety of systems I never saw the IO thread outrunning deserialization (which ought to be really fast since it just does a little decompression and parsing). What sort of texture tile sizes are you using? It could be an issue if the tiles are quite large (thus taking longer to process). Running the profiler on the deserializer would also be helpful; there's not very much happening there to slow things down...

The collision issue is another interesting one. When the collision code was originally written I used a simple debug object that cast a lot of rays in every direction and plotted the results (the "spiky ball" test) and moved it all over the place to check that collision worked properly. Something similar might be helpful in nailing down the problem here. Clearly since then there's been some bit rot. Could it be a memory corruption issue where the collision acceleration structures are being truncated? The rendered geometry and the collision geometry should be identical; there's just a simple quadtree structure that's used to speed up collision checks, very similar to what the Legacy terrain does.
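
For anyone who wants to rebuild something like the spiky ball test, a rough sketch follows. It is not the original debug object: the Fibonacci spiral used to spread the directions is just one convenient choice, and `castRay` is a placeholder callback standing in for whatever ray cast you hook up, not an engine API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Dir3 { float x, y, z; };

// Roughly evenly spread unit directions, generated with a Fibonacci spiral.
std::vector<Dir3> sphereDirections(std::size_t count) {
    assert(count > 0);
    std::vector<Dir3> dirs;
    dirs.reserve(count);
    const float goldenAngle = 3.14159265f * (3.0f - std::sqrt(5.0f));
    for (std::size_t i = 0; i < count; ++i) {
        // z steps from just below +1 down to just above -1.
        const float z = 1.0f - 2.0f * (static_cast<float>(i) + 0.5f)
                                    / static_cast<float>(count);
        const float r = std::sqrt(1.0f - z * z);
        const float a = goldenAngle * static_cast<float>(i);
        dirs.push_back(Dir3{ r * std::cos(a), r * std::sin(a), z });
    }
    return dirs;
}

// Casts one ray per direction and returns the directions that did not hit.
std::vector<Dir3> collectMisses(const std::vector<Dir3>& dirs,
                                const std::function<bool(const Dir3&)>& castRay)
{
    std::vector<Dir3> misses;
    for (const Dir3& d : dirs) {
        if (!castRay(d))
            misses.push_back(d);
    }
    return misses;
}
```

Plot the hits and misses around the test object and move it across the terrain; downward rays that miss over solid ground would point straight at the holes in the collision data.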

Dropping the most detailed level of the unique texture is bad. Did you ever get to the bottom of that? It's probable that there's an off-by-one error somewhere in the image source logic. It might have been introduced when Matt fixed blended clipmaps on the Legacy terrain; you could check that by testing on a code drop from before that set of changes.

I'm happy to see you digging into this stuff. Code never really becomes solid until a couple of people have beaten on it. Having Matt look at it has been of great value, and you're definitely giving it a work out too. :)