Game Development Community

TGEA 1.7.0 Beta 2 Bug - Atlas unique terrain clipmap recentering

by Rene Damm · in Torque Game Engine Advanced · 03/29/2008 (8:30 pm) · 83 replies

I repent. I absolutely should have checked this thing on a pristine Beta 2 before reporting. The problem is actually caused by stuff I have done (probably related to this). This was stupid. Sorry.
#21
04/05/2008 (5:34 pm)
Hey Ben, really cool to hear from you! Yep, I am giving your code a good beating :)

The detail level thing actually was the easiest thing. It's simply that the entire mipmap calculation was off by one which got hidden by AtlasInstance dropping the highest mipmap level altogether. I have that fully fixed and working.

There is another aesthetic problem coupled to this in that the max clip stack delta is locked to four (no problem in itself) and, when coming across a larger delta in a chunk, the code will drop the finer LOD levels rather than the coarser ones. I have that fixed, too.

Your comments on the collision issue are insightful. Will make use of that when I get around to fixing that bug.

As for the deserialization thing, reducing the tile size will probably help avoid the issue most of the time. I have used a size of 512x512 and further amplified the problem by putting the whole Atlas system under intense stress (extremely dense mesh and two 1024x1024 clipmaps each fed from its own 16384x16384 texture). This revealed that a) the deserialization queue fills up quickly (several hundred pending requests) and that b) the ADIO system leaks tremendous amounts of memory (the process size climbs at a rate of probably around 5 MB/sec).

Of course, actual data sets will never get to anywhere near that amount of I/O activity but the problems are there. In my opinion, Atlas must be able to cancel a request at any point in the processing pipeline, not only at the read-from-disk stage. While each system will have its limit at which point Atlas simply will not be able to keep up (I am working on a laptop with a crappy disk as well), this will at least allow the system to gracefully handle contention problems and will cause the clip map to catch up quickly.

I have fixed the memory leak and am now working on the cancellation thing which is a bit tricky because of threading issues. Hope to have that finished today or tomorrow at the latest.

So, all that remains to say is that Atlas is really, really cool. Great work, Ben! I think that very few people (outside of GG) realize what a powerful system you have built there. You should have thrown around magic words like "megatexture" and "overhangs" and such.
#22
04/05/2008 (6:02 pm)
Hey Rene,

Ok great - glad that mipping got fixed. In theory all the systems (unique, blended on legacy, atlas) should all use identical conventions for indexing mips and such. Unfortunately it's easy to get off by one somewhere and not detect it. Not sure what the best way to make it easy to spot those sorts of problems is.

The hack fix for the clip level limit is to use a deeper tree depth for your geometry. This has a harsh cost in dataset size. The "real" fix for the clip level limit is to modify the GeomChunks to break up the geometry into multiple subsets in a single VB/IB. Then you can draw either the whole range or a segment(s) of it based on what the clipmap needs to stay at 4 levels or less. There is no duplication of data so it's basically "free." In my tests, drawing only a few levels of the clipmap was always quite a bit faster than running the "uber shader" version that samples all levels, even though it takes more DIPs and processing. Depends on the specifics of your game I suppose.

Have you tried plotting the queue lengths over time? I'd be very interested to see the correlation between camera speed, ADIO queue, deserialization queue, instatement queue, and allocated memory. It would probably be very easy to do this using a special print statement every frame and Excel/Google Spreadsheets. How fast is your camera moving? Do you think ADIO is leaking or just fragmenting memory?
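
A minimal sketch of the per-frame dump suggested above, for anyone wanting to try it. The struct, its fields and the function name are hypothetical and not part of Atlas; the idea is simply to print one comma-separated row per frame so the console output can be pasted into a spreadsheet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical per-frame snapshot; none of these names come from Atlas.
struct QueueSample
{
   unsigned frame;
   float cameraSpeed;             // world units per second
   std::size_t ioQueue;           // requests waiting on disk reads
   std::size_t deserializeQueue;  // requests waiting for the deserializer
   std::size_t instateQueue;      // requests waiting for the main thread
   std::size_t allocatedBytes;
};

// Formats one CSV row: frame,speed,io,deserialize,instate,bytes
std::string formatQueueSample(const QueueSample &s)
{
   char buf[160];
   int n = std::snprintf(buf, sizeof(buf), "%u,%.2f,%zu,%zu,%zu,%zu",
                         s.frame, s.cameraSpeed, s.ioQueue,
                         s.deserializeQueue, s.instateQueue, s.allocatedBytes);
   assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
   (void)n;
   return std::string(buf);
}
```

Printing the returned string once per frame (through whatever console print the engine offers) would give the columns to correlate.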

Thanks for the kind words about Atlas! Definitely a huge learning experience for me. :) I think it delivered on its goals very well. Unfortunately those goals did not match up with the community-at-large's needs very well. It's a bit sad to me that it hasn't been used more heavily internally at GG - understandable, of course, given that most GG projects don't need massive terrain. ;)

I am happy to see the post-Atlas research work I did getting a little traction internally. It's a poly soup system that supports LOD, streaming, and editing. You would build all the static geometry for your world in it instead of just using it for terrain. Maybe it will surface in a future version of Torque someday!
#23
04/05/2008 (6:45 pm)
Hey Ben, really great you are taking the time. Very helpful as well.

-----------------

- Clip level

I think that going for the segmented approach was a good decision. It is only that the current code is rooting its selection on the wrong end. It takes the coarsest LOD and clips the finest LOD into range while my code takes the finest LOD and clips the coarsest LOD into range.

Maybe I will try that "uber shader" thing at one point.

- Plotted queues

That is definitely a good idea! I'd like to have these as bars on screen. Will do that once the system is working.

- Queue lengths

The camera is moving only a little faster than normal walk speed. What I see as the real problem is that stale I/O ops are allowed to pile up (cancelLoadRequest is dutifully called by the update code but remains ineffective since the requests are already further down the pipeline). When Atlas is able to kick out any request at will, ever-growing queues will simply never show up. In that case, I/O load will be directly proportional to what's on your screen--the way it should be.

- Memory leak

It is a real leak. Resource TOCs leave buffer allocations to the ADIO system while the ADIO system never frees them, i.e. each ADIO request leaks one complete buffer block. Simple to fix, though.

- Atlas in general

From a personal viewpoint, I cannot see how someone with a system like Atlas on his/her hands can go back to legacy terrain. The possibilities of Atlas are so vast and extend far beyond just doing massive terrains. It may not be a good match for hobbyist game developers, but for professional developers with a proper content pipeline, Atlas can work wonders.

I really hope that I can give something to GG in the future that serves as a better presentation for Atlas than the (rather unimpressive) terrain_water_demo.

------------------
#24
04/05/2008 (7:43 pm)


Ahemm, shouldn't have been so quick with the memory leak thing. The memory consumption is caused by the I/O contention alone. The ADIO system *does* clean up properly.
#25
04/05/2008 (8:14 pm)
- Clip levels

Hmm - failing to clamp so that coarsest clipstack level is selected means it will render incorrectly as it will sample from the wrong level sometimes. That's why it drops fine detail levels in favor of coarse ones when they are needed.
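
A rough sketch of the clamping rule described above, not the actual Atlas code. It assumes clip stack levels are indexed with 0 as the coarsest and that the limit is expressed as a count of levels; the struct and function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Inclusive range of clip stack levels; level 0 is assumed to be the coarsest.
struct ClipRange
{
   int coarsest;
   int finest;
};

// Limits the range a chunk samples to at most maxLevels entries.
// The coarsest level is always kept, so any excess is cut from the fine end.
ClipRange clampKeepCoarsest(ClipRange needed, int maxLevels)
{
   assert(maxLevels > 0 && needed.coarsest <= needed.finest);
   ClipRange out = needed;
   out.finest = std::min(needed.finest, needed.coarsest + maxLevels - 1);
   return out;
}
```

With a limit of four, a chunk that would need levels 1 through 6 ends up with levels 1 through 4: fine detail is lost, but every sample still has a valid level to read from.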

- Memory leak

Heheh - whoops. Well that's good it's just the queue. I was wondering where the leak in ADIO was, it's a pretty simple system. :)

- Queue lengths

I wonder - do you maybe have some other threads active? Maybe the deserialization thread is being starved. Even with 512px tiles, unless you have a ludicrously high amount of texture detail, it shouldn't take much CPU to handle all the decompression...
#26
04/05/2008 (9:11 pm)
- Clip levels

That then proves I need to get acquainted more with the rendering part of Atlas. My problem was that when the total clip stack size exceeded that max delta, Atlas was never rendering the higher detail levels at all. Have to take a more thorough look.

- Memory leak

I feel like leaking some right now...

- Queue lengths

Nope. No other threads. Deserialization thread starved... Hmm... the last time I looked it was the fattest pig in town. The queue gets ever longer and the deserializer is working like crazy. It just occurs to me, though, that I haven't really made sure the deserializer is making any real progress. The fact that you find this all quite unreasonable gets me thinking. Hope I'm really on the right track. Let's see.
#27
04/05/2008 (9:24 pm)
Keep your head up man--seriously. In my memory, you are the first person to seriously tackle Atlas in the 2+ years it's been available. It really is a well engineered system, and has the potential to do some -amazing- things, but as you've so well documented, it's amazingly complex too :)

As Ben mentioned, it's awesome that you've got the time, energy, and expertise to pound on it hard--it's not really been put through the wringer other than Ben's extensive self-QA, so I'm excited myself to see you engaged with it!
#28
04/05/2008 (9:32 pm)
Maybe something I should have already reported a while back (because it cements the I/O theory--you see, I am getting more cautious already :)

One can play cat and mouse with the clip map, i.e. when the camera rests the system will then indeed find its way back into normal state and eventually the clip map will be up to date again. Then, when outpacing the I/O system again, the deserialize queue will fill up again.

I couldn't find any fault with the requests issued by the image cache so I am assuming that the correct data is requested.
#29
04/05/2008 (9:38 pm)
@Stephen:

Only saw your post after writing that last one.

You are right, it is complex. But definitely worth the effort! Will absolutely stick to it and after spamming this board with my musings finally hope to be able to post some real stuff real soon now.
#30
04/05/2008 (9:43 pm)
- Clip Levels

Yea - that's the problem, the geometry chunks have to be at a certain size relative to the texture data else some texture mip levels are never shown at all. So you have to subdivide which can be wasteful, or you implement an additional subdivision inside the chunk so you can draw more finely and get full detail... the former is much easier than the latter. You can also increase the size of the clipmap textures which will reduce the number of clipstack levels which will mean more detail with fewer levels. 2048px might be a safe number to try.
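
To make the arithmetic behind the clipmap size suggestion concrete, here is a small sketch. It assumes power-of-two sizes and one clip stack level per halving from the source texture down to the clipmap size; whether Atlas counts levels exactly this way is an assumption, and the function name is invented.

```cpp
#include <cassert>

// Number of clip stack levels under the assumption stated above:
// one level for the full-resolution source, plus one per halving
// until the image fits into the clipmap texture.
unsigned clipStackLevels(unsigned sourceSize, unsigned clipSize)
{
   assert(clipSize > 0 && sourceSize >= clipSize);
   unsigned levels = 1;
   while (sourceSize > clipSize)
   {
      sourceSize /= 2;
      ++levels;
   }
   return levels;
}
```

Under that counting, a 16384px source with a 1024px clipmap gives five levels, while a 2048px clipmap gives four.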

- Deserializer Thread

Yeah, I definitely think it's weird that it's acting the way you describe... but clearly _something_ is going wrong so I can hardly sit here and say "oh well it works great!" :P Thus all my questions about CPU starvation or strange queue behavior etc. It is good it eventually catches up. Is there a sleep in the deserializer thread? You might try removing it if there is one to see if that helps. Also dumping lengths of queues every frame to the console would be a very good idea whether or not you plot it in anything... You might also try adding some logic so the disk IO thread doesn't submit any more stuff for deserialization if the deserialize queue is over N items (like 5 or 10) in length, just sits and waits till the number goes down. That will at least keep the system from blowing out available memory.
#31
04/05/2008 (10:22 pm)
- Clip Levels

I'm slowly getting it. Never have given much thought to tile size. Valuable advice.

- Deserialization

Don't you think that what you suggested just leads to the deserializer blocking the loader, too? The deserializer queue *is* contended, so the loader would just block. Will take a more thorough look, though, whether the deserializer makes any real progress (the potential sleeping problem you mention).

Here is a run-down of what I think happens in detail:

1. setInterestCenter requests a rectangle matching the clip map (that's another thing in the current sources, I think; it currently requests twice that)//EDIT: this is fine
2. the requests are noted and the I/O system gets to work
3. the camera moves
4. setInterestCenter issues new requests for patches that are only in the new interest region and issues cancellations for patches that are only in the old interest region
//CORRECTION: of course, it will issue requests for all patches in the new region, but anything but the delta just adds heat
5. the I/O system receives the cancellations but by now, the requests have reached the deserializer and thus the cancellations will be more or less NOPs; the deserializer will continue to process the requests and they will thus resurface later on the processing queue

This repeats forever and while initially there will be only very few items on the deserializer queue that in fact should be cancelled, these will represent an increasingly large portion of the deserialized data until finally all data the deserializer is outputting is in fact no longer of interest. All the time, the camera flies on, adding ever more requests.

And here is what I think *should* happen:

1.-4. is the same
5. the I/O system receives the cancellations and kicks out the requests wherever they are on the queues

The deserializer will still process something that isn't needed anymore at times but just because cancellation has to wait until the resource becomes available for deallocation. Otherwise the system remains clean.
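
The difference between the two step-5 variants above can be shown with a toy, single-threaded model. This is only an illustration under simplifying assumptions: the stage names are invented, requests are plain integer IDs, and the locking and in-flight handling a real threaded pipeline needs are left out entirely.

```cpp
#include <algorithm>
#include <deque>

// Toy model of the pipeline; stage names are illustrative, not Atlas identifiers.
struct Pipeline
{
   std::deque<int> pendingRead;        // stage 1: not yet read from disk
   std::deque<int> pendingDeserialize; // stage 2: read, waiting for the deserializer
   std::deque<int> pendingInstate;     // stage 3: deserialized, waiting for the main thread

   // Removes id from q if present; returns whether anything was removed.
   static bool removeFrom(std::deque<int> &q, int id)
   {
      auto it = std::find(q.begin(), q.end(), id);
      if (it == q.end())
         return false;
      q.erase(it);
      return true;
   }

   // What is described as happening now: only stage 1 honors a cancellation.
   bool cancelReadStageOnly(int id) { return removeFrom(pendingRead, id); }

   // What is described as the goal: kick the request out wherever it sits.
   bool cancelAnywhere(int id)
   {
      return removeFrom(pendingRead, id)
          || removeFrom(pendingDeserialize, id)
          || removeFrom(pendingInstate, id);
   }
};
```

In this model a request that has already reached `pendingDeserialize` survives `cancelReadStageOnly` and gets processed anyway, which is the pile-up described above.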
#32
04/05/2008 (10:30 pm)
PS: If I currently leave the impression that I am slow to pick up some of your suggestions, Ben, it is because I had a large iteration (too large) of changes and am now frantically trying to get into a clean state again in order to be able to continue testing.
#33
04/05/2008 (10:56 pm)
Hey Rene,

The clipmap stuff is kinda tricky, and there are a lot of variables that tie into performance. Most of the tweakable parameters in Atlas should be replaced with ideal constants; it was helpful to tweak in the early phases but now it just makes life difficult for people, since it gives them a hundred knobs instead of the three they care about (if that many). I wrote an article on the clipmap stuff in Game Programming Gems 7 if you're curious about that.

Re: deserializer blocking the loader - well, if the queue is getting so long for the deserialize, wouldn't capping it (and blocking the loader) be a good thing? Certainly better than blowing megs on potentially stale data.

What sort of framerate are you getting? It sounds as if you're seeing many (dozens?) of IO requests getting through to deserialization in the time between one load request update and the next. One discarded deserialize every once in a while wouldn't be a big deal (which is why the system was designed this way among other reasons). Even one or two a frame probably would be ok if you had a high throughput of "good" chunks. But it sounds like you're seeing a LARGE volume of requests going through in the time between frames...

You might also want to get an idea of how frequently the queues are being touched by the deserializer/io threads. If you're frequently updating them (like locking and unlocking them a hundred times in order to cancel load requests) the cost of contention between the main and secondary threads might be pretty high. This is easy to measure - just put a profiler block around the lock/unlock mutex calls. If this is a source of contention it might be smarter to find a way to replace a ton of little locks with one big one once a frame or so - maybe doublebuffer the queues and check the "finished" list once right before you swap them.
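
One way the "one big lock per frame" idea could look, as a sketch only. It uses standard C++ primitives rather than Torque's own mutex and vector classes, and the class and method names are hypothetical. Workers append finished chunk IDs under the lock; the main thread swaps the whole list out once per frame.

```cpp
#include <mutex>
#include <vector>

// Hypothetical double-buffered "finished" list, not an Atlas class.
class FinishedList
{
public:
   // Called from worker threads whenever a chunk is done.
   void push(int chunkId)
   {
      std::lock_guard<std::mutex> guard(mMutex);
      mBack.push_back(chunkId);
   }

   // Called once per frame from the main thread. Takes the lock a single
   // time and returns everything finished since the previous call.
   std::vector<int> drain()
   {
      std::vector<int> front;
      {
         std::lock_guard<std::mutex> guard(mMutex);
         front.swap(mBack); // O(1): only the buffers change hands
      }
      return front;
   }

private:
   std::mutex mMutex;
   std::vector<int> mBack;
};
```

The main thread then walks the returned vector without holding any lock, so the time the workers can be blocked is limited to the swap itself.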

Killing stuff in the IO queue is a good idea - just make sure to measure the cost of doing so so you can be sure it's a win!

Re: speed of response, don't worry about it, it's very easy for me to sit here and think up intelligent sounding things to try, you're the one who's doing the real work of writing and debugging code, and you're volunteering to do so too, so I'm not gonna hold speed against you. :)
#34
04/05/2008 (11:43 pm)
Hey Ben,

I would love to read that article you wrote. Unfortunately, as a student, I am sitting on kind of a tight budget.

Re capping: Don't think that blocking would actually result in the effect you describe. All that would happen is that the outdated requests pile up in the loader queue (or wherever you decide to block them) rather than in the deserializer queue. You *HAVE* to cancel out requests or you *WILL* flood the system whenever requests exceed load capability.

Wait.... You are right!! What I didn't pay attention to was that if you block the loader, requests will get stuck in the first pipeline stage where they are still cancellable. That's an easy solution.
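
The reasoning above can be checked with a toy two-stage model: as long as the loader is held back, excess requests stay in the first stage, where cancellation still works. Everything here (names, the limit of 5, integer request IDs) is an illustrative assumption and not Atlas code.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Toy two-stage model of a throttled loader.
struct ThrottledLoader
{
   std::deque<int> pendingLoad;        // stage 1: still cancellable
   std::deque<int> pendingDeserialize; // past stage 1: no longer cancellable
   std::size_t maxDeserialize = 5;     // assumed cap on the deserialize queue

   // Moves requests forward only while the deserialize queue has room.
   void pump()
   {
      while (!pendingLoad.empty() && pendingDeserialize.size() < maxDeserialize)
      {
         pendingDeserialize.push_back(pendingLoad.front());
         pendingLoad.pop_front();
      }
   }

   // Cancellation only works in stage 1, as in the existing design.
   bool cancel(int id)
   {
      auto it = std::find(pendingLoad.begin(), pendingLoad.end(), id);
      if (it == pendingLoad.end())
         return false;
      pendingLoad.erase(it);
      return true;
   }
};
```

With 20 queued requests and a cap of 5, one `pump()` lets 5 through and leaves 15 in stage 1, all of which can still be cancelled if the camera has moved on.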

Now I feel like shit because my shiny new cancellable IO is so utterly superfluous...

Framerates are stable and okay. I am getting the same low but level 30fps that I get in other Torque demos (on a crappy X1300). That's probably because of the dual cores and the main thread always having a CPU when it needs one.

Yes, the volume is large, but so far I had concluded (since I could find no fault with request calculations) that all this was valid. Hmm, somehow I increasingly begin to think that the real issue may be somewhere else entirely.

Very good thoughts on locking overhead. That indeed may turn into a problem with my current system. Another reason why your block-loader solution is the attractive one. Think I'm going to kick out my cancellation stuff but keep the new queuing code (which avoids lots of memory allocations and copying).
#35
04/05/2008 (11:54 pm)
I think you can get a pretty good student discount on it. Maybe check with your school's bookstore. Worst case maybe I can ship you one of my copies. (They sent me a box full.)

If you have lots of stuff in the scene or many tiles maybe the requests are valid... but good to check!

What all did you do for the new queueing system? How does it compare to the old system?
#36
04/06/2008 (1:29 am)
So, after a good delay now (mostly caused by the most complex bug ever produced... Vista).

Re book: you don't usually get student discount on books in Germany and English books are way expensive :(

Have only just begun getting down to business with the updated codebase. Can't say much yet.

The new queuing system pretty much is just a consolidation of your code (except for the cancellation stuff of course). It just cuts on lots of little overheads and greatly simplifies some processing in AtlasFile::syncThreads.

Have to wrap up sometime soon and get at least a few hours of sleep. Still determined to deliver something by tomorrow, though.
#37
04/06/2008 (1:37 am)
Rene,

I've been watching this and other threads you've been posting on this week and I, like Matt and everyone else in our group, *seriously* appreciate the effort. Atlas is a tough beast to tackle and you're doing some manly code work there.

I really hope we can bring you tighter into the development loop and help us get Atlas where it should be. I want super pretty terrains with crazy visible distance and unique textures as much as anyone :) Again, great work. Don't hesitate to email any of us if you get stuck, need help, have an idea, or want to pitch us on a new project to update Atlas for the future.

Thanks,
Brett (bretts@garagegames.com)
#38
04/06/2008 (1:48 am)
Hey, you guys are so cool. I've already received support and encouragement from so many sides. Thank you, Brett, for your encouragement and your offer of help. Much appreciated.

Atlas is one of Torque's greatest assets in my opinion. For pretty much everything else, Torque has lots of tough competition, but when it comes to doing what Atlas does, Torque pretty much stands in wide open space with maybe Quake Wars or so but nothing you can spend 250 bucks on and start hammering.
#39
04/06/2008 (2:52 am)
So, wrapping up now.

A quick round of testing with the current build revealed:

a) Ben was right about rendering issues coming from clamping to the finest LOD :)

b) IO cancellation is subject to tons of race conditions that are inherently difficult to plug. As things stand right now, I much favor Ben's simple and elegant block-the-loader solution.

c) Clip map recentering so far seems to work (hooray!!) albeit the speed at which data comes in seems a little disappointing to me. However, as the test dataset (I am already using a scaled-down version //but still one that had all the issues before//) already proved to suffer from high I/O contention, this may not be an issue with Atlas at all.

d) The new queueing code seems to work fine and may be a good pick for Atlas no matter whether the cancellation code makes it or not.

After a couple hours of sleep, I will tackle what will hopefully be the last 10-20% of getting this to work. I'll probably also rip out my cancellation stuff and go for the loader-block thing. Debugging threading issues is no fun.

PS: Ben, talking to you was really cool and extremely helpful (not to mention fun). Hope to see you around again.
#40
04/06/2008 (6:34 pm)
I'm sort of stuck today and in a fairly unusable condition, too. However, here's a short run-down of the current state of affairs:

- Ben's loader-blocking fix

I have added code to do the loader-blocking that Ben suggested. The code actually throttles the entire threaded part of the pipeline. As with the cancellable I/O, the result is that load problems in the pipeline completely disappear and that requests get cancelled correctly. This in short means that clipmap recentering now works. I am currently investigating whether there aren't other issues (besides the raycasting misses) that mess with this part of Atlas.

Here is the code for the loader-block (atlas/core/atlasFile.cpp@485 in AtlasFile::enqueueNextPendingLoad):

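   // Throttle stage 1: hand out no new loads while the deserializer is
   // backed up (more than 5 queued) and disk I/O is still pending.
   U32 deserializerLoad = mPendingDeserializeQueue.lockVector().size();
   mPendingDeserializeQueue.unlockVector();
   if( deserializerLoad > 5 && mFile.hasPendingIO() )
   {
      PROFILE_END();
      return;
   }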

What this does is to not let any requests escape stage 1 (where requests are still being read out from the TOCs) while the threaded pipeline is busy (more than 5 pending requests in the deserializer and anything still in the load stage).

- Atlas Debugging Control

I have decided to write a GUI control that allows monitoring what's going on inside Atlas. This should greatly help in debugging. One feature will be a rectangular rendering that shows the loading states of all of the tiles in a particular texture LOD.

- The other build

I have a separate build with more extensive changes that includes I/O cancellation. It is now up and running and actually the results (especially visually) are very good. However, the cancellation code still has bugs (haven't yet resolved to actually rip it out).

Overall, I am not happy with I/O throughput as it is now. I will go into a more thorough testing phase. I also want to verify that my fixes to the detail level problems are really correct.

Not quite where I had hoped to get to this weekend, but that's how it goes.