Game Development Community

Lockmutex crash on Atlas after trying to load mission twice

by Juan Aramburu · in Torque Game Engine Advanced · 09/05/2006 (8:22 pm) · 10 replies

1) Multiplayer server; client = server. I manually call disconnect() in the console and return to the main menu; no loader threads are running. Starting a new game works with no problem.

2) Multiplayer server; client = server. The server sends a commandToClient(), and the client calls disconnect() from script. Back at the main menu, a loader thread is still running. Starting a new game crashes when the loader tries to acquire its mutex (the mutex is invalid, although it is not null).

3) Multiplayer server; client != server. Same procedure as case 2; however, starting a new game does not crash.

Is there something I need to do to prevent the crash? We have four Atlas instances (named Terrain, TerrainB, TerrainC, and TerrainD) in this particular mission.

#1
09/05/2006 (9:21 pm)
Can you give me the file and line where it hangs, along with the call stack? That will let me figure out what's wrong and hopefully get you a fix. :)
#2
09/06/2006 (7:46 pm)
It crashes inside EnterCriticalSection (called from Mutex::lockMutex in winMutex.cpp).

img325.imageshack.us/img325/6562/muthu1.jpg
If I call disconnect() (the script function) from the console, everything works fine; but if the server sends a commandToClient() to a client, which then calls disconnect(), the game crashes when loading a new mission.

The main difference I see is that when I call disconnect() from the console, no loader thread is active when I'm brought back to the main menu. When disconnect() is called from within client scripts (after the server sends its commandToClient()), a loader thread is still running after I return to the main menu. That is odd, since MissionCleanup.delete() should already have deleted the AtlasInstances defined in the mission file.

PS: The four AtlasInstances are named Terrain, TerrainB, TerrainC, and TerrainD.

How we're instantiating them:

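// Template for all four instances: [B|C|D] stands for the optional name suffix
// (Terrain, TerrainB, TerrainC, TerrainD); it is not literal TorqueScript.
new AtlasInstance(Terrain[B|C|D]) {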
         position = "-640 -640 49.6";
         rotation = "1 0 0 0";
         scale = "1 1 1";
         chunkFile = "~/data/terrains/terrain[B|C|D].chu";
         tqtFile = "~/data/terrains/terrain[B|C|D].tqt";
         materialName = "AtlasMaterial";
         detailTexture = "~/data/terrains/details/detail1";
};
#3
09/06/2006 (10:12 pm)
Hmm... Can you check whether the pointer to the Atlas file that the thread is working with still points to valid data?

You might have found a bug in how Torque handles object cleanup, or it may just be a race condition in Atlas. Either way, I would recommend putting disconnect in a schedule() call so it's deferred slightly; that should clean things up.
#4
09/07/2006 (11:38 am)
Calling disconnect() immediately has always carried some risk of a crash in Torque, for the reason Ben suggests: in some unusual cases, objects are destroyed immediately even though they are still needed later in the same update loop.

It's always preferable to schedule a disconnect with a 0-second delay, which lets the current update loop complete before the disconnect occurs.
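To illustrate why even a zero-delay schedule helps, here is a minimal C++ sketch of a deferred-callback queue. The Scheduler class and runDeferredDisconnectDemo() are hypothetical and far simpler than Torque's actual event system; the only point is that a callback queued with a 0 delay runs on the next tick, after the rest of the current update loop has finished with the objects it needs. In TorqueScript the deferred form is usually written as: schedule(0, 0, "disconnect");

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified scheduler; this is NOT Torque's schedule()/Sim event API.
class Scheduler {
public:
    // Queue a callback. Even with a "0 ms" delay it only runs at the next
    // processPending() call, i.e. after the current update loop finishes.
    void schedule(std::function<void()> fn) { pending_.push_back(std::move(fn)); }

    // Called once at the start of each tick. Callbacks queued while these
    // run are left for the following tick.
    void processPending() {
        std::vector<std::function<void()>> ready;
        ready.swap(pending_);
        for (auto& fn : ready) fn();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Simulates one tick in which a script handler requests a disconnect and the
// rest of the update loop still touches connection-owned objects.
std::vector<std::string> runDeferredDisconnectDemo() {
    Scheduler sched;
    std::vector<std::string> log;
    bool connected = true;

    // Script handler: defer the disconnect instead of calling it immediately.
    sched.schedule([&] {
        connected = false;
        log.push_back("disconnect");
    });

    // Remainder of the same update loop: the connection is still alive here.
    log.push_back(connected ? "update: connection alive" : "update: connection gone");

    // Next tick: the deferred disconnect finally runs.
    sched.processPending();
    return log;
}
```

Draining the queue at the start of each tick, rather than running callbacks as soon as they are queued, is what guarantees this ordering.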
#5
09/07/2006 (3:40 pm)
Ben, if you're talking about the mStream variable, it seems to contain garbage, as do most other members of the AtlasResource class (most members hold the value 0xCECECECE, or 3469659854 in decimal). So it seems that the loader thread is still running after the AtlasResource has been destroyed. So it

When I used a schedule (even with a delay of up to 5 seconds), the game still crashed; however, it remained playable. You just have to move the 'this program has encountered a problem' error messages out of the way, and then you can keep playing as if nothing happened. However, once I repeat the disconnect() sequence (server_script->commandToClient->client_script->disconnect()), the game just exits with no messages of any kind; it simply disappears.

img404.imageshack.us/img404/8576/errsck8.jpg
However, one fix that works is to add mAtlasResource->stopLoader(); as the first statement of AtlasInstance::onRemove(). After trying it a couple of times, it solves the immediate problem I started this thread about, but the third time I loaded a new mission I got an error in GFXDevice::updateStates(), inside the 'if' block that updates the primitive buffer.
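The stopLoader() fix works because it makes sure the loader thread has exited before the resource's members, including its mutex, are torn down; otherwise the thread can try to lock a mutex whose memory has already been freed, which would be consistent with the garbage-but-non-null mutex seen here. Below is a minimal C++ sketch of that pattern using std::thread and std::condition_variable, assuming one loader thread per resource. LoaderResource, requestLoad() and loadedCount() are hypothetical names, not the actual Atlas classes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a resource with a background loader thread.
// Names are illustrative only; this is not the Atlas implementation.
class LoaderResource {
public:
    LoaderResource() : worker_([this] { loaderMain(); }) {}

    // Stop and join the loader before any member (including mutex_) is destroyed.
    ~LoaderResource() { stopLoader(); }

    // Queue one unit of work for the loader thread.
    void requestLoad() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++pending_;
        }
        cv_.notify_one();
    }

    // Signal the loader to exit and wait for it. Safe to call more than once.
    void stopLoader() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    int loadedCount() const { return loaded_.load(); }

private:
    void loaderMain() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // The predicate is checked before sleeping, so a stop request
            // issued before the thread reaches wait() is not lost.
            cv_.wait(lock, [this] { return stopping_ || pending_ > 0; });
            if (stopping_) return;  // exit without touching anything else
            --pending_;
            ++loaded_;              // stands in for loading one chunk
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    int pending_ = 0;
    bool stopping_ = false;
    std::atomic<int> loaded_{0};
    std::thread worker_;  // declared last, so it starts after the members above exist
};
```

Because stopLoader() is idempotent, calling it explicitly from an onRemove()-style hook and again from the destructor is harmless.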
#6
09/07/2006 (5:25 pm)
Hmm, that "harmless" crash is probably one of the threads dying. Of course, release builds give you no useful debug information, and since I don't have your debug symbols, the info in the application error window is useless to me. :-/

Calling stopLoader on the resource isn't ideal, because if any other instances (like the server or client version of the terrain) reference it, they won't be able to load anymore.
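One way to address that concern is to tie the loader's lifetime to the resource's reference count rather than to any single instance, so the loader stops only when the last instance (server or client) lets go. This minimal C++ sketch uses std::shared_ptr purely for illustration; Torque's resource manager has its own handle type, and TerrainResource and TerrainInstance are hypothetical names. The log vector stands in for real thread start and stop calls.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: the resource, not each instance, decides when its
// loader stops. Names are illustrative, not the Atlas API.
class TerrainResource {
public:
    explicit TerrainResource(std::vector<std::string>& log) : log_(log) {
        log_.push_back("loader started");
    }
    // Runs only when the last shared owner releases the resource.
    ~TerrainResource() { stopLoader(); }

private:
    // A real version would signal and join the loader thread here.
    void stopLoader() { log_.push_back("loader stopped"); }
    std::vector<std::string>& log_;
};

// Each instance (e.g. the server-side and client-side copy of a terrain)
// holds shared ownership instead of stopping the loader in its own onRemove().
class TerrainInstance {
public:
    explicit TerrainInstance(std::shared_ptr<TerrainResource> res) : res_(std::move(res)) {}
    void onRemove() { res_.reset(); }  // drop only this instance's reference
private:
    std::shared_ptr<TerrainResource> res_;
};
```

Removing the server-side instance leaves the loader running for the client-side one; the loader stops only when the final reference is released.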

I'll have to sit down and debug this; hopefully I'll be able to make some time for that tomorrow. Thanks for the thorough report, though; it makes this much easier for me to address.

BTW, using multiple Atlas instances for your terrain is not best practice. It's better to use a single Atlas file if possible.
#7
09/08/2006 (2:27 pm)
About moving the error window out of the way and still playing the game: it's exactly the same error (the mutex seems to be garbage by the time we reach acquireLoadMutex(), and it crashes inside EnterCriticalSection).

We use multiple Atlas instances in one mission because we need separate terrain blocks that act independently of one another (with regard to gravity, orientation, etc.) and are physically disconnected, yet are still playable within a single mission.
#8
09/08/2006 (3:59 pm)
Oh, hehe.

You're using Atlas 1!

There is a complete rewrite of the Atlas code that is significantly improved (especially when it comes to not crashing). Just switch to the .atlas file format and the AtlasInstance2 class and you'll be set. It renders faster, looks better, and is more robust and flexible.

Atlas 1 code is deprecated and will be removed eventually.
#9
09/11/2006 (7:01 am)
Yeah, just haven't made the move to MS4 (or even MS3.5 for that matter). Merging is a beast.
#10
09/12/2006 (1:55 pm)
It's probably worthwhile: the new Atlas runs faster, looks better, and has lots of bug fixes and improvements. No merging is necessary; just import your old files into the new formats and off you go.

Of course, the rest of the engine might be a bit of a pain, but probably a lot less pain than doing all the fixing and improvement work that you're getting the benefit of. Get a good merge tool (Beyond Compare 2 is my fave) and go to town!