AIPlayer Transform returning 1.#QNAN after SetMoveDestination
by Meredith F. Purk II · in Torque Game Engine · 03/18/2011 (7:26 am) · 25 replies
I'm having a (somewhat intermittent) problem with AIPlayers vanishing and returning garbage from their getTransform calls. The AIPlayer uses a standard model that ships with Torque: I've tried both Kork and Mr Box with the same results. The terrain is flat, with squareSize set to 8. Player models have a scale of 1 1 1.
Example of the getTransform call:
==>echo(2679.gettransform());
1.#QNAN 1.#QNAN 1.#QNAN 1.#QNAN 1.#QNAN 1.#QNAN 1.#QNAN
An AIPlayer will occasionally vanish from the visible scene when it receives a SetMoveDestination call. Sometimes I can go dozens, even fifty, tests without a hitch. Other times, an AIPlayer will exhibit this behavior 2 out of 3 times. At first I thought they were dropping below the terrain, but once I checked the transform and found that they were not at a legitimate location (nor could setTransform calls bring them back), I tried to puzzle out what was happening.
The synopsis is this: I have a custom walkpath system where the object determines a destination in world space and creates a simple route on an 8x8 grid to that point, walking only in orthogonal directions. To simplify the walkroute, it only creates a node at bends/intersections, so that it walks in straight lines where possible.
The nodes are simply ordered pairs plus getTerrainHeight(), stored as variables on the AIPlayer like this:
node0 = "116.5 -103 0.02"
node1 = "111.5 -103 0.02"
The AI uses the onReachDestination callback to check its point on the path and increment to the next node, using lastNode and currentNode to tell which portion of the path it's working on and when it's finished. When the AIPlayers vanish, they seem to do so on the first SetMoveDestination call (i.e., to node[0]) on the path. I don't believe I've witnessed a situation where they've vanished at other steps along the path.
The game used to lag tremendously when this happened. Using the profiler, I tracked that to a rezoning problem where the game would get stuck in SceneGraph_rezoneObject (using 90%+ of processor time). Removing the .difs from the scene seems to have stopped the rezone problem when the AI vanishes, but not the vanishing itself.
My next step is to go back to a Debug build and try to dig deeper into this. Any suggestions where to start?
Example of the onReachDestination/walkroute function:
function AIEnemy::onReachDestination(%this, %obj)
{
   if (%obj.followPath == 1)
   {  // We're following a walkroute right now
      if (%obj.lastNode == %obj.currentNode)
      {
         %obj.followPath = false; // We're done with the path
         echo(%obj SPC "done following path: Stopped at" SPC %obj.getPosition() SPC "for node" SPC %obj.currentNode);
      }
      else if (%obj.lastNode > %obj.currentNode)
      {
         %obj.currentNode = %obj.currentNode + 1;
         %obj.setMoveDestination(%obj.node[%obj.currentNode], true);
         echo(%obj SPC "following path: Moving to" SPC %obj.getMoveDestination() SPC "for node" SPC %obj.currentNode);
      }
      else
         echo("AIEnemy::followWalkRoute error - returned invalid node");
   }
}
#22
03/31/2011 (5:45 am)
(EDIT: To clean up display of my grid in the code box)
@Kane: I went ahead and added a sanity check to my first call of setMoveDestination. Before the command is issued, it checks the object's current getPosition() against the destination node's position like so:
if (vectorDist(%this.getPosition(), %this.node[0]) < 0.5)
{
   echo("Error! Current position and MoveDestination are identical! Breaking out!");
   %this.followPath = false;
   return;
}
Now to an example of the latest iteration of the problem.
So... It's decided to show itself in a debug build so long as I'm not actively debugging... maybe something is catching bad data somewhere? I dunno. However, this is an example scenario from the last occurrence:
.   _ _ _ _ _ _ _ _  (y)
   |_|_|_|_|_|_|_|_| 7
   |_|_|_|_|_|_|_|_| 6
   |_|_|_|_|_|+|_|_| 5
   |_|_|_|_|_|A|_|_| 4
   |_|_|_|_|_|B|_|_| 3
   |_|_|_|_|X|C|_|_| 2
   |_|_|_|_|_|_|_|_| 1
   |_|_|_|_|_|_|_|_| 0
(x) 7 6 5 4 3 2 1 0
Okay, I have an 8x8 grid set up using the coordinate system shown. My object is located at 2,5 (marked by the plus sign, +). Its destination is 3,2 (marked by X). A, B, C, X is essentially its walkroute. To reduce movement to contiguous straight runs wherever possible, A and B are ignored and the first node in the path (node[0]) is set to C. The last node (node[1]) will then be X.
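The corner-collapsing rule described above (walk A, B, C, X but store only C and X) can be sketched as follows. This is a minimal C++ illustration, not the TorqueScript actually used in the project; `Cell`, `collapseRoute` and `exampleGridRoute` are hypothetical names, and the route is assumed to be an ordered list of orthogonally adjacent grid cells.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A grid cell in the 8x8 walk grid (hypothetical type for this sketch).
struct Cell { int x, y; };

// Keeps only the cells where the walking direction changes, plus the
// final cell, so straight runs collapse into a single destination node.
std::vector<Cell> collapseRoute(const Cell& start, const std::vector<Cell>& route) {
    std::vector<Cell> nodes;
    Cell prev = start;
    for (std::size_t i = 0; i < route.size(); ++i) {
        const Cell& cur = route[i];
        if (i + 1 == route.size()) {
            nodes.push_back(cur);  // the destination is always a node
        } else {
            const Cell& next = route[i + 1];
            // Compare the step into this cell with the step out of it.
            bool turns = (cur.x - prev.x != next.x - cur.x) ||
                         (cur.y - prev.y != next.y - cur.y);
            if (turns) nodes.push_back(cur);
        }
        prev = cur;
    }
    return nodes;
}

// The scenario from the grid: start at (2,5), route A, B, C, X.
void exampleGridRoute() {
    std::vector<Cell> route = {Cell{2, 4}, Cell{2, 3}, Cell{2, 2}, Cell{3, 2}};
    std::vector<Cell> nodes = collapseRoute(Cell{2, 5}, route);
    assert(nodes.size() == 2);
    assert(nodes[0].x == 2 && nodes[0].y == 2);  // C becomes node[0]
    assert(nodes[1].x == 3 && nodes[1].y == 2);  // X becomes node[1]
}
```

Under these assumptions the first stored node is always at least one cell away from the start, which fits the observation that the first destination is never the bot's current position.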
The bot's starting position is 119, -95.5, 0.02. The center of point C is 119, -103, 0.02 and point X is at 116.5, -103, 0.02. The bot correctly echoed its starting position, its destination node and that node's position. As soon as the setMoveDestination command was issued to move it to point C, it popped out of sight and getTransform/getPosition echoes returned QNAN garbage.
(FYI, the vectorDist check did not throw any flag... performing it manually returns a result of 7.5 between 2,5 (point +) and 2,2 (point C). I'm going to add an echo that will show the result at every call just for my reference.)
#23
04/13/2011 (1:01 pm)
I haven't been keeping up on this thread and didn't realize you had made additional comments until now.
I've revisited AiPlayer.cc, and the code in that file seems to guard against corrupting the matrix with QNAN. Typically, a QNAN problem with AI control is created like this:
1) vect diff = goal - current pos
2) forward vector = normalize(vect diff) --> this results in an error when the diff is (0,0,0): attempting to normalize it produces QNAN because the denominator (the vector's length) will be 0
3) construct matrix based on forward vector --> the forward vector is already QNAN, hence the matrix will also be QNAN
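The three steps above can be reproduced in isolation. The sketch below is plain C++ with a stand-in `Vec3` type; it is not Torque's Point3F or the code in AiPlayer.cc, and every name in it is hypothetical. It assumes IEEE 754 floats and default compiler flags (no fast-math), so 0/0 yields NaN, which is what MSVC builds of that era print as 1.#QNAN.

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-in for an engine vector type (hypothetical).
struct Vec3 { float x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

float length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Naive normalize: divides by the length with no zero check.
// For a zero vector every component becomes 0/0, which is NaN.
Vec3 normalizeUnchecked(const Vec3& v) {
    float len = length(v);
    return {v.x / len, v.y / len, v.z / len};
}

// Guarded normalize: refuses to divide when the vector is (nearly) zero
// length. Returns false and leaves 'out' untouched in that case.
bool normalizeChecked(const Vec3& v, Vec3& out, float epsilon = 1e-6f) {
    float len = length(v);
    if (len < epsilon) return false;
    out = {v.x / len, v.y / len, v.z / len};
    return true;
}

// Goal equal to the current position gives a zero difference vector.
void exampleZeroDifference() {
    Vec3 pos  = {119.0f, -95.5f, 0.02f};
    Vec3 goal = pos;
    Vec3 bad = normalizeUnchecked(sub(goal, pos));
    assert(std::isnan(bad.x) && std::isnan(bad.y) && std::isnan(bad.z));

    Vec3 facing = {0.0f, 1.0f, 0.0f};  // previous facing, kept when there is no direction
    assert(!normalizeChecked(sub(goal, pos), facing));
    assert(facing.y == 1.0f);
}
```

Once a NaN is in the forward vector, every matrix element computed from it is NaN as well, and later arithmetic on the transform cannot bring it back to a valid value.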
Disregard the possibility of this happening in AiPlayer.cc: the code there guards against this case. It also flattens the forward vector to 2D by throwing out the z component.
Your algorithm to skip A and B nodes right to C shouldn't be an issue.
I've encountered this problem before, though I can't remember the exact details. I think it had to do with the AI's matrix not being initialized properly on the server side, which then re-initialized the client side with uninitialized data (I can't remember whether it was bad world matrix data propagating down to the render matrix, an uninitialized position, etc.).
Put debug prints in every place that sets or gets the AI's position, transform, or move destination, in both server-side and client-side code, and see what corrupts (re-initializes) the matrix with uninitialized data. First, check the function GameConnection::createPlayer(%this, %spawnPoint) in the server\script\game.cs file and make sure %spawnPoint in %player.setTransform(%spawnPoint); is a valid transform.
GL.
#24
04/14/2011 (3:43 pm)
I've no reason to believe this would solve your problem, but I just happened to notice that createOrientFromDir() in engine/math/mathUtils.cc does not normalize the direction passed to it. So if it is given an un-normalized direction it will produce a bad matrix. I suggest adding j.normalize() before the first mCross() call just in case. Again, no reason to believe this is the cause of your problem, but it is the type of issue that could lead to this kind of problem.
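To show why the normalize step matters, here is a minimal sketch of building a right/forward/up basis from a direction. It is not a copy of Torque's createOrientFromDir(); the types, the choice of reference axis and all names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Stand-in vector type (hypothetical, not Torque's Point3F).
struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

float length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

Vec3 scaled(const Vec3& v, float s) { return {v.x * s, v.y * s, v.z * s}; }

struct Basis { Vec3 right, forward, up; };

// Builds an orthonormal basis whose forward axis points along 'dir'.
// Returns false when 'dir' has no usable length.
bool basisFromDirection(const Vec3& dir, Basis& out) {
    float len = length(dir);
    if (len < 1e-6f) return false;
    Vec3 j = scaled(dir, 1.0f / len);  // normalize before any cross product

    // Use world Z as the reference axis unless forward is nearly parallel to it.
    Vec3 ref = (std::fabs(j.z) < 0.999f) ? Vec3{0.0f, 0.0f, 1.0f} : Vec3{0.0f, 1.0f, 0.0f};
    Vec3 i = cross(j, ref);
    i = scaled(i, 1.0f / length(i));
    Vec3 k = cross(i, j);  // unit length only because i and j are unit and perpendicular
    out = {i, j, k};
    return true;
}

// A direction of length 7.5 still yields unit-length axes.
void exampleBasis() {
    Basis b;
    assert(basisFromDirection(Vec3{0.0f, -7.5f, 0.0f}, b));
    assert(std::fabs(length(b.right) - 1.0f) < 1e-5f);
    assert(std::fabs(length(b.forward) - 1.0f) < 1e-5f);
    assert(std::fabs(length(b.up) - 1.0f) < 1e-5f);
}
```

In this sketch, skipping the normalization of j would leave the forward and up axes scaled by the length of the input, so the resulting matrix would carry an unintended scale rather than a pure rotation.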
If you're still trying to debug this, my suggestion at this point would be to start adding Con::printf statements at the start and end of key functions like Player::processTick. Try to identify which function is producing the corrupt data, then start placing printfs within that function to narrow down where it's happening. It's not fun, but if you can't catch it immediately with an Assert and break to run a backtrace, it may be the only way to track it down.
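A small helper makes that kind of bracketing easier: check the transform for non-finite values on entry to and exit from each suspect function, and log or assert at the first failure. The sketch below is standalone C++; it assumes the transform can be viewed as 16 floats, and the function names are hypothetical, not engine API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Returns the index of the first NaN or infinite element of a 4x4 float
// matrix stored as 16 floats, or -1 when every element is finite.
int firstBadElement(const float (&m)[16]) {
    for (std::size_t i = 0; i < 16; ++i) {
        if (!std::isfinite(m[i])) return static_cast<int>(i);
    }
    return -1;
}

// Convenience wrapper for use inside an assert or an if-statement.
bool isFiniteMatrix(const float (&m)[16]) { return firstBadElement(m) == -1; }

// An identity matrix passes; poisoning one element is reported by index.
void exampleMatrixCheck() {
    float m[16] = {1.0f, 0.0f, 0.0f, 0.0f,
                   0.0f, 1.0f, 0.0f, 0.0f,
                   0.0f, 0.0f, 1.0f, 0.0f,
                   0.0f, 0.0f, 0.0f, 1.0f};
    assert(isFiniteMatrix(m));
    m[3] = std::nanf("");
    assert(firstBadElement(m) == 3);
}
```

The first function whose entry check passes and whose exit check fails is where the corruption is introduced, which narrows the search without needing a debugger attached.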
#25
05/15/2011 (9:08 am)
Well, an (overdue) update: As of early May I am no longer encountering this problem. How it was 'fixed', or what was involved, I'm not sure.
I did encounter a rogue .setPosition() call in one of my scripts and removed it. It was residue left from a previous iteration and must have been missed when cleaning up code. Essentially, with what remained of it, it was setting the position of the object to its current position... but it wasn't doing this on all AIPlayer objects, nor was it being called specifically by my walkroute functions.
Still, after removing it in passing, I happened to notice the problem had vanished a couple of builds later. So either something was cleaned up in the last compile or that TorqueScript call had a hand in corrupting the matrix.
Regardless, it's been about 10-15 builds since and I still haven't encountered the problem again. If it shows up I'll continue from where I left off in trying to trace it down. Otherwise, thank you all again for your help!
Torque 3D Owner Meredith F. Purk II
If I'm understanding you, you are saying that if an object is resting exactly on its goal and attempts to check where it needs to move next it could throw a QNAN? Because when I've seen this happen I've seen it the moment the object receives the order to move to its new destination which is never the position it was just at.
However, it is worth saying that I'm still trying to get it to reproduce in a Debug build again, which has proven more difficult than in a Release build.