RTSAttackEvent::pack() -- possible to cause issues
by Stephen Zepp · in RTS Starter Kit · 12/14/2004 (9:24 am) · 6 replies
It's hard to call this a bug, since it can't happen in the stock RTS-SK (as far as I am aware), but I wanted to point it out to anyone in case they come up with something similar.
First, some background:
The execution sequence for calling RTSConnection.sendAttackEvent (via script, server side) has an assumption that you will always have a selection group, instead of a single unit. This is not in any way a bad assumption, especially since the RTS SK also assumes that attack commands always come from the player, across a commandToServer scenario. This assumption carries all the way down into the RTSAttackEvent::pack/unpack level, which is rather low level, and something that people may not want to mess with too much.
The issue we had however is that we wanted to have some form of unit auto-attack, where they would detect a close by enemy, and then issue an attack command. Our model at this stage was to have individual units (server side) look for closeby (and visible) enemies, and then issue a single unit attack on any enemies that were found.
We wanted to make sure clients were aware of this attack, so we basically gave each unit it's own SimSet, called selectionWrapper, which only contained the unit itself. This was populated on unit creation. It works like a champ for most cases, the combatAIProcessTick() did all it was intended to do.
However, we found a nasty server crash whenever you had large numbers of units in a fight, and after tracing it down exhaustively for more than a week, we decided that the root problem is in the timing of posting the RTSAttackEvent, and actually delivering it.
Either we have a bug in our target selection script (semi-doubtful, we've gone through all sorts of checks on both the attacker and victim objects to make sure they are legal objects, Enabled, still around, not a corpse, etc.), or the following can occur:
When a large amount of units are all doing damage to a single unit, it is possible for the unit to "die" in a single tick. Since our unit also counter-attacks when it is attacked (using the above technique with the selectionWrapper to initiate the attack), it immediately posts an RTSAttackEvent(), and then proceeds to die during the very same processTick(). In other words, when the event is posted, every associated object in the event is valid, but when the event is actually packed up, one or more of the objects inside may no longer be valid.
In a debug build, RTSAttackEvent::pack() catches this this with the following line:
AssertFatal(dynamic_cast((*selection)[k]), "Non-NetObject object in move event!");
Unfortunately, that will obviously shut down your server!
(cont next message)
First, some background:
The execution sequence for calling RTSConnection.sendAttackEvent (via script, server side) has an assumption that you will always have a selection group, instead of a single unit. This is not in any way a bad assumption, especially since the RTS SK also assumes that attack commands always come from the player, across a commandToServer scenario. This assumption carries all the way down into the RTSAttackEvent::pack/unpack level, which is rather low level, and something that people may not want to mess with too much.
The issue we had however is that we wanted to have some form of unit auto-attack, where they would detect a close by enemy, and then issue an attack command. Our model at this stage was to have individual units (server side) look for closeby (and visible) enemies, and then issue a single unit attack on any enemies that were found.
We wanted to make sure clients were aware of this attack, so we basically gave each unit it's own SimSet, called selectionWrapper, which only contained the unit itself. This was populated on unit creation. It works like a champ for most cases, the combatAIProcessTick() did all it was intended to do.
However, we found a nasty server crash whenever you had large numbers of units in a fight, and after tracing it down exhaustively for more than a week, we decided that the root problem is in the timing of posting the RTSAttackEvent, and actually delivering it.
Either we have a bug in our target selection script (semi-doubtful, we've gone through all sorts of checks on both the attacker and victim objects to make sure they are legal objects, Enabled, still around, not a corpse, etc.), or the following can occur:
When a large amount of units are all doing damage to a single unit, it is possible for the unit to "die" in a single tick. Since our unit also counter-attacks when it is attacked (using the above technique with the selectionWrapper to initiate the attack), it immediately posts an RTSAttackEvent(), and then proceeds to die during the very same processTick(). In other words, when the event is posted, every associated object in the event is valid, but when the event is actually packed up, one or more of the objects inside may no longer be valid.
In a debug build, RTSAttackEvent::pack() catches this this with the following line:
AssertFatal(dynamic_cast
Unfortunately, that will obviously shut down your server!
(cont next message)
#2
12/14/2004 (4:01 pm)
Thanks Stephen for pointing this out. I'm going down the same road with individual units capable of emmiting an attack. This will come in handy.
#3
12/14/2004 (5:47 pm)
FYI, it's still not completely robust--getting some crashes even with this modification. Still working on it, but if anyone else is close to something like this, would love to join up with you on trying to figure out how to solve it!
#4
FYI, the root cause of the crash it appears is the fact that when the event is created, it stores pointers to the current selection set as well as the current victim, and if either of those objects that are pointed to are deleted between the event posting and actual delivery, they are invalid at the pack() execution point.
A big disadvantage of this is that in very large battles, you are talking about a LOT of AttackEvents being generated and broadcast. And ironically, due to the nature of those very large battles (units die pretty quickly) most of the time by the time the attack event is created, sent, and then processed client side, the unit is already dead, making the event completely stale by the time the client gets it.
I'm starting to lean towards the opposite direction: making an RTSAttackEventSummary event, which contains a summary of all the attack events that occured in the last tick. This would not do much for bandwidth in the event of small combats/single attacks, but should optimize large battles, where the attack events in many cases are stale anyway (the clients already know they are in a fight I would assume).
12/15/2004 (4:11 am)
Greg and I were brainstorming, and the most logical solution to the issue above is probably to just go ahead and implement an RTSSingleUnitAttackEvent, and make sure that instead of storing a pointer to the unit, store the unit itself in a SimSet that is created on the send side, and deleted once the event has been posted.FYI, the root cause of the crash it appears is the fact that when the event is created, it stores pointers to the current selection set as well as the current victim, and if either of those objects that are pointed to are deleted between the event posting and actual delivery, they are invalid at the pack() execution point.
A big disadvantage of this is that in very large battles, you are talking about a LOT of AttackEvents being generated and broadcast. And ironically, due to the nature of those very large battles (units die pretty quickly) most of the time by the time the attack event is created, sent, and then processed client side, the unit is already dead, making the event completely stale by the time the client gets it.
I'm starting to lean towards the opposite direction: making an RTSAttackEventSummary event, which contains a summary of all the attack events that occured in the last tick. This would not do much for bandwidth in the event of small combats/single attacks, but should optimize large battles, where the attack events in many cases are stale anyway (the clients already know they are in a fight I would assume).
#5
12/15/2004 (4:42 am)
Sounds logical to me.
#6
This has (obviously) fixed the crash itself, and has had no apparent unexpected effects. Expected effects are no notification to the clients of the attacks occuring, as well as not setting the client's version of aimObject for the enemy (or friendly) units, but for our purposes this is fine since all of our combat AI is server side.
It may have consequences down the road, but our refactor of the implementation for large scale combats will be cleaning that up in any case.
Short Answer: If you are getting a crash due to the scenario above, and decide that it's ok for your current task/goal to not send RTSAttackEvents, it shouldn't have a huge impact on the clients. Just make sure you keep in mind that they aren't being modified client side.
With the script call disabled, I ran several 150x150 fully automated combats, no crashes.
Performance was very acceptable server side, my Win98 box (1.5G cpu/384meg ram, GeForce 5600 Ultra) had a bit of a problem handling that large of a fight, but my WinXP box (3.2G/1gig ram, Geforce 6800 GT) was very acceptable in performance and load.
This was with a semi-optimized (and re-written) visManager, and of course the combat AI processing which occurs every tick. Both could be optimized much more strongly.
12/15/2004 (5:34 am)
Well, we punted for this milestone. We're in second test phase, and re-implementing the network traffic via one (or more) of the above techniques is scheduled for a later milestone anyway, so I went ahead and disabled creating new RTSAttackEvents for computer generated attacks. (I simply do not call the cl.sendAttackEvent from script).This has (obviously) fixed the crash itself, and has had no apparent unexpected effects. Expected effects are no notification to the clients of the attacks occuring, as well as not setting the client's version of aimObject for the enemy (or friendly) units, but for our purposes this is fine since all of our combat AI is server side.
It may have consequences down the road, but our refactor of the implementation for large scale combats will be cleaning that up in any case.
Short Answer: If you are getting a crash due to the scenario above, and decide that it's ok for your current task/goal to not send RTSAttackEvents, it shouldn't have a huge impact on the clients. Just make sure you keep in mind that they aren't being modified client side.
With the script call disabled, I ran several 150x150 fully automated combats, no crashes.
Performance was very acceptable server side, my Win98 box (1.5G cpu/384meg ram, GeForce 5600 Ultra) had a bit of a problem handling that large of a fight, but my WinXP box (3.2G/1gig ram, Geforce 6800 GT) was very acceptable in performance and load.
This was with a semi-optimized (and re-written) visManager, and of course the combat AI processing which occurs every tick. Both could be optimized much more strongly.
Torque 3D Owner Stephen Zepp
void pack(NetConnection *con, BitStream *bstream) { // SRZ: It appears that it is possible for either a victim or an attacker (object within // SimSet selection) is killed between the time the netEvent gets posted, and the time // it gets packed up for delivery. This may be causing our crash on the server. // let's hack-protect the object referencing and see if it stops the crash. if (con->getGhostIndex(victim) < 0) { // not a good thing, abort return; } bstream->writeInt(con->getGhostIndex(victim),NetConnection::GhostIdBitSize); S32 numUnits = 0; for (S32 k = 0; k < selection->size(); k++) { // AssertFatal(dynamic_cast<NetObject*>((*selection)[k]), "Non-NetObject object in move event!"); if (dynamic_cast<NetObject*>((*selection)[k]) ) { NetObject* obj = (NetObject*)((*selection)[k]); if (con->getGhostIndex(obj) != -1) numUnits++; } else { Con::errorf("RTSAttackEvent::pack() -- Non-NetObject in attack event!"); } } bstream->write(numUnits); for (S32 k = 0; k < selection->size(); k++) { if (dynamic_cast<NetObject*>((*selection)[k]) ) { NetObject* obj = (NetObject*)((*selection)[k]); S32 id = con->getGhostIndex(obj); if (id != -1) bstream->writeInt(id, NetConnection::GhostIdBitSize); } } }Again, please realise this is a pretty esoteric scenario, and almost certainly not a "poor implementation" in the stock RTS-SK, we're not saying that. However, if others wind up going down the same path as we did, you will probably run into it.