Game Development Community

checksum mismatch problem

by kcpdad · in Torque Game Engine · 07/28/2010 (2:45 am) · 6 replies

I'm seeing this situation in my prototype with a single client connecting to a dedicated server.

The control object is created and its processtick starts to run on both client and server.

1 - The client processes some moves and sends them to the server along with a checksum of the control object selected state variables. This is the first time it is sending the checksum with the move. Call it Packet A.

2 - The server at the same time processes some moves and sends a correction update to the client because mLastControlObjectChecksum is 0. It is 0 because the client's first control object checksum hasn't arrived yet. Call the correction packet Packet B.

3 - The client gets the correction(Packet B) and changes the control object's state and the connection move id pointer, mLastClientMove, so the client has to reprocess the moves since the correction.

4 - The server in the meantime gets Packet A with clients moves and the checksum, it processes these moves and compares the checksums. In this case they are equal. So the server sends an acknowledgment to the client. That is Packet C.

5 - The client gets Packet C and resets the move pointer forward, mLastClientMove, in time so it bypasses the moves we need to execute on the corrected control object.

6 - The client processes a new set of moves on the control object, the state is still set at the earlier correction, step 3, but there is a gap in the set of moves needed to keep in sync with the server because of the popping done in step 5.

7 - The client sends the moves and a new checksum to the server(Packet D).

8 - The server gets the moves(Packet D) processes them then compares checksums, they are different so it sends an unnecessary correction.

It seems to me that step 5 is a error because the client would need to always(?) execute all moves after a correction until it receives another correction.

is this a known limitation or am I missing something? thanks

About the author

Hobbyist working on a tank game when time allows. Play the prototype at => http://www.sytrept.com/60tons/


#1
07/28/2010 (8:25 pm)
What you might be missing is that right after step 3 all moves up to the latest are reapplied immediately on the control object on the client. That way when the client is 10 moves ahead of the server and the server corrects a move the other 9 are applied again to the client's control object.

It's because of this that processTick is not always guaranteed to be called at 32ms ticks for control objects. If the server corrects, the control object could tick again 30 or so times in a single actual tick. Learned that the hard way when I tried to keep a timer synced with processTick in the Player class.

After this all future moves are hopefully in their correct positions so once these get to the server it will stop correcting. Although the server will still be acting on stale packets so you will get a few unneeded corrections but it smooths out over a few packets.
#2
07/28/2010 (9:12 pm)
In case you want to see where this happens it's this code on the else case of if(tickCount) in ProcessList::advanceClientTime in gameProcess.cc. Due to it being in the else case of tickCount it happens immediately right after the server has corrected the client's control object.

if (control)
      {
         // Sync up the control object with the latest client moves.
         Move* movePtr;
         U32 m = 0, numMoves;
         connection->getMoveList(&movePtr, &numMoves);
         while (m < numMoves)
         {
            control->processTick(&movePtr[m++]);
         }
         connection->clearMoves(m);
      }
#3
07/30/2010 (1:13 am)
Thanks for the tip Wes


The first correction to the client results in advanceClientTime() being called as you note.

PKLOG 1651 readPacket() START
PKLOG 1651     READ mLastMoveAck 386
PKLOG 1651     mLastMoveAck 386 > mLastClientMove 367
PKLOG 1651     mFirstMoveIndex = 367, mLastClientMove = 386
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 368
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 369
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 370
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 371
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 372
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 373
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 374
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 375
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 376
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 377
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 378
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 379
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 380
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 381
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 382
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 383
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 384
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 385
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 386
PKLOG 1651     CONTROL OBJECT DIRTY CALL mLastMoveAck: 386
PKLOG 1651     mLastClientMove = 386
readPacketData mismatch energy 0.000000 server says 200.000000
readPacketData mismatch mRechargeRate 0.000000 server says 0.050000
readPacketData()
readPacketData mismatch linPosition 810.000000 30.000000 159.863647 server says 810.000000 30.000000 155.255661
readPacketData mismatch linMomentum 0.000000 0.000000 -13132.789063 server says 0.000000 0.000000 -14105.585938
*** Initial Control Object
Activating DirectInput...
keyboard0 input device acquired.
PKLOG 1651 readPacket() END
advanceClientTime(4) mLastTime = 24317 targetTime = 24321 targetTick = 24352 tickCount = 1
interpolateTick(0.000000)
PKLOG 1651 pushMove: id 387
| advanceObjects() numMoves = 2
/n/nprocessTick() Start client mTickCount 21 move 07F7F100
    move is not null move id 386, xyz 0.000000 0.000000 0.000000
updateForces(dt) dt = 0.016000
updateforces() linMomentum 0.000000 0.000000 -14105.585938
updateforces() drag forces 0.000000 0.000000 0.000000 4
updateforces() gravity     0.000000 0.000000 -7600.000000 4
updateforces() linMomentum 0.000000 0.000000 -14227.185547
updateForces(dt) dt = 0.016000
etc...

But after the next correction I don't see a call to advanceClientTime between calls to readPacket(). Note the last read moves the move counter forward but processtick() is not called on the object.


PKLOG 1651 readPacket() START
PKLOG 1651     READ mLastMoveAck 387
PKLOG 1651     mFirstMoveIndex = 386, mLastClientMove = 388
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 387
PKLOG 1651     CONTROL OBJECT DIRTY CALL mLastMoveAck: 387
PKLOG 1651     mLastClientMove = 387
readPacketData()
readPacketData mismatch linPosition 810.000000 30.000000 152.828781 server says 810.000000 30.000000 154.052460
readPacketData mismatch linMomentum 0.000000 0.000000 -14591.984375 server says 0.000000 0.000000 -14348.785156
PKLOG 1651 readPacket() END


PKLOG 1651 readPacket() START
PKLOG 1651     READ mLastMoveAck 387
PKLOG 1651     mFirstMoveIndex = 387, mLastClientMove = 387
PKLOG 1651     CONTROL OBJECT DIRTY CALL mLastMoveAck: 387
PKLOG 1651     mLastClientMove = 387
readPacketData()
PKLOG 1651 readPacket() END


PKLOG 1651 readPacket() START
PKLOG 1651     READ mLastMoveAck 388
PKLOG 1651     mLastMoveAck 388 > mLastClientMove 387
PKLOG 1651     mFirstMoveIndex = 387, mLastClientMove = 388
PKLOG 1651     mMoveList.pop_front() mFirstMoveIndex 388
PKLOG 1651 readPacket() END
Should advanceClientTime() be called between readPacket() calls?

thanks
#4
08/04/2010 (10:54 am)
Update: this problem may have been caused by all the tracing I added because when I disable the tracing it doesn't happen... I think.
When I test with multiplayers over the internet I'm still getting more corrections than I should be getting but I don't know if it is caused by this or something else.
#5
08/04/2010 (9:52 pm)
The first dump you show does look good.

As for the second one, the only reason I would think you would get many readPacket calls is that the packets were queued on the net interface. I'm guessing you're delaying the packets somehow? Otherwise the server should be sending them once every 32ms at the least but should actually be higher.

One thing that looks odd to me is that the first packet and the second packet both Ack 387 and the control object is dirty. Then somehow the server decided that the client was correct with the next move at 388. I'm not sure how that could be unless the client corrected its control object and sent the new move 388 with a corrected position. Or the server is changing a value that eventually aligned with what the client sent. Do you have TORQUE_DEBUG_NET defined? That might help get even more information. It appears that something is missing here.

When you say you're getting more corrections than you should be, what exactly do you mean? The reason I ask is because for every one correction you actually get multiple corrections from the server until the client transmits a new move in the proper position. Based on client and server packet rates and latency you could see 4-5 or more corrections from the server before it is acking back corrected moves.
#6
08/06/2010 (11:24 am)
The timing of this may have been affected by all the tracing I added.

The server thinks the client was correct with move 388 because the client prediction was correct but it was received by the server only after it sent its initial control object correction. The server after sending the correction processes the next move and lo and behold it matches the checksum sent by the client, so it sends the ack. Yes I have TORQUE_DEBUG_NET defined and have added even more tracing.

There are plenty of places in the code that cause unnecessary corrections. Some seem to be stock TGE, some are problems I introduced.
For example I use energy weapons and that was causing a control object update every time I fired a weapon because mEnergy is part of the calculated checksum.
Another case I created triggers that would change mRechargeRate so that every time someone crossed a trigger boundary their control object was corrected because mRechargeRate is considered in the checksum.
Yet another case was a center turret command I wrote, the logic ran only on the server so it caused a correction whenever executed. I didn't notice it an the time but only when I started to analyze the network traffic.

As you probably know these corrections are expensive and usually noticeable by the player because of position changes. It's amazing how much smoother the game is when you eliminate them :D