BUG? Cumulative Ticking Difference
by Matias Kiviniemi · in Torque X 2D · 05/03/2007 (12:49 am) · 26 replies
It seems that there are some very slight differences in the number of ticks you get over time on different machines. When I have the game running over the network on two machines with the same speeds, they slowly fall out of sync. The rate is quite slow, about one tick every 1-2 seconds, as if a "fractional tick" were being ignored. The other machine is less powerful and runs at about 20fps, which might affect things.
I guess this is something that can't be totally eliminated, and is something one should be prepared to handle. But in theory constant time ticks should converge over time, e.g. a game running at 10 ticks per second for a thousand seconds should get 10000 ticks and not, for example, 9500.
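To make the "should converge" expectation concrete, here is a minimal sketch of a fixed-timestep accumulator. It is not Torque X or XNA code; `count_ticks` and the frame durations are hypothetical, chosen only so the arithmetic is easy to check. The point it illustrates is that if leftover time is carried from frame to frame, the tick count depends only on total elapsed time, even when frames are uneven or one frame hitches badly.

```python
def count_ticks(frame_times_ms, tick_ms):
    """Count fixed-length ticks produced by a sequence of frame durations."""
    accumulator = 0  # unconsumed time carried from frame to frame
    ticks = 0
    for frame_ms in frame_times_ms:
        accumulator += frame_ms
        # Run as many whole ticks as the accumulated time allows.
        while accumulator >= tick_ms:
            accumulator -= tick_ms
            ticks += 1
    return ticks


# Hypothetical, uneven frame durations in milliseconds (500 ms per cycle),
# including a 250 ms hitch; 2000 cycles add up to 1000 seconds.
frames = [20, 50, 30, 150, 250] * 2000
total_ticks = count_ticks(frames, tick_ms=100)
print(total_ticks)  # 10 ticks per second over 1000 seconds
```

With integer milliseconds there is no rounding involved, so any shortfall from 10000 in a real run would have to come from time that was dropped somewhere rather than from the accumulator itself.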
#22
05/06/2007 (11:24 pm)
Oh yeah, I turned off the interpolation.
I should also clarify that when I said "the two computers will eventually fall out of sync", I actually meant that the clients still need to periodically resynchronize. But as you said, this doesn't necessarily mean they "are falling out of sync"; it could also come from a spike in network latency. And the "200ms figure" was an estimate. Previously the pattern was easy to see as it happened often and always "in same direction". I'll have to study this some more and post back.
One thing I've noticed is that a performance spike (e.g. dropping to single-digit FPS) causes the system to drop ticks and triggers a resynchronization. But I guess that could be by design (it's not quite obvious what the right behavior would be).
Matias
#23
05/15/2007 (12:46 pm)
I made a few further experiments with some observations:
- Resynchronizations still occur, but there's no obvious pattern. It can go over ten minutes without one, or there can be several in a row.
- Sometimes resynchronizations occur in bursts, i.e. there are several one after the other. In this case the resynchronizations were always in the direction of the client falling behind the server. The client is running on a slower machine (average 20fps), which could mean that performance problems cause ticks to be dropped. I'm not sure if this is by design (trying to prevent a total lockdown by scaling down), and whether it's caused by TorqueX or XNA.
- When running the client over WLAN with low signal, resynchronizations occurred steadily but did not accumulate. What I mean is that there were a lot of them, but the sum of those resynchronizations stayed close to zero. This could indicate that the actual ticking stayed synchronized, but network latency caused the resynchronization logic to trigger (it doesn't try to filter short-term spikes).
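The last observation suggests that filtering short-term spikes could cut down on spurious resynchronizations. The following is only a sketch of one way to do that, using a median over recent offset samples; `ResyncFilter`, its window size and its threshold are all hypothetical and do not describe how Torque X decides to resynchronize.

```python
from collections import deque
from statistics import median


class ResyncFilter:
    """Hypothetical spike filter: resync only on a sustained offset."""

    def __init__(self, window=5, threshold_ticks=2):
        self.samples = deque(maxlen=window)  # most recent offsets, in ticks
        self.threshold_ticks = threshold_ticks

    def should_resync(self, offset_ticks):
        self.samples.append(offset_ticks)
        # Wait until the window is full before making any decision.
        if len(self.samples) < self.samples.maxlen:
            return False
        # The median ignores a single outlier caused by a latency spike.
        return abs(median(self.samples)) >= self.threshold_ticks


spike = ResyncFilter()
spike_decisions = [spike.should_resync(o) for o in [0, 0, 6, 0, 0]]

sustained = ResyncFilter()
sustained_decisions = [sustained.should_resync(o) for o in [3, 3, 3, 3, 3]]

print(spike_decisions)      # a lone 6-tick spike never triggers a resync
print(sustained_decisions)  # a steady 3-tick offset triggers once the window fills
```

The trade-off is latency: a real offset is only corrected after it has persisted for most of the window, so the window size would need tuning against how quickly a desync becomes visible.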
Matias
#24
05/15/2007 (4:52 pm)
My gut tells me that you are seeing artifacts of a managed language--possibly garbage collection in the background that Torque X can't and/or isn't compensating for.
I don't have anything to back this up, but it kind of "makes sense" to me that Garbage Collection would steal cycles that may not be readily apparent to Torque X, and they would happen at seemingly "random" times.
#25
05/15/2007 (6:08 pm)
The issue is always whether time mis-matches accumulate or not.
If not, then the two computers are in sync and there is just an intermittent lag someplace in the system (maybe gc as Stephen suggests or network lag). Since the internal clock is still accurate, a positive difference now will mean a negative difference later and the two computers will be back in sync (in my work as a research psychologist many moons ago we used to call this a negative autocorrelation).
If the mis-matches accumulate -- meaning bigger and bigger time differences accrue linearly with time -- then it's a clock difference. This might happen for a number of reasons. The remainder bug is one reason. We might also have a cap on the delta time and not be properly crediting the capped time increment for later. Or maybe XNA is handing us the wrong time (hey, maybe they have the remainder bug, it's easy to do :).
Anyway, it sounds like there is no accumulation so everything is behaving as expected (i.e., sometimes one processor eats up some time or a packet takes a while to arrive, resulting in getting old information and treating it as new).
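As a rough illustration of the "accumulate or not" test, the sketch below sums the signed resynchronization corrections and checks whether the net stays near zero. The function name, the 50 ms tolerance and the sample values are made up for the example; real logs would need a tolerance chosen from the observed jitter.

```python
def classify_corrections(corrections_ms, tolerance_ms=50):
    """Label a run of signed resync corrections (hypothetical heuristic)."""
    net = sum(corrections_ms)
    # Corrections that cancel out point to intermittent lag; a net that
    # keeps growing in one direction points to a clock difference.
    if abs(net) > tolerance_ms:
        return "clock drift", net
    return "intermittent lag", net


# Made-up samples: one run where corrections cancel, one where they pile up.
jitter_run = [30, -25, 40, -45, 10, -10]
drift_run = [20, 25, 15, 30, 20, 25]

print(classify_corrections(jitter_run))
print(classify_corrections(drift_run))
```

A single net sum is a coarse check. Over a long session it would be more telling to look at whether the running total grows roughly linearly with time, which is the signature described above for a clock difference.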
#26
05/15/2007 (6:33 pm)
@Clark and Mantas,
if you guys have access to a copy of Game Programming Gems 3 (the CD for it actually) there is a copy of a tool called "NetTool" (including the C++ source code) that can be used to simulate "real" internet connections.
It is explained in chapter 5.7 of the book.
So Matias, if you were using that tool it should be easier to see if the resyncs you are seeing were network or game related.
Torque Owner Clark Fagot
I'm not sure if the error accumulates due to systematic differences or if random variation just causes drift, but I know for sure that throwing away those fractional milliseconds causes a problem. I first discovered the issue in TGE when trying to figure out why a dedicated server in the background was causing hitches. Eventually found that the clocks on the foreground client and background server were drifting apart, even though they were on the same (single processor) machine!
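To show how much time discarded fractions can add up to, here is a small sketch that advances a millisecond clock from per-frame deltas given in microseconds. It is an illustration only, not the TGE or Torque X code; `run_clock` and the 16.7 ms frame time are assumptions made for the example.

```python
def run_clock(frame_us, frames, carry_remainder):
    """Advance a whole-millisecond clock from per-frame deltas in microseconds."""
    clock_ms = 0
    remainder_us = 0
    for _ in range(frames):
        elapsed_us = frame_us
        if carry_remainder:
            # Credit the fraction of a millisecond left over from last frame.
            elapsed_us += remainder_us
        whole_ms, remainder_us = divmod(elapsed_us, 1000)
        clock_ms += whole_ms
    return clock_ms


# Hypothetical frame time of 16.7 ms (roughly 60 fps) for 6000 frames,
# which is 100200 ms of real time.
dropped = run_clock(16_700, 6000, carry_remainder=False)
carried = run_clock(16_700, 6000, carry_remainder=True)
print(dropped, carried, carried - dropped)
```

In this setup the truncating clock loses 0.7 ms on every frame, so two processes with different frame times would lose time at different rates and drift apart even on the same machine.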
Anyway, the games actually shouldn't drift apart even after several minutes. That could be happening for a couple of reasons. 1) It might not be drift but network variation. Does the error seem to accumulate? I.e., will they be apart by 1 tick after a few minutes, 2 ticks after a few more, etc? Or does it seem like they drift apart after a few minutes but then get back in sync? 2) It might be that xna isn't handing off consistent time updates. That would suck, but it's still possible. Might be worth tracking the total milliseconds like you do for longer periods of time. Being off by 200 ms after several minutes would be bad, but I assume you were just eyeballing it in your previous test.
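One way to track the total milliseconds over a longer period is to sum the deltas the framework reports and compare that sum against an independent clock. The sketch below assumes nothing about XNA's timing API; `DriftMonitor` is a hypothetical helper, and the fake clock in the usage lines exists only to make the example deterministic (in a real run the default `time.perf_counter` would be used).

```python
import time


class DriftMonitor:
    """Compare the sum of reported frame deltas with an independent clock."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # returns seconds; injectable for testing
        self._start = clock()
        self.reported_ms = 0.0

    def add_frame(self, reported_delta_ms):
        # Accumulate whatever delta the framework handed to the game.
        self.reported_ms += reported_delta_ms

    def drift_ms(self):
        # Negative means the reported time is falling behind real time.
        wall_ms = (self._clock() - self._start) * 1000.0
        return self.reported_ms - wall_ms


# Deterministic demo: a fake clock that says exactly one second has passed
# while the framework reported ten frames of 98 ms each.
fake_times = iter([0.0, 1.0])
monitor = DriftMonitor(clock=lambda: next(fake_times))
for _ in range(10):
    monitor.add_frame(98.0)
drift = monitor.drift_ms()
print(drift)
```

Logging this value every minute or so on both machines would show whether the reported time itself drifts, or whether the mismatch only appears once the network is involved.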
Any information you can give us about this issue would be great. I know we'll have to look into it more closely on our end eventually, but we don't have resources atm, especially since networking isn't a major priority for us until xna gets networking.