Game Development Community

Network lost (strange behaviour)

by Fyodor -bank- Osokin · in Torque Game Engine · 04/01/2007 (2:39 pm) · 12 replies

Hi.
This "bug" has existed since I started playing with Torque (1.3, and it is still there in TGE 1.5)... But I'm not sure if it's a real "bug".

Let me tell you how to reproduce it and what happens at the end, and you will get the whole idea.

1. Start a dedicated server.

2. Start a client and connect to the server.

3. Drag the client window around the desktop for a second or two. After you release the mouse, you will see the "Lag Icon" (not every time; it only happens sometimes).

This can also happen if you switch to another application and then back to the Torque client.

BUT the most interesting thing here is that the client STILL receives commands from the server. Try to ping the client with the MessageAll() command: the client will display the message (though not instantly), which means the connection still exists. But the client "thinks" the connection has been lost, so sending messages to the server does not work.

(Pressing "N" to see the network stats does not display anything, as if there is no network, but on RARE occasions the stats jump for a moment, as if the network is briefly back.)

You can keep the client running in this mode forever: the server will not disconnect it by timeout, and the client does not disconnect itself either.

As far as I can see, you can't really debug this...

Does anyone know anything? Any network gurus around here?
I'm open to discussion, any tests, etc.

#2
10/20/2007 (12:35 pm)
Update: it has something to do with memory.
The more memory the "client" uses (or the more actively it is calculating something), the higher the chance of getting this behaviour. Freeing up more memory in the OS can also improve it.
I'll try disabling TMM (the memory manager) and see what happens.

P.S. Still looking for someone who has seen this and *possibly* knows where to look to solve it.
#3
10/20/2007 (2:01 pm)
It's probably using up enough CPU power to slow the network code down
enough that it loses synchronization with the server. Not sure what could
be done to fix it. I've always played full screen, so haven't noticed it, and
at the moment can't duplicate this.
#4
10/20/2007 (5:04 pm)
@Kevin:
I've traced the CPU usage, and there are enough "resources" left, so I don't think it is related. And anyway, it doesn't explain the "one-way connection" behaviour.

I've disabled TMM, and so far I have been unable to reproduce the problem... So it looks like a memory-related issue.
I will run more tests to be sure.
#5
10/21/2007 (4:09 pm)
Okay. With TMM disabled I don't get that "network loss", but I have another problem that forces me to roll back the changes. From time to time, Torque hangs for a second, as if it's loading something. If I run around a lot, it appears about once or twice in a second.

Has anyone played around with TMM? Any advice? :)
#6
10/21/2007 (4:25 pm)
(removed so I don't feel so stupid..)
#7
10/21/2007 (4:29 pm)
Torque Memory Manager can be disabled by adding the following line at the top of platform/platform.h after all includes:
#define TORQUE_DISABLE_MEMORY_MANAGER
#8
12/03/2007 (1:10 am)
Just to say that I'm having the same problem, and still looking for a solution. However, I noticed that, when frozen, the method ProcessList::advanceClientTime() in gameProcess.cc always goes into this conditional around line 110:

   GameConnection* connection = GameConnection::getConnectionToServer();
   if (connection)
   {
      // If the connection to the server is backlogged
      // the simulation is frozen.
      if (connection->isBacklogged()) {
         mLastTime = targetTime;
         mLastTick = targetTick;
         PROFILE_END();
         return false;
      }
      if (connection->areMovesPending())
         control = connection->getControlObject();
   }

It always detects backlogging and returns false.

Climbing higher in the callstack, this method is called in the clientProcess function in game.cc around line 550. The function also returns false in this case.

Even higher in the callstack, in the DemoGame::processTimeEvent() method around line 650 in main.cc this chunk of code occurs:

PROFILE_START(ClientProcess);
tickPass = clientProcess(timeDelta);
PROFILE_END();
PROFILE_START(ClientNetProcess);
if(tickPass)
   GNet->processClient();
PROFILE_END();

I tried commenting out the if(tickPass) line, and it seems to solve the problem, but introduces random crashes on pack/unpack events.
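
One possible reading of the snippets above, as a toy model: if whatever GNet->processClient() does is needed to drain the backlog, then skipping it while backlogged means the backlog can never clear. This is only a guess at the mechanism and has not been checked against the engine source. ToyClient, netStep() and framesUntilUnblocked() are made-up names, not Torque code.

```cpp
// Toy model only: every name here is hypothetical, none of this is Torque code.
struct ToyClient {
    int queuedMoves;     // work waiting on the network step
    int maxQueuedMoves;  // the client counts as "backlogged" at this size

    bool isBacklogged() const { return queuedMoves >= maxQueuedMoves; }

    // ASSUMPTION: the per-frame network step is the thing that drains the queue.
    void netStep() { queuedMoves = 0; }
};

// Returns the first frame on which the client is no longer backlogged,
// or -1 if it is still backlogged after maxFrames frames.
int framesUntilUnblocked(ToyClient client, bool gateOnTick, int maxFrames)
{
    for (int frame = 1; frame <= maxFrames; ++frame) {
        // Mirrors clientProcess() returning false while backlogged.
        bool tickPass = !client.isBacklogged();

        // Mirrors "if(tickPass) GNet->processClient();" when gateOnTick is true.
        if (!gateOnTick || tickPass)
            client.netStep();

        if (!client.isBacklogged())
            return frame;
    }
    return -1;
}
```

In this model a client that starts backlogged never recovers while the gate is in place, and recovers on the first frame once the gate is removed. That matches what I saw when commenting out the if, but it says nothing about the pack/unpack crashes.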

Hope this helps. I'll be keeping a close eye on how this thread evolves.
#9
12/03/2007 (6:45 am)
Having not looked at the code, my first inclination would be to check if there are outstanding packets that were sent and never received. Or sent and received, but never ACK'd. Both sides may be waiting on the other for the next move.
#10
12/03/2007 (9:16 am)
I noticed some bad behavior in Torque's UDP stack when it gets backlogged. I never really tracked it down, and it seems state dependent (just getting backlogged, or full, wasn't enough... dnet.cc also had to be in the right state). It drove me a bit batty trying to debug it.

Eventually, I just decided it was simpler to put networking on a background thread which uses a deque to communicate with the main thread for delivering network events.
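
For anyone wondering what that looks like, here is a minimal sketch of the general idea using standard C++ only. It is not the actual code described above: NetEvent and NetEventQueue are hypothetical names, and a real version would carry packet data and connection info instead of a string.

```cpp
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical event type; a real engine would carry packet data here.
struct NetEvent {
    std::string payload;
};

// Mutex-protected deque: the network thread pushes, the main thread polls.
class NetEventQueue {
public:
    // Called from the network thread when something arrives.
    void push(NetEvent ev) {
        std::lock_guard<std::mutex> lock(mMutex);
        mEvents.push_back(std::move(ev));
    }

    // Called from the main thread. Does not wait for events to arrive:
    // returns false right away when the queue is empty.
    bool tryPop(NetEvent& out) {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mEvents.empty())
            return false;
        out = std::move(mEvents.front());
        mEvents.pop_front();
        return true;
    }

private:
    std::mutex mMutex;
    std::deque<NetEvent> mEvents;
};
```

The network thread would call push() for each received packet, and the main loop would call tryPop() in a loop once per frame until it returns false, so events are delivered in arrival order on the main thread.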

This seems to work well enough... so far.
#11
02/18/2008 (1:41 pm)
Hi,

I'm also experiencing a backlogging issue on my client. Does anybody know, or can anybody explain, why this happens?
What's the easiest way to fix this?

Thanks.
#12
01/08/2009 (4:43 am)
Okay, now I have something to share.
After some discussion with Rene Damm (thanks buddy!), playing around, and LOTS of testing: disabling TMM leads to rare crashes!
The steps in my original post do not really let you reproduce the problem. It appears mostly under heavy network load, so it's not easy to test and reproduce.

So, to sum it up:
With Torque Memory Manager: the network sometimes just hangs for no reason (it can be hours of online play without problems, or it can appear after 2 minutes of play when relogging 5 times in a row).
Without TMM: the engine crashes (again, it can be hours of smooth play, or it can be crash after crash).
The crash itself happens inside BitStream::readBits:
while(byteCount--)
   {
      U8 nextB = *++stPtr; // <<< Crashing here!!!
      *ptr++ = (curB >> downShift) | (nextB << upShift);
      curB = nextB;
   }
I have *lots* of mini-dumps from our testers and all of them show the same spot; however, the call stack sometimes differs.

Most crashes come from Player::unpackUpdate, but the exact spot differs from time to time; mostly it's the last readFloat() for energy.
Next by frequency (about 5-10% of crashes) is the stream read in the last line of PlayerData::unpackData (that's the loading/datablock transmission phase).
Another 3-5% of crashes come from NetConnection::ghostReadPacket:
index = (U32) bstream->readInt(idSize);
But all of those lead to the readBits() method.

In all the mini-dumps I have, the stream itself has a bad pointer, but I can't work out how that happens.
Rene suggested that something multi-threaded (concurrency) could be affecting the network (memory corruption?).
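
This is not a fix, but one way to narrow down a bad stream pointer might be to validate the read range before copying, so that a debug build fails at the first bad read instead of crashing later. A minimal standalone sketch follows. readBitsChecked() is a hypothetical helper, not part of Torque's BitStream, and the least-significant-bit-first order is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not Torque code: copies bitCount bits starting at
// bitPos from a buffer of dataBytes bytes into out. Returns false instead
// of reading past the end of the buffer or through a null pointer.
// out must have room for (bitCount + 7) / 8 bytes.
bool readBitsChecked(const std::uint8_t* data, std::size_t dataBytes,
                     std::size_t bitPos, std::size_t bitCount,
                     std::uint8_t* out)
{
    if (data == nullptr || out == nullptr)
        return false;

    // Reject any read that would run past the end of the buffer.
    const std::size_t totalBits = dataBytes * 8;
    if (bitPos > totalBits || bitCount > totalBits - bitPos)
        return false;

    std::memset(out, 0, (bitCount + 7) / 8);
    for (std::size_t i = 0; i < bitCount; ++i) {
        const std::size_t src = bitPos + i;
        // ASSUMPTION: bits are stored least significant bit first in each byte.
        const unsigned bit = (data[src >> 3] >> (src & 7)) & 1u;
        out[i >> 3] |= static_cast<std::uint8_t>(bit << (i & 7));
    }
    return true;
}
```

A check like this only catches reads outside a buffer whose size is known. It will not catch a pointer that was already corrupted by another thread, so it could help rule things out rather than find the cause.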

So, again: this bug exists in all versions of stock Torque from 1.4 up to 1.5.2. (I haven't really tested 1.3 for this behaviour, nor any of the TGEAs.)

Anyway, all these tests do not explain why it happens, and it's really hard to track down the bug, as I can play for HOURS without a crash (once I waited for this bug for 6 hours on 2 computers before going to sleep).

If anyone has anything to share, you are welcome.