Horrible lag with 15 units on screen
by Justin Tolchin · in RTS Starter Kit · 05/04/2005 (11:18 am) · 17 replies
Hi all,
I'm having a terrible performance problem that seems to be network related. I'm running a version of TSE with the RTS mods ported to it. I've been noticing that whenever I have 12-15 RTSUnits on the screen, the network seems to completely bog down when I issue a move order (maybe it's bogged down all the time, but that's when I notice it). I say "network" meaning the client/server architecture but I'm actually running this on a single machine.
I added some output to the client-side "inputHandler.cs" script to print out the current time (using Platform::getVirtualMilliseconds) when it issues a move order to a unit, and another one when it is received on the server end.
So in inputHandler.cs I have this:
and in commands.cs I have this:
This is what I see when I only have 1 unit in the game:
45 milliseconds or so. Seems good to me.
Here's what I get with 15 units:
8 seconds! The next time I issued a move command it took 14.8 seconds, then 18 seconds, then 20 seconds.
I have no idea what is tying up the networking code since nothing is really going through the network anyway since it's all local. Is there a separate thread to handle the client-to-server commands that is not getting enough cycles? I have no idea how to debug this so any suggestions would be welcome.
Thanks!
I'm having a terrible performance problem that seems to be network related. I'm running a version of TSE with the RTS mods ported to it. I've been noticing that whenever I have 12-15 RTSUnits on the screen, the network seems to completely bog down when I issue a move order (maybe it's bogged down all the time, but that's when I notice it). I say "network" meaning the client/server architecture but I'm actually running this on a single machine.
I added some output to the client-side "inputHandler.cs" script to print out the current time (using Platform::getVirtualMilliseconds) when it issues a move order to a unit, and another one when it is received on the server end.
So in inputHandler.cs I have this:
...
...
echo("***** issuing move order to server...");
echo("Current Time:" SPC getVirtualMilliseconds());
commandToServer('IssueMove', %x, %y, %z);
...
...and in commands.cs I have this:
function serverCmdIssueMove(%client, %x, %y, %z)
{
echo("***** server command issue move called.");
echo("Current Time:" SPC getVirtualMilliseconds());
...
...This is what I see when I only have 1 unit in the game:
***** issuing move order to server... Current Time: 41884 ***** server command issue move called. Current Time: 41929
45 milliseconds or so. Seems good to me.
Here's what I get with 15 units:
***** issuing move order to server... Current Time: 170656 keyboard0 input device unacquired. ***** server command issue move called. Current Time: 178696
8 seconds! The next time I issued a move command it took 14.8 seconds, then 18 seconds, then 20 seconds.
I have no idea what is tying up the networking code since nothing is really going through the network anyway since it's all local. Is there a separate thread to handle the client-to-server commands that is not getting enough cycles? I have no idea how to debug this so any suggestions would be welcome.
Thanks!
#2
I'm completely at a loss to explain this behavior. I ran the profiler but nothing is jumping out at me. The game spends maybe 80% of it's time in the rendering loop, and the rest of it looks like normal game state updating. The only thing I could think was that maybe the network code is using a thread to poll for messages and it's not getting a high enough priority so it rarely runs. Of course I'm just speculating since I have no idea how the networking code works or if it's even to blame here. It's strange that the game seems more or less ok up until around 12 units (though I can see some additional sluggishness in cmdToServerXXX calls for every unit I add) but once I get around 12 or 13 the lag jumps up dramatically.
05/04/2005 (8:54 pm)
Sorry Stephen, I didn't word it properly. I'm not talking about two instances of TSE running, just one. Just a normal boring single-player configuration. I was just trying to refer to the fact that, as I understood it, Torque is still using a client/server architecture in single-player games, even though the networking is essentially "short-circuited" in this case. Sorry for the confusion.I'm completely at a loss to explain this behavior. I ran the profiler but nothing is jumping out at me. The game spends maybe 80% of it's time in the rendering loop, and the rest of it looks like normal game state updating. The only thing I could think was that maybe the network code is using a thread to poll for messages and it's not getting a high enough priority so it rarely runs. Of course I'm just speculating since I have no idea how the networking code works or if it's even to blame here. It's strange that the game seems more or less ok up until around 12 units (though I can see some additional sluggishness in cmdToServerXXX calls for every unit I add) but once I get around 12 or 13 the lag jumps up dramatically.
#3
Cool if so. What version of TSE (eg HEAD or Milestone 1, or other interim CVS checkout, or ?). Does this happen w/ the stock RTS SK integrated with TSE, where you haven't made any changes to the code? Does it happen in the TGE stock version?
It certainly shouldn't... this would be a very well-known, very painful bug if it happened to others. :) So, something funky must be going on in this particular situation.
05/04/2005 (11:08 pm)
Am I reading correctly that you're running this in TSE? Just making sure that wasn't mis-typed.Cool if so. What version of TSE (eg HEAD or Milestone 1, or other interim CVS checkout, or ?). Does this happen w/ the stock RTS SK integrated with TSE, where you haven't made any changes to the code? Does it happen in the TGE stock version?
It certainly shouldn't... this would be a very well-known, very painful bug if it happened to others. :) So, something funky must be going on in this particular situation.
#4
I have certainly made a number of changes to the C++ code and script, but nothing I can think of that should have caused something like this to happen (though I'm sure I'll find out this is my fault eventually :-) ). I never touched the networking layer except maybe a few changes to packUpdate/unpackUpdate in RTSUnit.cc to send some additional state info down to the client.
I'm guessing there must be some constant "per unit" traffic going through the networking layer but I don't know what that would be. I thought maybe it was the unit animation info, so one of the changes I made was to try and make all the animation run client-side only. I didn't see the point in running the run/walk/idle animations on the server anyway. But after making that change I'm still having the exact same problem. :-(
05/05/2005 (7:53 am)
Hi Josh. Yes, I'm using TSE. The version I have was the HEAD version at the time that the water block stuff was added in a few weeks ago (I haven't checked since then so I don't know if that's still the latest build). I honestly don't know if this happens with the stock RTS stuff + TSE, as I just never happened to have that many units added at one time. Actually I don't even know if I ever had a stock TSE + stock RTS build going, since by the time I got TSE I'd already made a bunch of changes to the RTS scripts that I ported over with the rest of the RTS stuff. It's also possible that I simply didn't do the integration properly and I left something important out.I have certainly made a number of changes to the C++ code and script, but nothing I can think of that should have caused something like this to happen (though I'm sure I'll find out this is my fault eventually :-) ). I never touched the networking layer except maybe a few changes to packUpdate/unpackUpdate in RTSUnit.cc to send some additional state info down to the client.
I'm guessing there must be some constant "per unit" traffic going through the networking layer but I don't know what that would be. I thought maybe it was the unit animation info, so one of the changes I made was to try and make all the animation run client-side only. I didn't see the point in running the run/walk/idle animations on the server anyway. But after making that change I'm still having the exact same problem. :-(
#5
05/06/2005 (11:51 pm)
We did a lot of development on the RTS kit with two (or more) instances running on the same computer... Slower framerate, but no major issues. There's not actually any constant traffic for units, just for specific events... Strange results to be sure. Is this the case for all commandToServers?
#6
Also, I have tried this on 2 different systems now and the same thing happens so it doesn't seem to be anything hardware related.
05/07/2005 (6:34 am)
Yep. I have another commandToServer when the client wants to set the "rally point" for a building and it exhibits the same behavior when there are 12+ units in the game. Also, I have tried this on 2 different systems now and the same thing happens so it doesn't seem to be anything hardware related.
#7
05/07/2005 (11:12 pm)
That's pretty wild and scary. I think you're going to have to dig through the networking code a bit to find out why there's so much lag happening. For a local client and server anything over around 100ms is very very odd.
#8
When I did that I noticed that the only networking event that was consistently being posted (while I was waiting for my "IssueMove" events or whatever to get through the network layer) were RTSCameraUpdate events. I also noticed that I was getting a bunch of those events even when I wasn't moving the camera. This seemed to be the only thing going on that I could alter, so I removed the following line from RTSCamera::ProcessTick:
Voila! Suddenly the problem went away. I could click to move my units and they more-or-less instantly responded. The downsides are: a) the framerate is still horrible (3 or 4 FPS in debug mode with 16 units on screen), b) I have no idea what removing the RTSCameraUpdate events will do (haven't seen any problems yet), c) I don't see how the RTSCameraUpdate events could even be responsible, since it's not like there should be *more* of them just because I have more units on-screen, d) why is nobody else having this problem with camera events clogging the network pipe? Is there something wrong with my RTSConnection? Clearly there's something abnormal going on here.
Another strange thing: even though the RTSCamera update events aren't being sent constantly anymore, I'm still seeing some weird events in the logs. By far the most commmon event i'm seeing now (usually 10 or so in a row every second or so) is this:
PKLOG 1427 EVENT 66: RemoteCommandEvent [MsgStatsList]
and I can't figure out where these are coming from because there is only one line of script that uses that event and that's in the function "sendStatsListToClient" which *never* gets called. I've also noticed that the number after "EVENT" varies quite a bit. It's 66 above but I've also seen 28, 48, 54, and 56. But they always get mapped to MsgStatsList, so maybe this is just a bug in the output?
I'm also seeing a few of these:
PKLOG 1425 EVENT 17: RemoteCommandEvent [MsgResearchComplete]
but there's no code I can find that sends out an event like this, so where is it coming from?
These performance issues are quickly becoming a critical problem in my game. I would really appreciate any thoughts/suggestions from anyone.
Thanks!
05/16/2005 (1:53 pm)
Ok, I've finally gotten back to trying to figure out why performance goes to hell when I add more than 12 units or so. I wondered if maybe it was my RTSUnit::ProcessTick code, since it seemed logical that if more units was causing the problem it must be some code being called per unit. So I removed all of it and it had absolutely no effect. So much for logic. So I went down to the network layer to try and figure out why the client events weren't reaching the server for 5+ seconds. I really don't understand the networking layer very well, but it seems to work off of 2 linked lists (one for ordered events and one for unordered). While I was trying to understand the code I noticed that there was a DEBUG_NET flag that was undefined so I put in a #define for it. That enabled the DEBUG_LOG macro which spits out a lot of stuff that the networking layer is doing, including printing the events that come through on those 2 linked lists.When I did that I noticed that the only networking event that was consistently being posted (while I was waiting for my "IssueMove" events or whatever to get through the network layer) were RTSCameraUpdate events. I also noticed that I was getting a bunch of those events even when I wasn't moving the camera. This seemed to be the only thing going on that I could alter, so I removed the following line from RTSCamera::ProcessTick:
if(conn)
conn->postNetEvent(new RTSCameraUpdate(this, false));Voila! Suddenly the problem went away. I could click to move my units and they more-or-less instantly responded. The downsides are: a) the framerate is still horrible (3 or 4 FPS in debug mode with 16 units on screen), b) I have no idea what removing the RTSCameraUpdate events will do (haven't seen any problems yet), c) I don't see how the RTSCameraUpdate events could even be responsible, since it's not like there should be *more* of them just because I have more units on-screen, d) why is nobody else having this problem with camera events clogging the network pipe? Is there something wrong with my RTSConnection? Clearly there's something abnormal going on here.
Another strange thing: even though the RTSCamera update events aren't being sent constantly anymore, I'm still seeing some weird events in the logs. By far the most commmon event i'm seeing now (usually 10 or so in a row every second or so) is this:
PKLOG 1427 EVENT 66: RemoteCommandEvent [MsgStatsList]
and I can't figure out where these are coming from because there is only one line of script that uses that event and that's in the function "sendStatsListToClient" which *never* gets called. I've also noticed that the number after "EVENT" varies quite a bit. It's 66 above but I've also seen 28, 48, 54, and 56. But they always get mapped to MsgStatsList, so maybe this is just a bug in the output?
I'm also seeing a few of these:
PKLOG 1425 EVENT 17: RemoteCommandEvent [MsgResearchComplete]
but there's no code I can find that sends out an event like this, so where is it coming from?
These performance issues are quickly becoming a critical problem in my game. I would really appreciate any thoughts/suggestions from anyone.
Thanks!
#9
Also, you may want to turn on DEBUG_NET in a stock version (grab the latest installer) and see what type of parallel logs you get. For example, the MsgStatsList is actually an expected netEvent (stock RTS-SK actually keeps track of your unit's kill counts, and allows you to inform the clients about them if you want), but while MsgResearchComplete isa valid netEvent, AFAIK it's not actually used in stock at this time.
It's also possible that you simply have a confused .obj environment (just happens sometimes), and a make clean might fix things up for you...
05/16/2005 (6:19 pm)
This is definitely not stock behaviour, so would you mind describing the general changes you have made along the way to get to here?Also, you may want to turn on DEBUG_NET in a stock version (grab the latest installer) and see what type of parallel logs you get. For example, the MsgStatsList is actually an expected netEvent (stock RTS-SK actually keeps track of your unit's kill counts, and allows you to inform the clients about them if you want), but while MsgResearchComplete isa valid netEvent, AFAIK it's not actually used in stock at this time.
It's also possible that you simply have a confused .obj environment (just happens sometimes), and a make clean might fix things up for you...
#10
Have you tried using the profiler to identify the problem?
05/16/2005 (11:32 pm)
A RemoteComandEvent is used by the scripting system for... uh... remote commands. :) ie, commandToServer, commandToClient.Have you tried using the profiler to identify the problem?
#11
As you can see, after 185 frames, it tried to render the unit 1110 times (6 times per frame). Now I don't know if this is wrong or not, because it's not really getting all the way to the TSShapeInstance::MeshObjectInstance::render every pass, it only seems to do that twice per frame. It seems that SceneState::renderCurrentImages calls the renderImage 6 times though.
Two occur here:
Two more here:
and the last two here:
That's all from the "renderCurrentImages" function.
I don't know why its treating these units as refraction and glow images.What controls this? And why is it being called twice per object anyway? I'm just wondering if this might be what's killing my performance (or part of it).
Thanks!
05/17/2005 (3:56 pm)
I'm in the process of debugging this still, and one thing I just noticed (I don't know why I didn't notice it before) is that it seems like my RTSUnits are being drawn 6 times every frame. That is to say, RTSUnit::renderImage is called 6 times per frame. Is this normal? Here is the profiler output for a scene that has only one RTSUnit in it.92.171 0.012 185 RenderFrame 91.335 29.806 185 CanvasRenderControls 60.770 0.145 185 SceneGraphRender 55.079 4.256 185 TraverseScene 50.752 5.531 2960 SceneStateRenderImage 26.656 0.175 1665 ShapeBaseRenderObject 25.979 0.063 1110 RTSUnitRenderImage 25.916 0.130 1110 TSShapeInstanceRender
As you can see, after 185 frames, it tried to render the unit 1110 times (6 times per frame). Now I don't know if this is wrong or not, because it's not really getting all the way to the TSShapeInstance::MeshObjectInstance::render every pass, it only seems to do that twice per frame. It seems that SceneState::renderCurrentImages calls the renderImage 6 times though.
Two occur here:
for (i = 0; i < mRenderImages.size(); i++)
renderImage(this, mRenderImages[i]);Two more here:
for (i = 0; i < mGlowImages.size(); i++)
{
renderImage(this, mGlowImages[i]);
}and the last two here:
// render refraction surfaces
for (i = 0; i < mRefractionImages.size(); i++)
{
renderImage( this, mRefractionImages[i] );
}That's all from the "renderCurrentImages" function.
I don't know why its treating these units as refraction and glow images.What controls this? And why is it being called twice per object anyway? I'm just wondering if this might be what's killing my performance (or part of it).
Thanks!
#12
05/17/2005 (5:07 pm)
Oops! I apologize. I was wrong: there were 2 RTSUnits in my scene, not one. So that explains why I was seeing renderImage called 6 times per frame. It was really 3 times per frame per unit. I'm still curious if this is normal or not (meaning, should every unit be added to the mGlowImages and mRefractionImages lists?). Sorry about the incorrect info. It's been a long day of debugging. :-(
#13
05/17/2005 (11:16 pm)
PrepRenderImage controls it. I doubt it's a significant issue, though why you're registering glow and refraction for RTS shapes seems a bit mysterious.
#14
www.garagegames.com/mg/forums/result.thread.php?qt=19866
Taking that function out doubled the frame rate. Of course, that meant going from (in a debug build) 15 fps to 30 fps when there's only one unit on-screen, and 4 fps to 8 fps with 16 units on screen, which is still lousy but at least it's better. :-/ It also means that the buildings we want to bump map won't have the info they need, so I need to find a way to call that function for those objects that need bump mapping.
@Stephen: summarizing all the changes I've made porting the RTS stuff over the TSE would take a while, but what I think I did initially was just take everything in /engine/game/rts and bring it over along with the bulk of the scripts, then I diff'd the other /engine folders and brought over the changes that I thought made sense (that's where things get a bit fuzzy), updated the renderXXX methods for RTSUnit and RTSBuildingMarker with the new GFX stuff (basically by copying/pasting the code from one of the TSE classes, probably player.cpp but I'm not sure). I managed to get the damage bar and rubber-band selection box drawing back in, but not the selection circles or the minimap. I've certainly made a lot of changes since then, mostly to the RTSUnit and RTSBuilding script files, a bunch of changes in the RTSUnit.cc and RTSBuilding.cc classes to add in some additional functionality (mostly along the lines of getting some A* pathfinding in and assigning "tasks" to the units), and I recently restructured the RTSUnit::ProcessTick code but that was long after all these problems started.
I don't know how many others have tried combining RTS and TSE so I don't know how much of this problem (and the other weird problems I've been having) are "normal" and how much of it is stuff that I screwed up in the integration somewhere. At some point I know I still need to go back to a stock TSE build and add in just the minimum amount of RTS stuff to get one unit on the screen and see what kind of performance I get and just start adding stuff back in from there. It's just difficult to find time with everything else I have to do.
Stephen or Ben, have you ever tried merging the basic RTS stuff with TSE?
05/18/2005 (9:28 am)
@Ben: Yeah, seemed weird to me too, but I guess that's the default behavior for RTSUnits for whatever reason. After some more checking it doesn't seem to be a big deal anyway. I did end up tracking down most of the rendering time to the TSSkinMesh::updateSkin method (it seems to spend 90% of the time in there!) and a big percentage of that was taken up by the call to TSMesh::fillTextureSpaceInfo. I did some checking in the forums which confirmed that this was a big performance hit for others: www.garagegames.com/mg/forums/result.thread.php?qt=19866
Taking that function out doubled the frame rate. Of course, that meant going from (in a debug build) 15 fps to 30 fps when there's only one unit on-screen, and 4 fps to 8 fps with 16 units on screen, which is still lousy but at least it's better. :-/ It also means that the buildings we want to bump map won't have the info they need, so I need to find a way to call that function for those objects that need bump mapping.
@Stephen: summarizing all the changes I've made porting the RTS stuff over the TSE would take a while, but what I think I did initially was just take everything in /engine/game/rts and bring it over along with the bulk of the scripts, then I diff'd the other /engine folders and brought over the changes that I thought made sense (that's where things get a bit fuzzy), updated the renderXXX methods for RTSUnit and RTSBuildingMarker with the new GFX stuff (basically by copying/pasting the code from one of the TSE classes, probably player.cpp but I'm not sure). I managed to get the damage bar and rubber-band selection box drawing back in, but not the selection circles or the minimap. I've certainly made a lot of changes since then, mostly to the RTSUnit and RTSBuilding script files, a bunch of changes in the RTSUnit.cc and RTSBuilding.cc classes to add in some additional functionality (mostly along the lines of getting some A* pathfinding in and assigning "tasks" to the units), and I recently restructured the RTSUnit::ProcessTick code but that was long after all these problems started.
I don't know how many others have tried combining RTS and TSE so I don't know how much of this problem (and the other weird problems I've been having) are "normal" and how much of it is stuff that I screwed up in the integration somewhere. At some point I know I still need to go back to a stock TSE build and add in just the minimum amount of RTS stuff to get one unit on the screen and see what kind of performance I get and just start adding stuff back in from there. It's just difficult to find time with everything else I have to do.
Stephen or Ben, have you ever tried merging the basic RTS stuff with TSE?
#15
Not yet, but I'm taking notes!
And I apologize, I didn't realize until now that this was a port to TSE--I'm about clueless on the TSE side of the house right now, so I'm going to bow out as a participant in this discussion! I'm sure that Ben G. however would love to hear anything else that you discover, so please keep on pressing!
05/18/2005 (10:16 am)
Quote:
Stephen or Ben, have you ever tried merging the basic RTS stuff with TSE?
Not yet, but I'm taking notes!
And I apologize, I didn't realize until now that this was a port to TSE--I'm about clueless on the TSE side of the house right now, so I'm going to bow out as a participant in this discussion! I'm sure that Ben G. however would love to hear anything else that you discover, so please keep on pressing!
#16
My personal theory is that there's just some innocent SNAFU somewhere in your port, but that you're gonna end up having to learn a fair amount about TSE before you find it. :-/
When TSE gets to a less rapidly changing state, we'll get the RTS kit up and running on it in an official way... till then we're gonna burn our engineering hours on other features, like getting that terrain done. :P
05/18/2005 (12:16 pm)
We haven't, as TSE is still in EA mode.. In normal TSE, I've had a good thirty players on screen at a time at around 50fps. This is on a 9700 mobile, 1.8 ghz system.My personal theory is that there's just some innocent SNAFU somewhere in your port, but that you're gonna end up having to learn a fair amount about TSE before you find it. :-/
When TSE gets to a less rapidly changing state, we'll get the RTS kit up and running on it in an official way... till then we're gonna burn our engineering hours on other features, like getting that terrain done. :P
#17
Yeah, I'm still very puzzled by the weird events I was seeing (MsgStatsList and MsgResearchComplete). I'll have to turn the DEBUG_NET stuff back on, and see if I can track down the source. Based on my searches there's absolutely nothing in my RTS code that should be sending out either one of those events. And I still have no idea why those extra RTSCameraUpdate events were bogging down the system so much. :-(
05/18/2005 (12:28 pm)
Shadows would be nice, too. :-)Yeah, I'm still very puzzled by the weird events I was seeing (MsgStatsList and MsgResearchComplete). I'll have to turn the DEBUG_NET stuff back on, and see if I can track down the source. Based on my searches there's absolutely nothing in my RTS code that should be sending out either one of those events. And I still have no idea why those extra RTSCameraUpdate events were bogging down the system so much. :-(
Torque 3D Owner Stephen Zepp
For testing multi-player work, you really need two separate boxes to get any grasp of real performance at all. For reference, with our dedicated server over the internet, we can have multiple hundreds of units moving in all different directions, attacking, etc. with absolutely no issues except for (in some low end cases) graphics performance for the cards themselves on the lower end.