Network Compression Ideas
by Bryan Turner · in Torque Game Engine · 03/27/2002 (9:02 pm) · 10 replies
I've finished my digging about in the network code, and have some ideas that could compress the network stream by up to 50%.
The way I judged this was to log every packet from a live Realm Wars server, and plot the location of each player at each packet. I also looked at what was filling up the packet data.
In a 5-7 person game:
- Packets contain 150-200 bytes of ghost update data.
- 91% of the ghost update size is player update data (position, velocity, orientation, moves).
- Nearly a third of that (31%) is the position vector.
So my ideas mostly concern compressing the position vectors and player updates.
---------------------
Idea #1 - Pass LOD parameter to packUpdate().
In the networking code, there is no concept of Level of Detail. Whether you are in the same orc hut with someone, or they are on the other side of the map, you still get *precise* coordinates for their location. Some amount of range compression is performed such that not all 32 bits of float are sent for close opponents.
If you view the number of bits sent as static, and the 'shift' of those bits as dynamic, then you could always send 10 bits per coordinate dimension. For close opponents the bits would consist mostly of fractional digits, for precise rendering. For distant opponents, the bits would represent tens or hundreds of units, since most of the precision is already lost at render time. (Distance is not the best metric here, since you might be looking through a scope, but it's good for an example.)
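A minimal sketch of that fixed-width, shifting-window encoding (the 10-bit width, the `packCoord`/`unpackCoord` names, and the rounding scheme are illustrative assumptions, not the engine's actual code):

```cpp
#include <cmath>
#include <cstdint>

// Always send kCoordBits bits per dimension; 'shift' picks which window of
// the value we keep. A negative shift keeps fractional precision for close
// opponents; a positive shift keeps coarse high-order units for distant ones.
const int kCoordBits = 10;

uint32_t packCoord(float value, int shift) {
    float step = std::ldexp(1.0f, shift);               // step size = 2^shift units
    int32_t ticks = (int32_t)std::lround(value / step);
    return (uint32_t)ticks & ((1u << kCoordBits) - 1);  // keep the low 10 bits
}

float unpackCoord(uint32_t bits, int shift) {
    // Sign-extend the 10-bit field, then scale back up.
    int32_t ticks = (int32_t)(bits << (32 - kCoordBits)) >> (32 - kCoordBits);
    return (float)ticks * std::ldexp(1.0f, shift);
}
```

With shift = -4, a nearby coordinate is reproduced to within 1/32 of a unit over a roughly ±32-unit window; with shift = +3, a distant one is reproduced to within 4 units over a ±4096-unit window, still in the same 10 bits.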
Other elements in the update structure can also be left out or scaled down using the LOD. For instance, a sufficiently distant opponent's orientation is irrelevant; dropping it saves 33 bits per player update per packet.
There is already a 'scale' parameter in the writeCompressed() function which performs this shifting, but it isn't being used at the moment. Was this its intended purpose?
In the protocol, I imagine LOD would be a value in the range 0..7 (3 bits), passed as the first parameter of a ghost update after its index. By my crude calculations, this step could achieve a maximum 36% compression of the player position/orientation/velocity vectors (103 bits -> ~66 bits).
---------------------
Idea #2 - More compression points.
The network engine has the concept of a 'compression point', to which it compares all player update vectors. If they are sufficiently close, it encodes them in fewer bits.
Why not extend this to multiple compression points? An initial set of compression points could come with the map, and as fire fights break out, the server could send a small update specifying where the new compression point is. By keeping a set of these points, multiple groups of players clustered together get their positions compressed separately.
Server-side, this means grouping the player objects by nearest compression point, then sending them in a particular order: index of a compression point, followed by all player updates using that point, then the index of the next point, and so on.
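A rough sketch of that server-side grouping step (the `Vec3` struct and function names here are stand-ins for illustration, not Torque's actual classes):

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

static float distSq(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Bucket each player under its nearest compression point, so updates can be
// written as: point index, then every player update encoded relative to it.
std::vector<std::vector<int>> groupByNearestPoint(const std::vector<Vec3>& players,
                                                  const std::vector<Vec3>& points) {
    std::vector<std::vector<int>> buckets(points.size());
    for (int p = 0; p < (int)players.size(); ++p) {
        int best = 0;
        for (int c = 1; c < (int)points.size(); ++c)
            if (distSq(players[p], points[c]) < distSq(players[p], points[best]))
                best = c;
        buckets[best].push_back(p);
    }
    return buckets;
}
```

This is O(players x points), which stays cheap while the set of compression points is small; each bucket's positions can then be sent as small offsets from its anchor.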
This idea requires a lot more changes than the previous one, and I'm not completely convinced there would be additional payoff if the first idea were also implemented. But alone, this scheme could shave up to 15% off the total ghost update size (by cutting the average position vector size almost in half).
---------------------
Idea #3 - Range compression of the Z coordinate
For many games, the Z coordinate will be confined to a narrow range over any particular area of the map. This translates to huge savings in position vector size (33% max). Taking advantage of this savings requires a special encoding scheme; arithmetic coding is typically used for this purpose. Basically, you encode the Z coordinate according to a probability model. With a high probability of being close to the ground, it is possible to encode the average-case values in about 5 bits (as opposed to the 16-32 bits used now). You pay for this super-compression on the other end of the scale... it may take MORE than 32 bits to specify an abnormally high or low Z coordinate.
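Short of a full arithmetic coder, the shape of that trade-off can be sketched with a simple escape code: a cheap encoding for the common near-ground band and an expensive fallback for outliers. The band width, step size, and bit counts below are made-up illustrative numbers, not anything from the engine:

```cpp
#include <cstdint>
#include <cstring>

struct EncodedZ { bool escaped; uint32_t payload; int bits; };

// Common case: Z within [ground, ground + 32) in 0.5-unit steps costs
// 1 flag bit + 6 payload bits. Rare case: 1 flag bit + 32 raw float bits.
EncodedZ encodeZ(float z, float ground) {
    float delta = z - ground;
    if (delta >= 0.0f && delta < 32.0f)
        return { false, (uint32_t)(delta * 2.0f), 1 + 6 };
    uint32_t raw;
    std::memcpy(&raw, &z, sizeof raw);   // full-precision fallback
    return { true, raw, 1 + 32 };
}

float decodeZ(const EncodedZ& e, float ground) {
    if (!e.escaped) return ground + e.payload * 0.5f;
    float z;
    std::memcpy(&z, &e.payload, sizeof z);
    return z;
}
```

So the average-case Z drops to 7 bits, while an abnormally high or low Z pays 33 bits: exactly the "pay on the other end of the scale" behavior described above.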
For most walking/driving games, this is not such a bad trade off (and it stacks well with the other two ideas). This savings (by itself) can reduce the ghost update size by about 10%.
---------------------
Idea #4 - Arithmetic coding for lots of other stuff...
And finally, the big-daddy idea, which has a lot of pros and also a lot of cons. Switching the entire protocol over to an arithmetic encoding could significantly reduce the average-case packet size. This is based on my data mining while playing Realm Wars with 5-7 active players.
A quick example: each packet contains a 'true' bit for each ghost update, and a single 'false' bit at the end of the channel. The ratio of true to false for these bits is (on average) 9:1. An arithmetic coder can encode each of those high-probability 'true' bits in about 0.15 bits (-log2(0.9)), so the ten raw flag bits in an average packet shrink to roughly 5. That's an immediate saving of over half a byte per packet.
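Checking that arithmetic with the standard information-theoretic cost, -log2(p) bits per symbol (a property of ideal arithmetic coding, not anything Torque-specific):

```cpp
#include <cmath>

// An ideal arithmetic coder spends -log2(p) bits on a symbol of probability p.
double bitCost(double p) { return -std::log2(p); }

// Average packet: nine 'true' flags (p = 0.9) plus one 'false' (p = 0.1).
double avgFlagBits() {
    return 9.0 * bitCost(0.9) + bitCost(0.1);   // ~4.7 bits vs. 10 raw bits
}
```

Each 'true' costs about 0.15 bits and the closing 'false' about 3.3, so the ten raw flag bits compress to roughly 4.7, a saving of just over 5 bits per packet from these flags alone.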
There are numerous such bits scattered through the packets. They are easily modeled by playing a few games, and the models can even be updated dynamically (although this adds overhead to the protocol).
The benefits of this scheme become even more apparent when the number of objects being tracked increases, since most of the modeled bits are per-ghost or per-event.
So what's the downside to this silver bullet? Arithmetic coding requires a multiply and a divide PER symbol. This has been mitigated in contemporary algorithms by using bit shifts, adders, and lazy resyncing, but it's still an order of magnitude slower than stream->writeInt(a, 10). Also, switching to arithmetic coding would require changing all the current network code (ALL the network code) to use the symbol/model system instead of a bit-by-bit stream.
It may be possible to build a hybrid system which allows arithmetic coding to co-exist with the standard packet format, but such a scheme requires a lot more digging and modeling than I've done so far.
---------------------
So there you have it. 36% + 10% + (unknown factor)% ~= 50% compression of the network traffic!
I'm curious to know if people are interested in testing out these ideas, and what the GG folks think (since they know the code better than I).
--Bryan
#2
03/29/2002 (9:30 am)
I also noticed that the entire move structure was sent for each player update... is this the *other* players' moves? I have not dug too deeply here as I just found this last night, but it surprised me, and I'm curious why it's in there at all. Since the server updates us on the other players' positions/velocities, it seems redundant to also send their moves.
More to come..
--Bryan
#3
03/29/2002 (9:39 am)
The move is used to "predict" player actions (including motion) on the client. A player's velocity is not enough, and his velocity might not match his current move, especially when players are dodging around. There may be some savings there though :)
#4
03/29/2002 (11:01 am)
Tim, thanks for your responses :) I'll look a bit more deeply into LOD and Z-compression. I have some ideas on implementing extra compression points in O(n) time, which would also require modifying the network priority calculations, but it's looking like a bigger and bigger deal for a not-so-large compression win overall. 80-20 is still the order of the day!
--Bryan
#5
03/29/2002 (11:26 am)
Just wondering about all this, Bryan: WHY exactly are you looking at shaving bytes off the packet size? I'm just wondering what exactly you're hoping to achieve by doing so.
If it's to support *more* players, then technically I think shaving bytes off the protocol isn't really helping. You need something more fundamental.
I always get a chill when people start messing with systems that are known to work. In order to "improve" them, you've got to have gone through the same hoops to know why a protocol ended up as it did.
There are always optimisations to be had with any code, but this is usually best done algorithmically, rather than by trying to optimise the micro-code.
Phil.
#6
03/29/2002 (12:50 pm)
Phil,
Actually, that's not always true. In a compute-bound system, an algorithmic improvement (say, O(n) instead of O(n^2)) is clearly at an advantage. No amount of assembly code is going to get O(n^2) < O(n) for sufficiently large 'n'.
This is not such a case.
Tribes 2/Torque is a purely bandwidth-bound application. As such, I'm examining techniques which are better-than-constant improvements in the network bandwidth.
That is, I'm not shaving off a byte per packet... I'm shaving off a byte per packet PER CLIENT PER OBJECT, so every bit I slice off is multiplied by the client and object counts.
The algorithm used in Torque for sending updates is near-optimal, since it uses white-box tracking of stale updates across each client (vastly better than Quake III's dumb-render with prediction model - how many 64-player Quake games have you played?). But this algorithm is not encoding the updates themselves in an optimal manner. My research is looking purely into the encoding strategy for the algorithm, and as such is very non-intrusive to the overall working system.
While Torque says it supports 128 players, it is essentially impossible to support this over current broadband connections (roughly, that's 28 bytes per player update * 128 player objects = ~3.5K per full update * 128 clients = ~448K per tick * 20Hz = ~9MB per second pumping out of the server... mind you, Torque starts chopping off data when a full update can't fit in a packet).
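Redoing that worst-case arithmetic explicitly (the constants are the post's rough estimates, not measured values):

```cpp
// Worst-case full-update bandwidth from the estimates above.
const long long bytesPerUpdate = 28;    // bytes per player update
const long long playerObjects  = 128;
const long long clients        = 128;
const long long updateHz       = 20;

const long long perFullUpdate = bytesPerUpdate * playerObjects;  // 3,584 B  (~3.5K)
const long long perTick       = perFullUpdate * clients;         // 458,752 B (~448K)
const long long perSecond     = perTick * updateHz;              // 9,175,040 B (~8.75 MB/s)
```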
Anyway, hope that helps make sense of my bit-fiddling. :)
--Bryan
#7
03/29/2002 (12:58 pm)
> I always get a chill when people start messing with
> systems that are known to work. In order to "improve"
> them, youve got to have gone through the same hoops to
> know why a protocol ended up as it did.
Oh, I forgot to mention. I *have* jumped through the same hoops. You can download a fully-working open-source clone of the Tribes networking model off my website (www.fractalscape.org/NetworkingEngine).
--Bryan
#8
03/29/2002 (1:51 pm)
As I'm sure Bryan knows, it was never the intention to send an update for every object in every client update packet. That's the whole point of prioritizing the updates. I've been in 50-player T1 games (all visible at the same time), and T2 has had games with over 100 players. Player counts don't even include projectiles, vehicles, and items, which easily double the number of ghostable objects. I don't see any problems with having 128-player (or more) Torque games. And that's running at 2K per second per client, for a total of about 256K/sec bandwidth on the server (not including packet overhead).
Anyway... reducing the per-object packet overhead doesn't get you more objects in the scene (or on the server), but it basically increases the average frequency with which each object is updated. This will of course improve their behavior on the client. Another important part of this equation is the update prioritization calculations.
Hey Bryan, I was up on your site, but I couldn't find where to download your network engine... was thinking of checking it out :)
#9
03/29/2002 (7:00 pm)
Sure, the direct link is:
www.fractalscape.org/GameDemoSrc07-21-01.zip
And the necessary textures:
www.fractalscape.org/DemoTextures05-26-01.zip
And my work log on building the demo:
www.fractalscape.org/TerrainRenderingWorkLog
It's part of my 'game demo' that I was building. Unfortunately the 'demo' part is just plain broken, but the network code is complete. For a working demo, you'll have to get the previous version (www.fractalscape.org/DemoSource05-26-01.zip).
--Bryan
#10
03/29/2002 (9:19 pm)
I assume the net engine/server is doing distance & directional update culling when possible? I haven't gotten deep into that section of the codebase, but I spent a lot of time with two other engineers developing the Dark Vengeance (and post-DV-overhauled) net code. It seems that completely dropping packets on the floor for clients that really don't need them is useful.
Versus culling, you can also send reduced-accuracy (and thus reduced-bit) pos/ypr/etc. data for distant or otherwise non-visible (occluded, behind, etc.) objects in the scene to a given client. Track an epsilon for how 'far from reality' the data has gotten and only send 'accurate' data when it's drifted too far.
Actually, the same sort of epsilon-tracking works well with predictive/interpolated/extrapolated client algorithms. The server 'knows' what the client will do with an object if the server doesn't send an update; if the epsilon/delta between the 'real' data and what the client will 'posit' remains small enough, don't bother sending anything whatsoever. When it starts creeping too high for comfort, send the needed update packet.
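A one-dimensional sketch of that epsilon test (the linear predictor and the names here are placeholder assumptions; a real client may run fancier extrapolation):

```cpp
#include <cmath>

struct State { float pos; float vel; };   // last state actually sent to the client

// What the client will dead-reckon 'dt' seconds after the last update.
float clientPredict(const State& lastSent, float dt) {
    return lastSent.pos + lastSent.vel * dt;
}

// Server-side: only queue an update once the client's guess has drifted
// more than 'epsilon' from the authoritative position.
bool needsUpdate(const State& lastSent, float truePos, float dt, float epsilon) {
    return std::fabs(truePos - clientPredict(lastSent, dt)) > epsilon;
}
```

The server runs the same predictor the client does, so "send nothing" is a safe default until the drift crosses the threshold.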
For all I know, this kind of logic is already in the system (yeah, yeah, I know about the doc on the net system... I meant to read it last May, and still haven't found the cycles!).
-d
Torque Owner Tim Gift
2. More compression points. Having multiple "anchor" points is something we've talked about before, though we never came up with something we were all happy with. Ordering the players by anchor point is not so simple: objects are ordered by network priority, and since we don't know exactly how many objects will fit in a packet, it's hard to change that order. We'd also talked about using a grid, sort of like a b-tree encoding, to select the anchor and an offset.
3. Z compression. We should definitely have this one as an option :)
4. Arithmetic encoding. Maybe in the torque 2 engine ;)
Squeezing down network traffic is always a good idea. These are all worth exploring.