Basic Architecture
by Kyle Carter · in General Discussion · 11/04/2003 (3:54 pm) · 8 replies
A rundown of my notes on the basic architecture.
There are three parts to the system architecture: clients, gateway, and peers.
Clients are self-explanatory: they authenticate and connect to the system, and may represent either a user or a server.
Peers are responsible for routing data amongst clients, providing discovery services, and so forth. They form a high-efficiency, highly scalable infrastructure for message passing amongst connected entities; about 90% of the system traffic will go through them. They provide IM, chat, buddy list tracking, and server discovery.
Gateway is the generic title given to a set of services including:
- Message of the day/News/RSS Feeds
- Forums
- Player stats
- Service status messages
- Non-realtime database access (for instance, payment and e-commerce)
- Account registration
- Autoupdate (preferably through rsync or similar)
- Load balancing of peers (i.e., assigning users to specific peers)
- RPC access to the system
The distinction here is that the peers and clients will be written in C++, use Torque's networking code, and be focused on realtime aspects of the system. The gateway services will be based on off-the-shelf technology and be focused on non-realtime aspects of the system.
The advantage here is that it means we can spend a lot of effort (especially on the first release) getting the peers very solid, as that's our main selling point.
As far as actual implementation goes, I would like to implement most of the gateway services in PHP/MySQL, using the RPC interface to pass data in and out of the peernet. Some of the gateway services, like load balancing and authentication, can be hung off of a lightweight TNL server.
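Since the gateway services would talk to the peernet through the RPC interface, here's a rough sketch of what a request envelope might look like. Everything here (RpcEnvelope, the field names, the wire format) is invented for illustration - the real interface is still to be designed.

```cpp
#include <cassert>
#include <string>

// Hypothetical envelope the PHP/MySQL side could serialize; the peernet
// routes it to a service and a reply travels back the same way.
struct RpcEnvelope {
    std::string service;  // e.g. "motd" or "stats" (illustrative names)
    std::string method;   // operation to invoke on that service
    std::string payload;  // serialized arguments
};

// Flat wire form, just to show the shape of the exchange.
std::string encode(const RpcEnvelope& e) {
    return e.service + "/" + e.method + "?" + e.payload;
}
```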
The peers, I think, should be monolithic. Extra services, like chat-channel game logic, should be done in bots.
What are your opinions?
#2
11/05/2003 (2:06 pm)
Can you clarify the "specialized clients" you mention in your first paragraph?

Note that when I'm referring to "gateway" I'm speaking generally about any service that acts outside of the peernet: the MOTD service, the load balancing service, the update service, the web gateway, etc. I think I forgot to explain that.
Based on our discussions, I thought that peers only knew about the peers that they were directly connected to, and that the rest of the network was opaque to them, so load balancing could be difficult for them. :)
Let's see... I think I'll start with some "hard" design decisions, and then bring up some softer ones. These are my thoughts on How Things Should Be; if you disagree, we need to have it out now, or else I'll keep falling back on these assumptions. :)
Hard design decision #1: We don't want to implement our own database solution. RDBMSes are not trivial things to implement, and I think that we would be poorly served expending effort in that direction. The implication I take from this is that although we may choose to cache information in the peers, the RDBMS will be the ultimate authority, and that it will store authentication information along with everything else. It follows from this that we will have a copy of MySQL or similar running in the background, providing important information, and that we'll have a data layer that enforces business logic and relational constraints. It would then be logical to expose this data layer over the peernet.
Hard design decision #2: The peernet is a very efficient message-passing backbone, not a good frontline system. The gateway fills this void. Although the auth puzzle solves some problems very nicely, it would still be a good idea to keep the peernet as unadvertised as possible. A single peer going down could take several thousand users off the peernet! Therefore, we must take steps to avoid making peers easily accessible to outsiders. The gateway part of the design acts as a first line of defense in this regard. It allows us to do things like set up a firewall to filter peer-inbound packets by IP - if the gateway hasn't vetted you (and told you which peer to connect to), then you can't get in. That way, the gateway takes the brunt of the internet's hostility, while the peers can focus on what they do best, i.e., being scalable.
Hard design decision #3: It's an insecure design for the clients to know the IP of every peer. Let the gateway know and worry about which peers are carrying the most load and where things should be put to keep the system balanced. As a further note, I think supporting dynamic changeover from peer to peer for clients is an unnecessary feature at this stage. If a peer goes down, the clients can all ask the gateway what to do; especially if it's a planned outage, this will be very manageable. The gateway should be able to handle practically any number of login requests, so even in the case of an unplanned outage, people should be able to get back onto the system within seconds (assuming capacity remains to hold them).
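To make that concrete, here's a minimal sketch of the gateway-side assignment step. PeerInfo and assignPeer are invented names, and "least-loaded live peer" is just one possible policy, not a design commitment:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch: the client never learns a peer address until the
// gateway has vetted it and picked a peer for it.
struct PeerInfo {
    std::string address;   // address handed to the vetted client
    int clientCount;       // current load on this peer
    bool alive;            // false if the peer is known to be down
};

// Pick the least-loaded live peer; an empty string means "try again later".
std::string assignPeer(const std::vector<PeerInfo>& peers) {
    const PeerInfo* best = nullptr;
    for (const PeerInfo& p : peers) {
        if (p.alive && (!best || p.clientCount < best->clientCount))
            best = &p;
    }
    return best ? best->address : std::string();
}
```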
I need to eat. I'll come back to this later. :)
#3
11/05/2003 (2:46 pm)
Just a quick comment before I go to bed - who would ever think of _not_ implementing an RDBMS in a system like ours? ;-)

To add a question/another point - would it make things easier for us to use more enterprise-scale technology? I'm thinking about J2EE or similar, where clustering, transactions, security frameworks, proof-of-delivery messaging, exposing the database as code, etc. are given away for "free". (Only "free" and not _free_ because it comes at the expense of complexity.)
I don't know of any equivalent frameworks for C++ as such, but it's more a question of enterprise frameworks vs. roll-your-own than of C++ vs. Java that I want to raise.
E.g., I don't want to touch "code our own transaction-safe framework" with a 10-foot pole myself - no matter what language.
#4
11/05/2003 (3:05 pm)
A specialized client would look like a client on the peer network, so you could send relayed NetEvents to it and get responses back. In this way we could provide named services like "Forums" or "MOTD" that messages could be sent to and received back from.

Right, I understand the gateway concept... I was just thinking that basically the client would make an optional update check with an updater server, and then ALL further communication with the Borg would be through its single connection to a peer on the community network.
Each peer knows about all other clients connected to the network, including all the other peers. It doesn't necessarily know the entire network topology - but it knows which of its directly connected peers leads to each client/peer.
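A tiny sketch of that next-hop idea (invented names, not real peer code): each peer keeps a table mapping every known destination to the directly connected neighbor that leads toward it, with no knowledge of the full topology.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative next-hop table: destination -> directly connected neighbor.
struct RoutingTable {
    std::map<std::string, std::string> nextHop;

    // Record which neighbor leads toward a given client or peer.
    void learn(const std::string& dest, const std::string& viaNeighbor) {
        nextHop[dest] = viaNeighbor;
    }

    // Returns the neighbor to forward to, or "" if the destination is unknown.
    std::string route(const std::string& dest) const {
        auto it = nextHop.find(dest);
        return it == nextHop.end() ? std::string() : it->second;
    }
};
```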
Hard Design Decision #1: Doesn't seem all that hard - we should use an already working RDBMS solution, and then, depending on the service, either wire it into the peer or make it work from a specialized client on the network.
Hard design decision #2: I think the peernet should be the frontline system for client connections. I'm not seeing the win in having a separate gateway server. The peernet can efficiently deal with spurious connection attempts and rapidly discard bogus data packets (i.e., from unconnected hosts). My initial thought would be that the DNS entry for the peer network would list the IPs of all the peers, the client would choose one at random or by region, and then the peer could optionally redirect that connection attempt to another peer. Just because we have a gateway doesn't mean that people won't try to attack the peer network anyway, and requiring a specialized firewall that communicates with both the gateway and the peer network sounds like a lot more work.
Also, for the initial release I don't want to plan on having the gateway/update server done, especially since the autoupdater could be a big chunk of time to get working properly.
Hard design decision #3: Anyone who's attempting to attack the network will already know the IPs of every peer, or at least a reasonable subset. Also, we double the number of possible attack targets, since an attacker would only have to take down either the gateway OR the peernet to make the system unusable.
Here's an article about client puzzles - check it out: citeseer.nj.nec.com/cache/papers/cs/14187/http:zSzzSzwww.tcm.hut.fizSz~pnrzSzpub...
#5
11/06/2003 (2:54 am)
#1
MySQL has built-in replication, but it's quite simple: you can have one master and multiple slaves, which means the slaves must be kept read-only and all writes must go to the master.
The code in the peers for reading should then look something like:
if (pingDB(server1)) {
    connection = new ConnectToDB(server1);
} else if (pingDB(server2)) {
    connection = new ConnectToDB(server2);
} else {
    raiseError();  // neither read server is reachable
}
to get a "fail-safe" read-only connection. Add a random mechanism to the above and you get load balancing.
Last time I checked (~1 year ago), PostgreSQL did not have replication at all. Postgres has more robust features, though, for transactions and similar.
In general I think that both would be good enough for our uses, as we don't need anything advanced in the databases except transactions and simple selects/inserts/updates.
I don't have any other experience with open source/free databases than those two. And I bet GG doesn't want to pay for a large Oracle cluster license ;-)
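The "random mechanism" suggested above could look something like this sketch: pick a random starting point in the slave list and fall through to the next one if the pick is down. pingDB here is a stand-in predicate, not a real MySQL client call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: randomized selection of a read slave with failover.
// `seed` stands in for a random number; walking the ring from a random
// start spreads reads across slaves while keeping the fail-safe behavior.
std::string pickReadSlave(const std::vector<std::string>& slaves,
                          bool (*pingDB)(const std::string&),
                          unsigned seed) {
    if (slaves.empty()) return "";
    size_t start = seed % slaves.size();          // random starting point
    for (size_t i = 0; i < slaves.size(); ++i) {  // walk the ring once
        const std::string& s = slaves[(start + i) % slaves.size()];
        if (pingDB(s)) return s;
    }
    return "";  // every slave is down: caller should raise an error
}
```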
#2
One of those things you wish you could discuss in a room with a whiteboard. I guess I just have to re-read it all a few times to understand it - or draw a picture.
Don't we "just" have a 4-tier layered architecture?
Backend: database servers
Middle 1: Core functionality server(s)
Middle 2: Servers providing "services" to clients - browser lists, authentication, persistent storage, RSS feeds, update servers
Frontend: load balancers/redirectors/"puzzle validation"
Naturally, put a firewall in front of the frontend as well as between middle 2 and middle 1 (unless we allow connections from the outside to the core server, which I think is a baaaad thing).
I'll re-read and see if I can draw a little picture in my head and put names on the entities.
#3
I can see the need for a separate "trusted" backend network. It depends a bit on how we physically implement this system, but a separate 100 Mbit network for connecting to the database, and maybe also the core "king of the hill" server, could be a good thing, I think. Even if someone tried to DoS the bandwidth at the front, the system could still communicate on the secure net.
But I might be getting too far down into details that aren't needed at the moment. We just need to secure this system at every level, as well as we can. Someone _will_ try to hack this.
#6
11/06/2003 (10:53 am)
MySQL also supports bi-directional replication. So, depending on how robust you want the system to be, you could work with two core write DB servers with mutual replication, and then have a small net of read-only DB servers replicated from the core.
EDIT:
As a side note, MySQL is being used as the database backend for Dark Age of Camelot.
#7
11/06/2003 (11:12 am)
Hmmm - I don't think real 2-way replication is done yet; at least not in the 4.0 series, last time I checked. While you can set up 2 servers that each run as slave and master against the other, there is no global master locking. So 2 clients can update the 2 servers before replication has been performed, leaving you with 2 different databases on the master servers.
This might have changed in 4.1 (still alpha), but until it has, I wouldn't recommend using this at all - not even in test systems.
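A toy illustration of that divergence problem (purely illustrative, not MySQL code): without a global lock, two masters can each apply a local write before replication runs, and replaying each other's writes leaves them disagreeing.

```cpp
#include <cassert>
#include <string>

// One replicated value on each master; names are invented for illustration.
struct Master { std::string value; };

// Two concurrent writes hit different masters, then each master replays
// the other's write. Neither end wins consistently: they diverge.
void replicateBothWays(Master& a, Master& b,
                       const std::string& writeToA,
                       const std::string& writeToB) {
    a.value = writeToA;  // client 1 hits master A
    b.value = writeToB;  // client 2 hits master B at the same time
    b.value = writeToA;  // A's write replays on B...
    a.value = writeToB;  // ...and B's write replays on A: they now disagree
}
```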
#8
11/06/2003 (11:12 am)
Oh - and hi Harold - great to have you onboard!!!
Associate Mark Frohnmayer
Load balancing: peers know about all the other peers, so they can easily reject a connection (because too many clients are currently connected) and redirect it to another peer on the system.
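A toy version of that reject-and-redirect rule - the capacity check and names here are assumptions, not actual peer logic:

```cpp
#include <cassert>
#include <string>

// Returns "" to accept the connection, otherwise the address of a peer
// to redirect the client to.
std::string handleConnect(int currentClients, int maxClients,
                          const std::string& fallbackPeer) {
    if (currentClients < maxClients)
        return "";              // accept: there is room on this peer
    return fallbackPeer;        // full: hand the client another peer
}
```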
Database access stuff: it's much easier from the client's perspective to have a simple event-based query/response setup than to make web queries to a different server. One major selling point (to me) of the community system is that the client maintains only a single persistent connection to a single server that handles a variety of different requests. Since the peer framework will be set up to pass lots of different kinds of messages, it would be appropriate in my mind to place all of the news/forums/player info/stat services as special clients on the chat network.
This would relegate the "gateway" server to being more of an updater server - it won't have to know the topology of the community server network; it won't even have to be connected to it.
We could easily code a "bridge" client that turns network events into web server/PHP queries on the back end for scripting.
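A quick sketch of how such named services might be dispatched on the bridge side. ServiceDispatcher and the handler signature are invented for illustration; in a real bridge client the handler would turn the payload into a web/PHP query.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Events addressed to a named service ("Forums", "MOTD", ...) get routed
// to whatever handler registered under that name.
class ServiceDispatcher {
    std::map<std::string, std::function<std::string(const std::string&)>> handlers;
public:
    void registerService(const std::string& name,
                         std::function<std::string(const std::string&)> h) {
        handlers[name] = std::move(h);
    }
    // Returns the handler's reply, or an error for an unknown service.
    std::string dispatch(const std::string& service, const std::string& payload) {
        auto it = handlers.find(service);
        return it == handlers.end() ? "error: unknown service"
                                    : it->second(payload);
    }
};
```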
Thoughts?