Game Development Community

Large Parameters & TCPObjects = Crash?

by Robert Fritzen · in Torque 3D Professional · 11/27/2012 (9:00 pm) · 15 replies

I'm not sure how many people have actually dug into my XXZ Resource, but I'm sure if you've been using it that this problem has come up at some point.

When I'm transmitting the data to the server and then getting it back, my game client would crash. I dug through endless lines of my authentication code trying to find the problem, but none existed. So then I looked at the TCPObject, and well, according to an old thread I started back, that's not the problem either.

Earlier tonight, I had one of those moments where you could say the light over my head blinked on. Apparently, you cannot send large string variables to function calls. Especially not TCPObjects.


To test this theory, I re-coded all of my TCPObject functioning, but the crashing still occurred, until I used the following approach:
function TCPLR::onLine(%this, %line) {
   echo(%line);
   if(trim(%line) !$= "") {
      %this.canRead = true;
   }
   if(%this.canRead) {
      if(strstr(%line, "$PGD") != -1) {
         $TCPBuffer["TCPLR"] = $TCPBuffer["TCPLR"] @ "n" @ strReplace(%line, "$PGD", "");
      }
   }
}

Then instead of sending parameters to the function, I just accessed my newly created GVar, and what do you know, no more crashing.

My problem, and question for the night is a very confusing manner. Recovery needs to send a packet of info with the username, and a dual hash string (salt and actual data). It then receives a very large buffer to which I call the certificate write code on. This works, but why then, does my registration code that uses this EXACT SAME APPROACH, and sends 1/3 of the data and gets 1/4 of the data back, crash?

The TCP Code:
function TCP_PerformRegister(%username, %password, %email, %key) {
   if($TCP::AuthServer $= "") {
      error("Attempting to register without a connected server.");
      TestServerList();
      return;
   }
   %TCP = new TCPObject(TCPAR);
   
   %TCP.REGpw = %password;
   %TCP.REGuser = %username;
   %TCP.REGemail = %email;
   %TCP.REGkey = %key;
   
   %TCP.connect($TCP::AuthServer @ ":80");
}

function TCPAR::onConnected(%this) {
   %public_e = $Account::RSAKey::E;
   %public_n = $Account::RSAKey::N;

   %hashpass = PGDHash(%this.REGuser @ %this.REGpw @ "2.71828");
   $Account::Stor_Dec_Hash = PGDHash($Account::RSAKey::D);
   %aes_key = %this.REGpw @ $Account::Stor_Dec_Hash;
   %encrypted = encryptAccountKey(%aes_key, $Account::RSAKey::D);

   if(%encrypted $= "encryptfail") {
      MessageBoxOK("Account Encryption Failure", "PGD has failed to encrypt your account key.nGenerating a new RSA key, please try again.");
      GenerateRSA2048();
      
      %this.regFail = true;
      %this.disconnect();
      
      return;
   }

   $Account::RSAKey::D = %encrypted;
   $Account::RegEmail = %this.REGemail;

   %separator = getRandomSeperator(16);
   %header = assembleHTTP1_1Header($TCP::AuthServer, $TCP::AuthPath, "POST", "PGD: XXZ568 Client", "Content-Type: multipart/form-data; boundary="@%separator@"rn");

   %payload = %this.REGuser TAB %this.REGemail TAB PGDHash(%this.REGkey) TAB %encrypted TAB %public_n TAB %public_e TAB $Account::Stor_Dec_Hash TAB %hashpass;

   %dispo = makeDisposition(%separator, authMode, 2);
   %dispo = %dispo @ makeDisposition(%separator, fullPayload, %payload, 1);
   
   %header = %header @ "Content-Length: " @ strLen(%dispo) @ "rnrn";
   
   %request = %header @ %dispo;
   if($debugMode) {
      echo("REQUEST: n"@%request@"n");
   }
   
   %this.send(%request);
}

function TCPAR::onLine(%this, %line) {
   echo(%line);
   if(trim(%line) !$= "") {
      %this.canRead = true;
   }
   if(%this.canRead) {
      if(strstr(%line, "$PGD") != -1) {
         $TCPBuffer["TCPAR"] = $TCPBuffer["TCPAR"] @ "n" @ strReplace(%line, "$PGD", "");
      }
   }
}

function TCPAR::onDisconnect(%this) {
   if(%this.regFail) {
      %this.delete();
      return;
   }

   ValidateRegistration();
   %this.delete();
}

and the reg. code
function ValidateRegistration() {
   %response = $TCPBuffer["TCPAR"];

   closeMessagePopup();
   %response = stripchars(%response, "$");
   %firstIter = true;
   if(strPos(%response, "n") == 0) {
      %response = getSubStr(%response, 1, strLen(%response));
   }
   while(%firstIter || strPos(%response, "n") != -1) {
      %firstIter = false;
      if(strPos(%response, "n") <= 0) {
         //last one!
         %line = %response;
      }
      else {
         %line = getSubStr(%response, 0, strPos(%response, "n"));
         %response = getSubStr(%response, strPos(%response, "n")+1, strLen(%response));
      }
      switch$(firstWord(%line)) {
         case "BAD_NAME":
            MessageBoxOk("Error", "This server has denied your usernamenPlease check for invalid characters or words.");
         case "NAME_TAKEN":
            MessageBoxOk("Error", "This username is in use.");
         case "NO_EMAIL":
            MessageBoxOK("Error", "Your email field is missing.");
         case "BAD_EMAIL":
            MessageBoxOK("Error", "Your email contains invalid characters, or is invalid.");
         case "NO_KEY":
            MessageBoxOK("Error", "Your account key field is empty.");
         case "INVALID_KEY":
            MessageBoxOK("Error", "Your account key field contains invalid characters.");
         case "INTERNAL_ERROR":
            MessageBoxOK("Error", "The server encountered an internal error, please try again later.");
         case "EMAIL_NOT_BOUND":
            MessageBoxOK("Error", "This account key can not be used with your email address.");
         case "SIGN_FAILURE":
            MessageBoxOK("Error", "The server has failed to generate a signature, please contact support@phantomdev.net.");
         case "SIGN_ERROR":
            MessageBoxOK("Error", "The server has failed to sign your account certificate.");
         case "CERT":
            %certline = strreplace(%response, "CERT ", "");
            %user = getField(%certline, 0);
            %guid = getField(%certline, 1);
            %cert = getField(%certline, 2);
            WriteAccountCertificate(%user, %guid, $Account::RegEmail, $Account::RSAKey::E, $Account::RSAKey::N, %cert, $Account::Stor_Dec_Hash, $Account::RSAKey::D);
            $Account::RSAKey::D = ""; // kill the private exponent field now that it's stored
            //MessagePopup("Registration Successful", "Thank you for registering, you can now login with your new account.", 3000);
            MessageBoxOK("Registration Successful", "Thank you for registering, enjoy PGD Survival!");
            //==============================================
            Canvas.popDialog(AccountRegistrationWindow);
            AccountSelector_DropBox.populate();
         default:
            //ignore!
      }
   }
   $TCPBuffer["TCPAR"] = "";
}

To make things even more crazy sounding, the certificate file even successfully writes before my client crashes... any ideas?

#1
11/29/2012 (3:14 pm)
I'm officially reaching the point of hair pulling here. I've re-written everything for a second time, and it STILL crashes. I have absolutely no idea what is going on here. I've tried pretty much everything I can think of. I have no idea why this is causing crashes, what in the code is leading to the crash, and why it would occur in the first place.

Can someone please have a look at this and point me in the right direction. I really need to get this fixed ASAP as I want to have this project done for a potential late December release, and this is blocking my progress.

http://www.phantomdev.net/staff/phantom139/TCPConnections.cs

All of the code for auth works fine (I have testing this using a for loop to run all of them 1000 times inside the game, no crashes)

Beforehand, the recovery would run fine, now it crashes. Also, it would run one time, but if it failed, it would crash the next time you tried to use it.

I should also note, the game runs TCP_TestServer one time, and it runs fine. I'm not sure if this contributes to anything.
#2
11/29/2012 (5:32 pm)
I would see if you can't trace this and use the VC++ debugger to catch when it actually crashes. I looked at the TCPObject and it has a buffer that grows, but that is for data receive. Also, watch where you use echo it is limited to 4096 chars including the terminating NULL char.

I cannot find anything on any inherent limit on parameters for console commands. I don't know that there is a limit or not. I think debugging and stepping through the script code with Torsion would help, and have the VC++ debugger attached at the same time so you can see where it actually crashes. This will at least give you an idea of what is going wrong.
#3
11/29/2012 (5:43 pm)
The actual problem for me is when I run it under the VC++ debugger, the stupid thing works like there isn't anything wrong with it.

A little glimmer of hope though, I was able to get recover to work one time without crashing (like it did beforehand), I was pushing the wrong variable in my timeout function (should read %buffer, not %response).

I'm also moving to push my writeAccountCertificate method into the engine. I can now write the first account certificate without any crashes, but when I try to write another, kapoof.

I'll re-try registration here shortly (I still think that's going to crash), but the awkward part on that end is the function for it actually runs through, and even writes the full certificate, and then it crashes.... obviously don't want that in the full game.

*edit*
If you have the time, here is my certWrite.cpp file... is there anything in there that's pushing any red flags up?

http://www.phantomdev.net/staff/phantom139/certWrite.cpp
#4
11/30/2012 (1:48 am)
Some things that stick for me:
1 - It might be better to use an XML library to assemble your XML string. Can you use tinyXML for this? It is part of the engine I think.
2 - I don't see any error checking for input parameters. Perhaps that is being done somewhere else? Or maybe Con::getVariable returns an empty string and that is okay, I am not sure.
3 - I don't know if it is safe to type cast (const char *) to (const U8 *). That might be making some assumptions about the data. You may be able to avoid having to worry about that using an XML lib. For a project requiring XML data I using lxml which is a Python wrapper for some really good C based XML/XSL libraries. Even using that lib I still ran into encoding issues if I was not careful.

You say it works better running under the debugger? Are you recompiling or are you running the debug version both times? Having you tried a complete rebuild? Try running separate from VC++ and attach to the running app instead of starting from VC++.
#5
11/30/2012 (1:49 am)
If you are stepping through the code using Torsion you can set a break and stop right before this is called. Then connect to the app using VC++. That way if it is caused beforehand you might be able to force the right conditions to occur.
#6
11/30/2012 (10:30 am)
For #1: My original method used the SimXMLObject's functioning to assemble the XML file, but it would cause access violations when trying to access the string data to save the file. So I switched to the method I used in the past that worked for me when making XML files (the FileObject route).

For #2: I have error testing inside my TS functions that call the code there, I should have stated that beforehand (or provided the code for it).

For #3: I use that type-cast all of the time when I write using the FileObject class. It works perfectly fine so long as the data being sent to it is in the const char * data type. U8 is unsigned char *, and it is perfectly safe to run that type-conversion in C++.

I don't have Torsion, so I cannot use that to do debugging. When I debug, I use both a debug.exe and a release.exe. The debug.exe would run the TCP Code fine, where as the release would not.

However, the debug build did set off the Mutex::unlock assert call. I would assume this would be the same issue that is plaguing the release build, and it appears to be associated with my current certificate write method.

*edit*
In the debug build, I can continuously run the TCPObject code to recover my account (I removed the write certificate call). But when I move over to the release build, the instant I click the button to recover the account, my game crashes (debug mode shows nothing, as it will run perfectly fine in debug mode). At this point, I have absolutely no idea what is causing this, and it is becoming increasingly frustrating to have to deal with this part of my game over and over again.
#7
12/01/2012 (3:50 am)
#3: I did a quick search and you are right partially. ASCII is a subset of UTF8. The pointer conversion is safe, but not necessarily the data conversion. What was concerning me here is not pointer compatibility, but the way the data was interpreted. It may not be safe to go the other way: UTF8 to ASCII. I think UTF8 can contain characters ASCII does not know about. However, if all the methods being used can handle UTF8 then it doesn't really matter and you should be okay. Just make sure you know that the functions being used understand UTF8. I don't know how aware internally that T3D is of UTF8 encoding in their string functions.

Go get a copy of Torsion. I was stubborn about getting it at one point, but when I did get it the time saved in the first couple hours of having it paid for itself. Torsion and VC++ together make a very powerful debugging team for T3D.
#8
12/01/2012 (3:57 am)
Oh, in Python I have to explicitly create a string like this:
string1 = u"This is unicode."
string2 = "This is not."
I don't know what unicode format that puts string1 in, but the two strings representations are not assumed to be compatible. I will have to play with feeding unicode through my Python to T3D interface to see "what happens".

Edit:
I found this link about unicode: docs.python.org/2/howto/unicode.html. It is about Python's implementation, but it has a lot of good info in there. Near the end it has some guidance about writing unicode aware programs.
#9
12/02/2012 (9:08 am)
The function for certificate writing works perfectly as intended (even with the U8 conversion. I only do so because that's the input type accepted by the fileObject method. But enough with that part, the main issue, my TCPObject has had an interesting development.

Putting a delay on the TCP query send once the TCPObject connects to the server appears to have fixed the crashing. I will need to extensively test this further, but for now it looks like this problem might be fixed.

I'm not sure why this would be the case, but I think we need to look into the TCPObjecct more extensively to see if/why this would be happening.
#10
12/03/2012 (5:29 pm)
@Robert,
Can you distill the issue down to a simple script and the conditions it takes to cause the crash? That would make it much easier to reproduce. Does it need to talk to a specific server or can that be simulated as well?

Edit:
Also, are you using the onConnected callback? There are a bunch of callbacks that might help with timing issues. I would not think it would be a good idea to rely on a delay if it can be helped.
#11
12/03/2012 (8:35 pm)
I am using the onConnected callback to send the request ala:

function TCPConnection::onConnected(%this) {
   %this.send(%this.request);  //this is what I have delayed.
}

I will write a simple script with requests of different lengths and types to see if I can reproduce this crash.
#12
12/05/2012 (6:40 pm)
Using a callback should have taken care of timing issues. You definitely are finding a pretty good bug then. I looked at the code behind the TCPObject and it goes into the networking code of the engine enough that I had trouble telling what was going on. I wonder if it has a lot less to do with size as it has to do with timing. Interesting, and good job on figuring this out.

This is kind of similar to the weirdness I saw with doing a dump against a SimObject created in C++ from the console. If I did it fast it would crash. If I did it slow by waiting 5 seconds it would work fine. So some weird timing issue with the actual console widget.
#13
12/20/2012 (7:51 pm)
*Bumping*

TCPObject derives from SimObject, so the problem could be associated there.

I'm hoping this hasn't been buried. Has anyone else had problems with the TCPObject, or any other of the SimObject classes, I really hope that we can find out what is going on to get this issue fixed.
#14
01/03/2013 (5:07 pm)
Sorry for the consistent thread revivals here.

A I pointed out in my latest blog regarding my game release, I have a work around for the TCPObject. I implemented the cURL library into T3D and then had it handle of the TCP transactions. (This is by far no easy task, I had to rename some T3D classes and re-do my crypto++ implementation to get it to work)

From the people who have tested this new system, there have been no crashes so far, so I have a pretty high level of confidence that we are indeed onto a bug here.
#15
01/06/2013 (3:37 pm)
Glad to see you found a solution. Why did you have to rename classes? Couldn't the functions reside in a wrapper or in a namespace?