Game Development Community

URL-escaping bug in HTTPObject::expandPath()

by Orion Elenzil · in Torque Game Engine · 12/04/2006 (3:34 pm) · 3 replies

I'm seeing this in TGE 1.4.
The symptom is that characters above 127 (bytes 0x80 through 0xFF) are not URL-encoded,
and that's bad, m'kay ?

in httpObject.cc,

replace this line:
char c = path[srcIndex++];

with this line:
unsigned char c = path[srcIndex++];
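
For anyone wondering why that one word matters: where plain char is signed, a byte such as 0xE9 reads back as a negative number, so range checks and hex formatting on it can go wrong. Below is a minimal standalone sketch of the idea, not the engine's actual code. percentEncode is a hypothetical helper, and the set of characters it leaves unescaped is an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of TGE: percent-encodes every byte that is
// not an ASCII letter, a digit, or one of "-_.~" (an assumed safe set).
std::string percentEncode(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        // Reading through unsigned char keeps bytes 128..255 positive,
        // so the comparisons and shifts below behave as intended.
        unsigned char c = static_cast<unsigned char>(in[i]);
        bool safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                    (c >= '0' && c <= '9') ||
                    c == '-' || c == '_' || c == '.' || c == '~';
        if (safe)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

With this sketch, a lone 0xE9 byte comes out as %E9, and a space comes out as %20.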



while i'm at it,
here's a handy console function to expose URL-encoding to script.
ConsoleFunction(urlEncode, const char*, 2, 2, "string str")
{
   // Worst case: every input byte becomes a %XX triplet, plus the terminator.
   S32 origLen = dStrlen(argv[1]);
   U32 bufLen  = origLen * 3 + 1;
   char* ret   = Con::getReturnBuffer(bufLen);

   HTTPObject::expandPath(ret, argv[1], bufLen);

   return ret;
}

#1
03/14/2009 (5:59 am)
Thanks Orion. Who would have thought this is still in TGEA 1.7.1. :)
#2
03/15/2009 (8:09 am)
It's a good fix, to be sure, but, just because I'm pedantic, the RFC for URLs specifies the HTTP scheme in US-ASCII, so, technically, there aren't any characters that are valid for HTTP URLs above 127.

http://www.ietf.org/rfc/rfc1738.txt

http://www.columbia.edu/kermit/ascii.html
#3
03/15/2009 (8:19 am)
Yes there are. From the RFC:

Quote:
Within those parts, an octet may be represented by the chararacter which has that octet as its code within the US-ASCII [20] coded character set.

In addition, octets may be encoded by a character triplet consisting of the character "%" followed by the two hexadecimal digits (from "0123456789ABCDEF") which forming the hexadecimal value of the octet. (The characters "abcdef" may also be used in hexadecimal encodings.)

Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

So my understanding is that the particular application decides whether any of the characters above 127 should be encoded or not. In an application such as a game engine, I think it is very well in order to be able to expand the characters that need to be encoded.

But you're right, if this were a browser, that'd probably be the way to go. Except that the RFC was written in 1994. UTF-8 has spread all around the globe since then (thankfully), and I'm not sure they thought about encoding UTF-8 characters in the query string back in 1994.

Edit: I meant UTF-8 characters that are first broken up into separate bytes.. I'm not sure I'm being very clear about what I mean, sorry about that. :)
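
To make "broken up into separate bytes" concrete, here is a rough standalone sketch. It is not engine code. utf8Bytes and escapeAllBytes are hypothetical names, and the sketch assumes a valid code point (no surrogate or range checking). It turns a code point into its UTF-8 bytes and then writes each byte as a %XX triplet.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid scalar value up to U+10FFFF (no validation here).
std::string utf8Bytes(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: writes every byte as a %XX triplet, safe or not,
// just to show the per-byte escaping.
std::string escapeAllBytes(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char c : bytes)  // unsigned, so bytes >= 0x80 stay positive
    {
        std::snprintf(buf, sizeof(buf), "%%%02X", c);
        out += buf;
    }
    return out;
}
```

So U+00E9 becomes the two bytes C3 A9, which go into the query string as %C3%A9.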