A TorqueScript (pre)compiler - anybody interested?
by Daniel Buckmaster · in Torque 3D Professional · 11/09/2013 (11:58 pm) · 34 replies
I've been thinking about this for some time, and it's been bugging me enough that I figured I'd step out and gauge interest. What I want to do is implement a preprocessor for TorqueScript. Not like the C preprocessor, but nearly a full-on compiler that parses your scripts, does any useful steps in the middle, and outputs the actual TorqueScript that your game runs.
Some 'preprocessing' steps I would want to implement are:
Optimisations. For example, detecting when a local variable is constant (i.e. it's assigned a value only once) and replacing further occurrences of it with its value. Should save some name lookups. Or another example - pulling complex expressions out of for loops and such. This is a bit of a tricky one, but consider this:
for(%i = 0; %i < getWordCount(%string); %i++)
getWordCount is called every time the condition is checked. It's more performant to write this:
%len = getWordCount(%string);
for(%i = 0; %i < %len; %i++)
But why do that when it should be the compiler's job? Of course, this can't always be done, particularly if %string can change length over the course of the loop, which the compiler would have to check. It also doesn't work if the called function (in this case getWordCount) isn't pure, i.e. if it may return a different result given the same input. But 90% of the time this holds. You could go all sorts of directions with this; replacing constants doesn't even scratch the surface of the possibilities.
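A minimal sketch of the hoisting check described above, in Python purely for illustration (the thread suggests Haskell for a real implementation). It pattern-matches a single `for` header string instead of walking a real AST, and the `PURE_FUNCTIONS` whitelist, the `hoist_loop_bound` name and the `%len` temporary are all assumptions, not part of any existing tool. It refuses to hoist when the function isn't whitelisted or when the loop body looks like it assigns to the argument.

```python
import re

# Assumption: a hand-maintained whitelist of functions treated as pure
# (the same input always produces the same output).
PURE_FUNCTIONS = {"getWordCount", "getFieldCount", "getRecordCount", "strlen"}

# Matches a header such as: for(%i = 0; %i < getWordCount(%string); %i++)
LOOP_HEADER = re.compile(
    r"for\((?P<init>[^;]*);\s*(?P<var>%\w+)\s*<\s*"
    r"(?P<func>\w+)\((?P<arg>%\w+)\)\s*;(?P<step>[^)]*)\)"
)

# Plain or compound assignment (=, +=, -=, <<=, ...) but not the == comparison.
ASSIGNMENT = r"\s*(?:[-+*/%|&^]|<<|>>)?=(?!=)"


def hoist_loop_bound(header: str, body: str) -> str:
    """Move a pure call out of a for-loop condition when that looks safe."""
    unchanged = f"{header} {body}"
    m = LOOP_HEADER.fullmatch(header.strip())
    if m is None or m.group("func") not in PURE_FUNCTIONS:
        return unchanged  # unrecognised shape, or function not known to be pure
    arg = m.group("arg")
    # Conservative: if the body may assign to the argument, leave the loop alone.
    if re.search(re.escape(arg) + r"\b" + ASSIGNMENT, body):
        return unchanged
    hoisted = f"%len = {m.group('func')}({arg});"  # assumes %len is otherwise unused
    loop = f"for({m.group('init')}; {m.group('var')} < %len;{m.group('step')})"
    return f"{hoisted} {loop} {body}"
```

Calling it on the loop from the post, with a body such as `{ echo(getWord(%string, %i)); }`, yields the hoisted two-statement form. A real compiler would also need to generate a fresh temporary rather than assume `%len` is free.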
Syntax sugar. For example, plain-language operators like 'and' and 'or' instead of '&&' and '||'. Also, maybe, removing the % prefix for local variables to make code that little bit more modern/readable. Maybe removing the requirement for semicolons. It's all up for grabs.
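The word operators are a purely lexical rewrite, so a sketch only needs a tokenizer that skips string literals. This Python version (an illustration, not a proposed implementation) treats quoted strings and sigil-prefixed, dotted or `::`-qualified names as single tokens, so `"this and that"` and `%obj.and` are left untouched. The regex is an assumption and does not cover every TorqueScript lexical form.

```python
import re

# The proposed word operators and the TorqueScript operators they compile to.
WORD_OPERATORS = {"and": "&&", "or": "||"}

# One token: a double- or single-quoted string (with backslash escapes), or a
# name optionally prefixed by % or $ and optionally qualified with . or ::
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'|[%$]?\w+(?:(?:\.|::)\w+)*')


def desugar_word_operators(source: str) -> str:
    """Rewrite bare `and`/`or` to `&&`/`||`, leaving strings and names alone."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        return WORD_OPERATORS.get(token, token)  # everything else passes through
    return TOKEN.sub(replace, source)


print(desugar_word_operators('if(%a and %b or !%c) echo("this and that");'))
# prints: if(%a && %b || !%c) echo("this and that");
```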
String construction. Since strings are so important in TS, I'd love to implement new ways of putting them together. One way to do this is vector notation:
%pos = VectorAdd([1, 2, 3], [4, %y, %z]);
// Which compiles to
%pos = VectorAdd("1 2 3", "4" SPC %y SPC %z);
Another way is string interpolation:
%greeting = "Hello, @{%name}!";
// Which compiles to
%greeting = "Hello, " @ %name @ "!";
Destructuring assignment. This is a huge one that I'd love to try. Before I explain, here's a code example:
%data = ContainerRayCast(...);
%hitObj = getWord(%data, 0);
if(%hitObj) {
%hitPos = getWords(%data, 1, 3);
%hitNormal = getWords(%data, 4, 6);
}
I haven't had to write that snippet all that many times, but it already gets boring. And returning multiple results as words in a string is a common pattern in TorqueScript. So it'd be much nicer to write something like:
[%hitObj, %hitPos, %hitNormal] = ContainerRayCast(...);
using the vector notation above to pull groups out of the word-delimited string. Note that this doesn't work entirely as written, because the last two values take three words each. So in a basic implementation you'd be looking at:
[%hitObj, %hx, %hy, %hz, %nx, %ny, %nz] = ContainerRayCast(...);
This isn't awful, but it could definitely be improved, maybe like this:
[%hitObj, %hitPos 3, %hitNormal 3] = ...
That is, provide word lengths for sequences longer than one word. There should also be syntax for destructuring records (newlines) and fields (tabs):
function giveMeData() {
return "word word" TAB "tab" NL "newline";
// Or,
return ["word", "word": "tab"; "newline"];
}
[$a, $b: $c; $d] = giveMeData();
Inline/anonymous functions. Like:
schedule(1000, 0, function() {
echo("Hi.");
});
This is a pretty easy transformation to make naively: just pull the function out into the global namespace and replace its use with an auto-generated name. There are tricky things to implement, though, like how it handles scope and binding. On the other hand, it'd be very easy to not even try to match other languages on that, since TorqueScript doesn't do that anyway.
So, who's interested in the possibilities? I'd love to try this just from the point of view of implementing my own language/compiler, and because I think TorqueScript lacks some usability.
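A sketch of the naive lifting step, again in Python for illustration. Each no-argument `function() { ... }` becomes a global function with a generated name (the `__anon` prefix is a hypothetical naming scheme), and the expression is replaced by that name as a string. Nothing is captured from the enclosing scope, nested anonymous functions stay inside their parent, and `function()` appearing inside a string literal is not detected. The example passes `0` as the reference object, because the engine's global `schedule` expects one before the function name.

```python
import itertools

MARKER = "function()"


def _matching_brace(source: str, open_index: int) -> int:
    """Return the index of the '}' matching the '{' at open_index.

    Braces inside double- or single-quoted strings are ignored.
    """
    depth, quote, i = 0, None, open_index
    while True:
        ch = source[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1


def lift_anonymous_functions(source: str, prefix: str = "__anon") -> str:
    """Naively lift `function() { ... }` expressions to named global functions."""
    names = (f"{prefix}{n}" for n in itertools.count())
    lifted, pieces, pos = [], [], 0
    while (start := source.find(MARKER, pos)) != -1:
        open_index = source.index("{", start)
        close_index = _matching_brace(source, open_index)
        name = next(names)
        lifted.append(f"function {name}() {source[open_index:close_index + 1]}")
        pieces.append(source[pos:start])
        pieces.append(f'"{name}"')  # pass the generated name where the function was
        pos = close_index + 1
    pieces.append(source[pos:])
    return "\n".join(lifted + ["".join(pieces)])


print(lift_anonymous_functions('schedule(1000, 0, function() {\n  echo("Hi.");\n});'))
# function __anon0() {
#   echo("Hi.");
# }
# schedule(1000, 0, "__anon0");
```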
EDIT: Two more ideas I forgot
Infix function calls. Vector maths is a bit painful, thanks to the way we need to use global functions for everything. But what if there's another way:
VectorAdd(VectorAdd(%a, %b), %c);
// becomes
%a ^vectorAdd %b ^vectorAdd %c;
I'd propose using a backslash (it got eaten by the code block, so I used a caret instead) or something similar to allow functions of two arguments to be called as infix operators, so that code reads more nicely.
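Desugaring the infix form is a left fold over the operands. The sketch below uses the backslash syntax proposed above and assumes every operand is already a complete expression, with all infix calls sharing one precedence level and associating to the left; `desugar_infix` is a hypothetical name.

```python
import re

# A backslash followed by a function name, with any surrounding whitespace.
INFIX = re.compile(r"\s*\\(\w+)\s*")


def desugar_infix(expr: str) -> str:
    r"""Rewrite `a \f b \g c` as `g(f(a, b), c)`."""
    # split() alternates operands and captured names: [operand, name, operand, ...]
    parts = INFIX.split(expr.strip())
    result = parts[0]
    for name, operand in zip(parts[1::2], parts[2::2]):
        result = f"{name}({result}, {operand})"  # fold from the left
    return result


print(desugar_infix(r"%a \VectorAdd %b \VectorAdd %c"))
# prints: VectorAdd(VectorAdd(%a, %b), %c)
```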
A module system.
You know, where names aren't available to you unless you've imported the appropriate module. The compiler would then need to do some wizardry to determine which files need to exec which others, but hey, with modern science I'm sure we can come up with something.
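Working out which files need to exec which others amounts to a topological sort of the import graph. A Python sketch using the standard library's `graphlib` (Python 3.9+), assuming the compiler has already collected each file's imports into a dictionary; the file names and the `exec_order`/`exec_prologue` helpers are hypothetical:

```python
from graphlib import TopologicalSorter


def exec_order(imports: dict[str, set[str]]) -> list[str]:
    """Order files so each one is exec'd after every file it imports.

    `imports` maps a file to the files it imports (hypothetically produced
    by the compiler's import analysis). Circular imports raise
    graphlib.CycleError.
    """
    return list(TopologicalSorter(imports).static_order())


def exec_prologue(imports: dict[str, set[str]]) -> str:
    """Emit the exec() calls a generated entry script could start with."""
    return "\n".join(f'exec("{path}");' for path in exec_order(imports))


project = {
    "game.cs": {"player.cs", "vectors.cs"},
    "player.cs": {"vectors.cs"},
    "vectors.cs": set(),
}
print(exec_prologue(project))
# exec("vectors.cs");
# exec("player.cs");
# exec("game.cs");
```

Raising on cycles is a deliberate choice here: TorqueScript has no forward declarations across files, so a circular import is more likely a design problem to report than something to resolve silently.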
About the author
Studying mechatronic engineering and computer science at the University of Sydney. Game development is probably my most time-consuming hobby!
#2
11/10/2013 (7:40 am)
Unless you want to write your own language, you would most likely be better off just using Python or Lua. Both have the benefit of being languages people already know.
#3
11/10/2013 (1:33 pm)
James, those are all reasons why I want to compile to TS rather than touch the console. I've tried changing the grammar before. It wasn't pleasant! So it seems more productive to create a separate preprocessor in a language like Haskell (which is well known for having good facilities for implementing compilers, DSLs, etc.) and sidestep it altogether.
Tim, I agree, but replacing the entire console is a ton of work anyway. ScriptT3D requires Python be installed on the target machine, and I'm not a fan of C#, or I'd probably go with one of the existing solutions.
EDIT: To be a bit clearer, since I'm not typing on my phone now, I'd propose a separate stand-alone compiler that would preprocess your source files (written in TorqueScript++ or whatever) into TorqueScript files which are then compiled into DSOs in the usual way when the engine starts up and they are execed. All the syntactic sugar I describe would be implemented in the external compiler and converted into regular old TorqueScript. It's exactly analogous to languages that compile to JavaScript. In fact I take a lot of inspiration from CoffeeScript.
I guess it's a question of what type of work you'd rather do. For me, implementing a compiler from scratch sounds like a great project to sink my teeth into. Ripping the console out of Torque would be equally challenging, but doesn't sound like as much fun. While replacing the console wholesale sounds like a better long-term solution (it's going to have to happen eventually), just making a wrapper over TorqueScript seems like the path of least resistance right now. It may even be possible to then simplify TorqueScript, or at least determine some highly-performant subset of it to use as a compiler target, like asm.js.
Of course, yes, I'd rather be writing my game ;P. But writing a compiler is a fun learning experience in its own right!
#4
11/11/2013 (6:42 am)
Actually, here's an interesting (and even more ambitious) idea - write a translator that takes script in your favorite language and spits out Torquescript.
#5
11/11/2013 (8:17 am)
Quote: ScriptT3D requires Python be installed on the target
Neither Python nor any of its libraries are required to be installed on the target machine for a packaged project. Research Python packaging programs such as Py2Exe. I use this with my commercial Python projects to get an exe to the client that they just run. It will add approximately 10MB of additional files, depending on the number of libraries used.
Edit:
Minions of Mirth I believe uses Py2Exe or something similar to package its exe BTW. So if you want to see an example check that out.
#6
11/11/2013 (12:12 pm)
Richard - might as well go the whole hog and write an LLVM backend for TorqueScript. Which... yeah. Awesome but insane.
Frank, thanks for the tip. I had the impression that stuff like Py2Exe was a bit janky, but it sounds worth checking out if it's working for you. The other downside with ScriptT3D, of course, is that some stuff goes through the console anyway - but that's obviously no worse than my proposal. I really should get back to playing around with it!
One thing that may come of writing this parser for TS is that we could develop ways to automatically convert TS to another language, when/if one becomes accepted for use with Torque.
#7
11/11/2013 (1:40 pm)
@dB, Yes, ScriptT3D is pretty much dependent upon the console system for some things. I am creating a list of things I am planning on changing as I work with it more. For one, I am going to have all SimObjects created and instantiated through a C++ callback rather than relying on the scripting system.
#8
11/12/2013 (9:55 am)
Wouldn't investing time in Lua be better time spent? There is a lot of "Console" overhead that could be stripped out at runtime. This can be done with wrapper classes and fewer member variables being declared on base classes. This is just my opinion. :) I could be wrong.
#9
11/12/2013 (2:31 pm)
You can use memoization (ScriptT3D does) to reduce console dependency overhead. ScriptT3D was designed for feature richness and more tools rather than the fastest bestest thing that could be implemented. A complete rewrite of the console would really be a good thing, but it is a big task. If you want a really fast scripting language, you cannot beat DotNetTorque.
#10
11/12/2013 (5:32 pm)
@Frank, Thanks for the compliment about DNT!
The only thing faster than DNT is OMNI, which, if the winds are at my back and the sun shines every day, should be out as an EA in Q1 2014.
In Omni, I pretty much disposed of the console completely. I also cleaned up some of the more wordy parts of the C# syntax.
Throw the Mono port, the 64-bit port, etc. on top, and I really think this release is going to shine!
But enough of me hijacking this thread :)
@Daniel,
If you believe in something, build it! You never know how far you will get in life unless you try. I remember reading somewhere that the difference between inventors and dreamers is that the inventors gave it a shot.
So give it H#LL and see it through; I'd even be willing to help you out if you get stuck on something.
Vince
#11
11/12/2013 (6:11 pm)
I'd actually target JavaScript before Lua, though I haven't looked in detail at either... Lua sounds like a good language, and if it's significantly easier to embed it might just be worth it.
Another thing I'd prefer before replacing the console is to wait for Jeff's component system. With that, we could vastly pare down the amount of code that relies on the existing console system and focus on the core of it.
Vince, have you guys developed an in-house tool to convert TS, or is it done manually? I could actually see a TS parser/AST manipulator being helpful for migrating away from TorqueScript without having to leave behind all the legacy scripts immediately.
I hear you on the 'inventors vs dreamers', but I'm at a stage where I have a billion dreams, and I have to make hard decisions about which ones to pursue. This one is something I'd love to do out of interest, but I suspect I'll only actually do it if someone other than me would get some use out of it.
...which makes Haskell the logical choice of language for sure :P.
#12
11/12/2013 (6:22 pm)
@Dan, I had no clue if there would be any interest in DNT when I wrote it. In fact I thought I would be the only person who would use it. I just felt compelled to write it :)
And we've sold a few copies.. You never know.
For generating C# from TorqueScript, the answer is yes/no. We currently have a tool which will read in Gui files and convert them to C#. This saves a great deal of time converting the tools folder over to C# and this tool will be released with the next version.
But for converting raw TorqueScript to CSharp, that is not so easy since OMNI is REALLY object oriented. You no longer use the class, superclass, etc functionality in T3D and instead you use standard C# inheritance, etc.
In fact, callbacks from the engine are exposed as overridable functions, so you can derive from them and override them. You can ask Aswin; it's a total change from the way TorqueScript works. There are no script files; instead there are just objects with properties, events, etc. It's my first attempt at an MVC approach to T3D.
#13
11/12/2013 (7:12 pm)
Did you profile DNT against Lua? I'm very interested to hear the statistics. I've mostly heard that Lua tromps most other scripting languages. DNT sounds very interesting to me.
#14
11/12/2013 (10:03 pm)
Yes, TorqueScript does have a very different structure to a lot of languages. Especially the way I use it, I've discovered you can eke a lot of flexibility out of it, and those idioms won't necessarily translate to another language directly. I think, though, that it might be beneficial to just have a good TorqueScript AST and some easy ways to transform it. Even if it, for example, lets eval() calls fall through as comments in the output, the result can then be touched up by hand, since you'll need to consider the intent of the code at that point.
Okay, that's all the reason I need. Let's see what I can do... slowly...
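A minimal sketch of that idea: a toy TorqueScript expression AST with a translator to a hypothetical Python-like target, where `eval()` calls (which build code at runtime and can't be translated statically) fall through as comments for a human to finish. The node classes, function names and target syntax are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Var:
    name: str  # e.g. "%x" (local) or "$x" (global)


@dataclass
class Str:
    value: str


@dataclass
class Call:
    func: str
    args: List["Expr"]


Expr = Union[Var, Str, Call]


def to_torquescript(e: Expr) -> str:
    """Print an expression back as TorqueScript (no string escaping)."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Str):
        return '"' + e.value + '"'
    return f"{e.func}({', '.join(to_torquescript(a) for a in e.args)})"


def to_target(e: Expr) -> str:
    """Translate an expression to a hypothetical Python-like target."""
    if isinstance(e, Var):
        return e.name.lstrip("%$")  # drops the sigil; a real tool must avoid name clashes
    if isinstance(e, Str):
        return repr(e.value)
    return f"{e.func}({', '.join(to_target(a) for a in e.args)})"


def translate_statement(stmt: Call) -> str:
    # eval() builds code at runtime and can't be translated statically,
    # so the original call falls through as a comment to fix up by hand.
    if stmt.func == "eval":
        return "# TODO (eval): " + to_torquescript(stmt) + ";"
    return to_target(stmt)
```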
#16
11/13/2013 (12:23 pm)
Whenever I tell someone I'm into Haskell, they say "Oh, why do you like Pascal?"
#17
11/13/2013 (4:34 pm)
Oh Pascal, my first programming language, where to 'begin' and where to 'end'... I was seduced away from Pascal to the C++. It promised speed and sexiness. Turbo C++ was my second language. Yes C++ did put out as promised, but there was still something special about Pascal. Then as time went on I had other trysts: html, ecma, php, basic, assembly, and alas there was Python. Where I again fell back in love with programming as Python allowed for the abstraction I sooooooo longed for... <sigh> Over time I forgot about Pascal as Python was my new love. I still see C++ from time to time, but only to make up for any pure speed related code and always reach for Python as my language of choice. In the right hands she (Python) is putty in my hands. Molded to perfection and simplicity. Yes, I am a Python man for ever more.
#18
11/13/2013 (6:25 pm)
@VINCE, Don't really need those statistics. It wouldn't surprise me if CSharp scripting were faster than LUA, since the runtime is static and not dynamic. I have been checking out the Winterleaf webpage and OMNI sounds fantastic. Thank you for bringing this to my attention.
#19
11/14/2013 (6:33 am)
@Antony, In Omni, everything is a p-invoke and I hijacked the console to shortcut it back into the C# faster.
There is still some string manipulation but it is really scaled back significantly.
I've also added object caching to the C# so that C# is not creating and destroying objects in between each call.
I've been running profiler after profiler against the C# to find places to improve the code and it has been really boiled down to a reformed state.
Finally, I'm planning on making some simple classes for the complex datatypes in c++ for transform, point3F, etc, so I will be able to pass them as actual classes versus strings (Right now that is the only string manipulation)
What I'm getting at, is that it is not the fact that I'm just using C#, but the fact that this product has been going through continued development for the last 4 years and as my team learns new tricks we update the old code base.
#20
11/14/2013 (9:47 am)
@Vince, Sounds awesome! When will OMNI be available to the public? And how much will it cost?
Associate James Urquhart
As for changing the actual grammar, while in some cases this isn't much of a problem, in others (such as removing semicolons) you can introduce pretty hard to debug parser issues.
From experience with my attempts at optimizing function calls, you're going to have a hard time implementing these changes, and an even harder time finding anyone bothered enough to actually test them. Personally I would strongly advise you to switch to another scripting language if you want better features as otherwise you will likely find you are spending more time battling with cryptic compiler code than actually making something useful such as a game.