Data Types / Internal Storage / Conversion [SOLVED]
by Keith G Wood · in General Discussion · 03/04/2011 (12:01 am) · 6 replies
Having spent 30+ years with strongly typed languages, I'm feeling uneasy about the non-typed nature of TorqueScript.
Now it's possible I simply don't need to know the internal details of how things are stored - but I'd feel more comfortable if I had better visibility of the internals.
For example, given the help files state the range of integers is -2147483648 and 2147483647, I can infer that integers are stored as signed 32-bit. The help file says floats are single precision (from which I infer 32-bit IEEE float format); and strings are UTF-8 (so, in fact, string is the only type where I have found a 100% definitive statement that allows me to understand for certain how it is stored internally).
I'm less certain of the interaction, and conversion between strings & floats (between each other and integers) as the following may help to illustrate....
%variable = "12.34"; // (1) a string containing 5 UTF-8 characters of text (plus terminator byte)
%variable += 100; // (2) this is now 112.34 - but is it now a string or a float?
%array[%variable] // (3) does this reference %array[112]? Would that still apply for 112.99999?
I'm guessing (2) has effected an implied cast from string to float, and %variable is now float?
I'm guessing (3) effects an implied cast from float to int of %variable before it is used as an index. But is this implied cast definitely a truncate (i.e. round down to whole integer)? I'm further assuming the actual data type of %variable is not changed - i.e. it retains its non-integer part after this?
(This can all be relevant, because code execution will be quicker the fewer implied operations are invoked, and it is possible to suffer resolution & overflow problems if working with a different data type than expected. I come from a world where both of these are very important).
Like I said, maybe I don't need to know - but it would give me a warm glow to know what's really going on under the hood.
About the author
The perverse mind behind Bad Taste Software: http://www.badtastesoftware.co.uk
#2
03/04/2011 (5:23 am)
I'm glad I asked the question, I hadn't anticipated that answer! Will start reading up on the ArrayObject - in the meantime, some other questions spring to mind....
Is there a truncate facility? (e.g 12.34 --> 12)
Given there is a danger to create a large number of variable names, I assume resolving names is achieved via a hashing function for speed - is that correct?
If the answer to the previous question is yes - is there a number of variables above which the speed of the hashing function starts to suffer? (Or put it another way, is there a recommended maximum number of variables to avoid degrading performance).
#3
03/04/2011 (5:34 am)
Quote: Is there a truncate facility? (e.g 12.34 --> 12)
You mean for float->integer conversions? This happens automatically during the conversions. If a string is parsed as an int, only the integer part is used and if a variable is set from a float, it will automatically set the int value also (using standard float->int conversion).
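To make the conversions described above concrete, here is a rough model in C. It is only a sketch under assumptions: the `Slot` struct and the `slot_set_*` functions are made-up names for illustration, not the engine's actual types, and the string form is rebuilt eagerly here for simplicity.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of one variable slot: the string is the "real" value,
   the numeric fields are caches refreshed on assignment. */
typedef struct {
    char  text[32];
    float fval;
    int   ival;
} Slot;

/* Assign from a string: fill the numeric caches by parsing the text. */
static void slot_set_string(Slot *s, const char *text) {
    snprintf(s->text, sizeof s->text, "%s", text);
    s->fval = strtof(text, NULL);
    s->ival = (int)strtol(text, NULL, 10); /* parsing stops at '.', so only the integer part is used */
}

/* Assign from a float: the int cache follows via the standard C conversion. */
static void slot_set_float(Slot *s, float f) {
    s->fval = f;
    s->ival = (int)f;                      /* truncates toward zero */
    /* Assumption: a real implementation could defer this step until the
       string form is actually requested. */
    snprintf(s->text, sizeof s->text, "%g", f);
}
```

In this model, setting the slot from "12.34" leaves an int value of 12, and then setting it from the float sum with 100 leaves an int value of 112 while the float keeps its fractional part.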
Quote:Given there is a danger to create a large number of variable names, I assume resolving names is achieved via a hashing function for speed - is that correct?
Yes, the dictionaries use hash tables.
Personally, I'm not that happy with dynamic dictionaries being used for something that can relatively easily be laid out statically by the compiler but well, it's been like that for ages and it's been done before.
Quote:If the answer to the previous question is yes - is there a number of variables above which the speed of the hashing function starts to suffer? (Or put it another way, is there a recommended maximum number of variables to avoid degrading performance).
If the hashtable gets crowded, it will be enlarged and rehashed to keep the table from producing lots of misses.
Also, keep in mind that this isn't a single hash table. Each stack level starting with the global execution context has its own table and they get reused as the stack grows and shrinks.
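For anyone who wants to picture the "enlarge and rehash" step, below is a toy chained hash table in C. It is a generic sketch, not the engine's code: the `Dict` type, the hash function, the doubling policy and the load-factor threshold are all assumptions chosen for illustration, and allocation failures are not handled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy dictionary: chained hash table keyed by variable name. */
typedef struct Node {
    char *name;
    int value;
    struct Node *next;
} Node;

typedef struct {
    Node **buckets;
    size_t bucket_count;
    size_t entry_count;
} Dict;

/* djb2-style string hash (an arbitrary choice for this sketch). */
static size_t hash_name(const char *s) {
    size_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static void dict_init(Dict *d, size_t buckets) {
    d->buckets = calloc(buckets, sizeof(Node *));
    d->bucket_count = buckets;
    d->entry_count = 0;
}

/* Double the bucket array and relink every node into its new bucket. */
static void dict_grow(Dict *d) {
    size_t new_count = d->bucket_count * 2;
    Node **fresh = calloc(new_count, sizeof(Node *));
    for (size_t i = 0; i < d->bucket_count; i++) {
        Node *n = d->buckets[i];
        while (n) {
            Node *next = n->next;
            size_t slot = hash_name(n->name) % new_count;
            n->next = fresh[slot];
            fresh[slot] = n;
            n = next;
        }
    }
    free(d->buckets);
    d->buckets = fresh;
    d->bucket_count = new_count;
}

/* Return the node for a name, or NULL if it is not present. */
static Node *dict_find(const Dict *d, const char *name) {
    Node *n = d->buckets[hash_name(name) % d->bucket_count];
    while (n && strcmp(n->name, name) != 0) n = n->next;
    return n;
}

/* Insert or update; grow first if the table would exceed one entry per bucket. */
static void dict_set(Dict *d, const char *name, int value) {
    Node *n = dict_find(d, name);
    if (n) { n->value = value; return; }
    if (d->entry_count + 1 > d->bucket_count) dict_grow(d);
    n = malloc(sizeof(Node));
    n->name = malloc(strlen(name) + 1);
    strcpy(n->name, name);
    n->value = value;
    size_t slot = hash_name(name) % d->bucket_count;
    n->next = d->buckets[slot];
    d->buckets[slot] = n;
    d->entry_count++;
}
```

The point of the sketch is that lookups stay cheap as the number of names grows, because the table is resized before the chains get long; the cost shows up as an occasional rehash rather than as a hard limit on variable count.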
#4
03/04/2011 (5:12 pm)
I have a case where I want to truncate a float to int in line - not as a passed parameter to a function. i.e. I want to achieve the equivalent of the following in C:
{
    float a = 12.34;
    a = (float)(int)a;
    /* a now equals 12 */
}
Is this possible?
#5
03/04/2011 (6:16 pm)
mFloatLength(%float, #decimal_places);
mFloatLength(16.8239523895, 3);
16.823;
mCeil(%float);//round up
mFloor(%float);//round down
Also check TDN and the Torque API link on this page.
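One caveat when mapping the C idiom from the question onto "round down": the C cast pair truncates toward zero, which only matches rounding down for non-negative values. The sketch below shows the difference in plain C. The function names are made up for illustration, and it assumes (from the comments above) that mFloor rounds down in the same sense as the second function.

```c
#include <stdio.h>

/* Truncate toward zero, as the (float)(int) cast pair does in C.
   Only meaningful while the value fits in an int. */
static float truncate_via_cast(float x) {
    return (float)(int)x;
}

/* Round down (toward negative infinity), built on top of the cast. */
static float floor_via_cast(float x) {
    float t = truncate_via_cast(x);
    return (t > x) ? t - 1.0f : t;
}

/* Print both results side by side for one input. */
static void compare(float x) {
    printf("%g -> truncate %g, round down %g\n",
           x, truncate_via_cast(x), floor_via_cast(x));
}
```

For 12.34 both functions give 12. For -12.34 the cast gives -12 while rounding down gives -13, so which one you want depends on whether negative inputs can occur.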
Associate Rene Damm
Additionally, the evaluator has been optimized for dealing with floats and integers by maintaining a dedicated stack for them and by caching values of those two kinds after conversion on the dictionary entries. However, that's only optimization and does not alter the picture as a whole.
So, in your example above, what happens is...
The string is stored in the dictionary of the current frame as is. Note that if you used a numeric literal here, then you'd see the integer/floating-point value stored and the string representation created as/if needed.
The %variable dictionary entry's string value gets converted to a float (BTW, most numeric ops default to float so that even if both %variable and the literal are integer by value, the operation will be in floating-point) and the result will be cached on the entry.
The FP value gets copied to the FP stack, the evaluator does its thing, and then stores the FP value back into the %variable. However, conceptually, it's still a string--the conversion just happens lazily.
TS doesn't have true arrays so this is actually special here. What TS does here is to simply append the result (string!) of the array index expression to the variable/property name, i.e. you get "%array112.34" and that is accessed as a local variable on the dictionary.
If you add "0.1" to %variable and index the "array" again, you will get a different value (none).
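The name-appending behaviour described above can be modelled in a few lines of C. This is a sketch only: `make_element_name` is a hypothetical helper, and the index is passed in already converted to text, so the engine's own number-to-string formatting is not modelled.

```c
#include <stdio.h>
#include <string.h>

/* Build the name that is actually looked up for base[index]: the index text
   is simply appended to the base name.
   Returns 0 on success, -1 if the result does not fit in the buffer. */
static int make_element_name(char *out, size_t out_size,
                             const char *base, const char *index_text) {
    int n = snprintf(out, out_size, "%s%s", base, index_text);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With base "%array" and index text "112.34" this yields "%array112.34", while index text "112.44" yields a different name altogether, which is why the second lookup finds nothing. No truncation to 112 happens anywhere in this path.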
//Edit
BTW, the way the whole array index business is handled leads to some real dirty quirks. Whatever is used as an array index is added to the global string table so if you loop through 1000 entries in "%myVariable", you've just created 1000 pointless string table entries.
So, in most cases, using ArrayObject is the better idea.