Game Development Community

Cyrillic text

by genomegames · in Torque 3D Professional · 07/04/2012 (8:54 am) · 13 replies

My problem is displaying Cyrillic text. When I try to output Cyrillic text in Torque's GUI forms, I get unreadable text. I searched the forum and found many threads about Unicode and Cyrillic, but nothing helped me.

Here are the details. My code loads Cyrillic text from the database and from scripts. When this text is rendered in GUI elements, it is unreadable. I use Notepad++ to make the Cyrillic text display normally, in two conversions. First, I convert the Cyrillic text from the win-1251 encoding (Notepad++ calls it ANSI) to UTF-8; then I convert the text back from UTF-8 to win-1251. The result is unreadable hieroglyphs. When I copy and paste this into scripts or the database, the text is unreadable there too, but when the game loads it, it renders correctly. This approach works, but I have to store the data twice: in win-1251 so an admin can read it, and in UTF-8 so the game can load it.

I tried to use two Windows functions, WideCharToMultiByte and MultiByteToWideChar, to convert the text on the fly (when the engine loads the text data), but I could not get them to work or to produce correct results.
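
For reference, the win-1251 to UTF-8 step can also be done without the Windows API. The following is only a minimal sketch under stated assumptions: `cp1251ToUtf8` and `appendUtf8` are hypothetical helper names, and only ASCII, the main Cyrillic letter range (bytes 0xC0..0xFF) and the two "yo" letters (0xA8, 0xB8) are mapped. Every other byte is replaced with '?'.

```cpp
#include <string>

// Append one code point below U+0800 to a UTF-8 string.
static void appendUtf8(std::string &out, unsigned int cp)
{
   if (cp < 0x80)
   {
      out += static_cast<char>(cp);
   }
   else
   {
      // Two-byte form: 110xxxxx 10xxxxxx
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
   }
}

// Hypothetical helper: converts Windows-1251 bytes to UTF-8.
// Assumption: only ASCII, bytes 0xC0..0xFF (U+0410..U+044F) and
// 0xA8 / 0xB8 (U+0401 / U+0451) are handled; the rest becomes '?'.
std::string cp1251ToUtf8(const std::string &in)
{
   std::string out;
   for (unsigned char c : in)
   {
      if (c < 0x80)
         appendUtf8(out, c);
      else if (c >= 0xC0)
         appendUtf8(out, 0x0410 + (c - 0xC0)); // contiguous in both encodings
      else if (c == 0xA8)
         appendUtf8(out, 0x0401);
      else if (c == 0xB8)
         appendUtf8(out, 0x0451);
      else
         out += '?';
   }
   return out;
}
```

Calling `cp1251ToUtf8` on text read from a win-1251 file would give bytes that a UTF-8 based engine can consume, so only one copy of the data would need to be stored.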

I use Windows 7 on the server and clients.

Has anybody dealt with this problem?

Any help will be great! Thank you.

I'll update this thread when I find new information on this subject.

I'm sorry for my hard-to-understand English.

#1
07/04/2012 (2:50 pm)
All T3D versions work fine with Unicode text (UTF-8), including Cyrillic.

Launch the engine, press F10 to open the GUI Editor and try to type anything in Russian (for example, the text of a button). Save the GUI and open it with Notepad++: you should be able to see the text in Russian/Cyrillic.

Note that you can't edit it with Torsion: it simply doesn't support Unicode.

In my project I can load Russian text from GUIs, scripts and even the database (and store it). No problems.

But in order to display it correctly in the "DOS" console window, you need to do a conversion like this:
// At top of file, after all includes:
#ifdef DL_UNICODE_FIX
#include "core/stringBuffer.h"
#endif

..//skipped

void WinConsole::processConsoleLine(const char *consoleLine)
{
   if(winConsoleEnabled)
   {
      inbuf[inpos] = 0;
#ifndef DL_UNICODE_FIX
      if(lineOutput)
         printf("%s\n", consoleLine);
      else
         printf("%c%s\n%s%s", '\r', consoleLine, Con::getVariable("Con::Prompt"), inbuf);
#else
      StringBuffer textBuffer((UTF8 *) consoleLine);
      char asciiStr[1024];
      const wchar_t * unicodeStr = reinterpret_cast<const wchar_t*>(textBuffer.getPtr());
      WideCharToMultiByte(CP_OEMCP, 0, unicodeStr, -1, asciiStr, 1024, NULL, NULL);

      if(lineOutput)
         printf("%s\n", asciiStr);
      else
      {
         printf("%c%s\n", '\r', asciiStr);
         printf("%s%s", Con::getVariable("Con::Prompt"), inbuf);
      }
#endif // DL_UNICODE_FIX
   }
}

..//skipped
#2
07/04/2012 (3:11 pm)
The "fixed" version of the clipboard functions (so you can copy/paste Unicode text from/to the console and any GuiTextEditCtrl):
//-----------------------------------------------------------------------------
// Clipboard functions
const char* Platform::getClipboard()
{
   HGLOBAL hGlobal;
   LPCTSTR pGlobal;

   //make sure we can access the clipboard
   if (!IsClipboardFormatAvailable(CF_UNICODETEXT))
      return "";
   if (!OpenClipboard(NULL))
      return "";

   hGlobal = GetClipboardData(CF_UNICODETEXT);
   pGlobal = (LPCTSTR)GlobalLock(hGlobal);
   String buf(pGlobal);
   S32 cbLength = buf.length();
   char  *returnBuf = Con::getReturnBuffer(cbLength + 1);
   strcpy(returnBuf, buf.c_str());
   returnBuf[cbLength] = 0;
   GlobalUnlock(hGlobal);
   CloseClipboard();

   //note - this function never returns NULL
   return returnBuf;
}

//-----------------------------------------------------------------------------
bool Platform::setClipboard(const char *text)
{
   if (!text)
      return false;

   String buf(text);

   //make sure we can access the clipboard
   if (!OpenClipboard(NULL))
      return false;

   S32 cbLength = strlen(text);

   HGLOBAL hGlobal;
   LPCTSTR pGlobal;

   S32 buflen = buf.length();
   S32 totlen = (buflen+1) * sizeof(TCHAR);
   // SetClipboardData expects memory allocated with GMEM_MOVEABLE
   hGlobal = GlobalAlloc(GMEM_MOVEABLE, totlen);
   pGlobal = (LPCTSTR)GlobalLock (hGlobal);

   wcscpy((wchar_t*)pGlobal, buf.utf16());

   GlobalUnlock(hGlobal);

   EmptyClipboard();
   HANDLE hdnl = SetClipboardData(CF_UNICODETEXT, hGlobal);
   CloseClipboard();

   return true;
}
#3
07/04/2012 (9:36 pm)
Thank you, bank!
I will try
#4
07/05/2012 (6:50 am)
I tried bank's code to convert data retrieved from the database:

StringBuffer textBuffer((UTF8 *) consoleLine);
char asciiStr[1024];
const wchar_t * unicodeStr = reinterpret_cast<const wchar_t*>(textBuffer.getPtr());
WideCharToMultiByte(CP_OEMCP, 0, unicodeStr, -1, asciiStr, 1024, NULL, NULL);

But when I assign the string (which is stored in consoleLine) to textBuffer, all the characters turn into '?'.
Before this conversion I see normal Russian letters in the debugger.
Where should I dig?
#5
07/05/2012 (7:04 am)
If I change UTF8 to UTF16, then after the assignment I get hieroglyphs instead of '?', and after the next conversion
WideCharToMultiByte(CP_OEMCP, 0, unicodeStr, -1, asciiStr, 1024, NULL, NULL);
I get '?' symbols again...
#6
07/05/2012 (7:09 am)
First, check the default encoding of the database, tables and text fields. I have them all set to use "utf8_general_ci".

Second, be sure your connection to the database uses the utf8 charset.
If you use the mysqlpp lib, you can do something like this:
// assuming you have it declared as myCon:
mysqlpp::Connection * myCon;

// Now, right after you create connection, you need to set it up properly:
   myCon = new mysqlpp::Connection(false);
   // We want to auto-reconnect
   if(!myCon->set_option(new mysqlpp::ReconnectOption(true)))
      Con::errorf("Can't set option to auto-reconnect!");
   // Force to use utf8 encoding
   if(!myCon->set_option(new mysqlpp::SetCharsetNameOption("utf8")))
      Con::errorf("Can't set CharsetNameOption to utf8!");
   // Now, we connect to DB
   myCon->connect(mDatabase, mHostname, mUsername, mPassword);
When retrieving data from database:
// assuming we have query setup properly
mysqlpp::StoreQueryResult res = query.store(); 
// grab first row.
mysqlpp::Row row = res[0]; 
// take first field
const char* fieldData = row.at(0).c_str(); 
// print the data to the console
Con::printf("Retrieved data from database: %s", fieldData);

I don't use any encoding other than utf8 in our project (such as win1251), so with this setup no conversion is required (except for the DOS console and the clipboard).
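
If you are not sure what encoding actually arrives from the database, a quick structural check of the bytes can help. This is only a sketch: `looksLikeUtf8` is a hypothetical helper, not part of Torque or mysqlpp, and it is simplified (it checks lead and continuation bytes but does not reject every overlong or surrogate encoding). Russian text in win-1251 usually fails it, because its letters are bytes 0xC0..0xFF and none of those is a valid UTF-8 continuation byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes have valid UTF-8 structure.
bool looksLikeUtf8(const std::string &s)
{
   std::size_t i = 0;
   while (i < s.size())
   {
      const unsigned char c = static_cast<unsigned char>(s[i]);
      std::size_t extra; // number of continuation bytes expected after c

      if (c < 0x80)                     extra = 0;
      else if (c >= 0xC2 && c <= 0xDF)  extra = 1;
      else if (c >= 0xE0 && c <= 0xEF)  extra = 2;
      else if (c >= 0xF0 && c <= 0xF4)  extra = 3;
      else return false; // stray continuation byte or invalid lead byte

      // The sequence must not run past the end of the string
      if (i + extra >= s.size())
         return false;

      // Every following byte must look like 10xxxxxx
      for (std::size_t k = 1; k <= extra; ++k)
      {
         const unsigned char cc = static_cast<unsigned char>(s[i + k]);
         if ((cc & 0xC0) != 0x80)
            return false;
      }
      i += extra + 1;
   }
   return true;
}
```

If the check fails on a field value, the data is probably being converted to a single-byte code page somewhere between the database and the engine.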
#7
07/05/2012 (7:24 am)
I use a different database, MS SQL Server, and a different method to connect to it.
Your post gave me a new direction to dig in, thanks.
#8
07/08/2012 (1:43 am)
Yep, I have big trouble with it:
SQL Server treats a string as UCS-2 (like UTF-16)
The Windows NT kernel treats a string as Unicode (UTF-16 too)
Windows at the user level treats a string as either Win-1251 or UTF-16
Torque needs UTF-8 encoding
ADO (which I use to connect to the database) has no settings for tuning the encoding of a database connection
#9
07/08/2012 (3:35 am)
I think that if I could get a UTF-16 string, I could then convert it to UTF-8.
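
Such a UTF-16 to UTF-8 conversion can be sketched in portable C++ as follows. The name `utf16ToUtf8` is hypothetical, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch. On Windows the same step can be done with WideCharToMultiByte and the CP_UTF8 code page.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string utf16ToUtf8(const std::u16string &in)
{
   std::string out;
   for (std::size_t i = 0; i < in.size(); ++i)
   {
      char32_t cp = in[i];

      if (cp >= 0xD800 && cp <= 0xDBFF)
      {
         // High surrogate: combine with the following low surrogate
         if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
         {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
         }
         else
            cp = 0xFFFD;
      }
      else if (cp >= 0xDC00 && cp <= 0xDFFF)
         cp = 0xFFFD; // lone low surrogate

      if (cp < 0x80)
         out += static_cast<char>(cp);
      else if (cp < 0x800)
      {
         out += static_cast<char>(0xC0 | (cp >> 6));
         out += static_cast<char>(0x80 | (cp & 0x3F));
      }
      else if (cp < 0x10000)
      {
         out += static_cast<char>(0xE0 | (cp >> 12));
         out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
         out += static_cast<char>(0x80 | (cp & 0x3F));
      }
      else
      {
         out += static_cast<char>(0xF0 | (cp >> 18));
         out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
         out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
         out += static_cast<char>(0x80 | (cp & 0x3F));
      }
   }
   return out;
}
```

Cyrillic letters all fall into the two-byte branch, so a Russian string roughly doubles in byte length compared with win-1251.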
#10
07/08/2012 (4:54 am)
Yep, SQL Server 2012 treats a string as UTF-8, but it returns unreadable text too
#11
07/08/2012 (5:11 am)
Oh, friends, I have done this!
I will tell you soon
#12
07/08/2012 (8:12 am)
The time has come.
When I request data from the database through ADO, I get it as a variant type. I then convert the variant to the BSTR format. Then, instead of converting the BSTR directly to the String type, I convert it to wide char, and after that I convert the wide char data to String. I pass this last object to the script code for output in GUI elements.
It's a quite simple addition to my code: I replaced the conversion from BSTR to String with a conversion from BSTR to wide char and then to String.
This is my simple data transformation code:
...
String weaponClass;
...
_variant_t vWeaponClass;//weapon class in Russian
...
vWeaponClass = pRecordset->GetCollect("ClassRusName");//get the data from the database
...
_bstr_t WeaponClass = _bstr_t(vWeaponClass);//convert the data type from variant to BSTR
...
//weaponClass = _com_util::ConvertBSTRToString(WeaponClass);//old code to transform a text from BSTR to String
const size_t newsize1 = (WeaponClass.length());//new code
wchar_t* wc = new wchar_t[newsize1];//to convert
wcscpy(wc, WeaponClass);//BSTR to wide char
weaponClass = wc;//assign the value of the wide char var to the String var
...
string weap = objName + ' ' + type + ' ' + weaponClass;//collect the string
Con::executef("UpdateWeaponData", clientID.c_str(), weap.c_str());//pass data to a script code
The Torque String type works with UTF-8 only, which is why the engine code now passes correct values to the script code.
#13
07/09/2012 (4:43 am)
This is a small correction to my conversion.
When I used the memory-safe function wcscpy_s to convert the data from BSTR to wide char, I got a debug error.
I had forgotten about the terminating zero that is part of the string.
Therefore the correct buffer size is the length of the string + 1.
So the conversion code with the memory-safe function looks like this:
const size_t newsize1 = (WeaponClass.length()+1);//+1 - ending zero
wchar_t* wc = new wchar_t[newsize1];
wcscpy_s(wc, newsize1, WeaponClass);
weaponClass = wc;
Now my debugger takes a rest
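
The same "length + 1" rule can be kept in one place with an owning buffer. This is only a sketch: `copyWide` is a hypothetical helper, and it takes a plain null-terminated wide string rather than a _bstr_t. Unlike the `new wchar_t[]` in the snippet as posted, which is never deleted, the vector releases its memory automatically.

```cpp
#include <cstddef>
#include <cwchar>
#include <vector>

// Hypothetical helper: copies a null-terminated wide string into an owning buffer.
std::vector<wchar_t> copyWide(const wchar_t *src)
{
   const std::size_t len = std::wcslen(src); // characters, terminator excluded
   std::vector<wchar_t> buf(len + 1);        // +1 for the terminating zero
   std::wmemcpy(buf.data(), src, len + 1);   // copies the terminator as well
   return buf;
}
```

The result of `copyWide(...).data()` can be used wherever a `const wchar_t*` is expected, for as long as the vector is alive.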