Game Development Community

Multi-processor parallel programming in TGE

by Duncan Gray · in Torque Game Engine · 05/10/2007 (12:17 am) · 83 replies

I just tried running 4 threads on a single CPU and I did not get the above predicted clash in the assembly code.
In fact I get no problems at all.

Although I only adapted the updateSkin method for this test, this multi-processor approach can also be used to increase performance in collision, physics, animation calculations, AI, etc.

If you want to try the demo, please post here or email me.
#21
05/11/2007 (8:42 am)
Duncan - one character in-scene might not be enough to notice any performance boost from your changes.
Try making a mission file which has a grid of, say, 50 or 100 orcs, and distributing that with your .exe.
#22
05/11/2007 (2:52 pm)
I'm going to post the code for this later today or tomorrow because I can't do much more to it without a dual core to play on.

It is based on sound concepts in that it pre-creates a thread pool, one thread per CPU. Computational loops are then circulated through this multi-processor class so that each pass of a loop is handled by a different CPU, which should result in that loop's computation time being divided by the number of CPUs, a significant increase in performance.

The threads, being pre-created, don't add any overhead. But 2 semaphores and a mutex are required to keep things in order and these will add a minute bit of overhead.

For that reason, the class will use a non threaded loop if it detects a single CPU (code to be added later today) which means you can add this to the engine with zero negative impact on single CPU machines but (in theory) significant impact on multi CPU machines.

That's IF you can ensure that the threads run on different CPUs. I used a similar approach on Unix in the past and I know it works really well there, and the Mac should probably behave as expected too. It's just that Windows seems to want to run the threads on the same CPU.

Perhaps someone out there knows the secret and can contribute.

I need some motivation, you guys want the code??
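
The pool described in this post can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the resource: the names LoopPool and reportedCpuCount are hypothetical, and it uses a mutex with two condition variables from the C++ standard library in place of the two semaphores and a mutex mentioned above. Worker threads are created once, each call to run() shares the loop iterations between them, and a pool built for one CPU creates no threads and runs a plain loop.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of a pre-created worker pool; not the posted TGE code.
class LoopPool {
public:
    using Body = std::function<void(std::size_t)>;

    // workers <= 1 creates no threads, so run() falls back to a plain loop.
    explicit LoopPool(unsigned workers) : workers_(workers > 1 ? workers : 0) {
        for (unsigned i = 0; i < workers_; ++i)
            threads_.emplace_back([this, i] { workerMain(i); });
    }

    ~LoopPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    // Calls body(i) once for every i in [0, count); returns when all are done.
    void run(std::size_t count, const Body& body) {
        if (workers_ == 0) {  // single-CPU path: no locking at all
            for (std::size_t i = 0; i < count; ++i) body(i);
            return;
        }
        std::unique_lock<std::mutex> lock(mutex_);
        body_ = &body;
        count_ = count;
        pending_ = workers_;
        ++generation_;  // tells every worker there is a new loop to process
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerMain(unsigned index) {
        unsigned seen = 0;  // last generation this worker has processed
        for (;;) {
            const Body* body;
            std::size_t count;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [&] { return quit_ || generation_ != seen; });
                if (quit_) return;
                seen = generation_;
                body = body_;
                count = count_;
            }
            // Worker k takes iterations k, k + N, k + 2N, ...
            for (std::size_t i = index; i < count; i += workers_) (*body)(i);
            std::lock_guard<std::mutex> lock(mutex_);
            assert(pending_ > 0);
            if (--pending_ == 0) done_.notify_one();
        }
    }

    const unsigned workers_;
    std::mutex mutex_;
    std::condition_variable wake_, done_;
    const Body* body_ = nullptr;
    std::size_t count_ = 0;
    unsigned pending_ = 0;
    unsigned generation_ = 0;
    bool quit_ = false;
    std::vector<std::thread> threads_;  // declared last: started after the rest is ready
};

// One worker per reported CPU; hardware_concurrency() may return 0 if unknown.
unsigned reportedCpuCount() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}
```

A caller would build one pool at startup, for example LoopPool pool(reportedCpuCount());, and then pass each heavy loop through pool.run(count, body). The body must only touch data belonging to its own index, otherwise it needs its own locking. Nothing here pins a thread to a particular CPU, so whether the workers really land on different CPUs is still up to the OS scheduler.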
#23
05/11/2007 (3:07 pm)
It would be awesome to see the code.
#24
05/11/2007 (4:19 pm)
Isn't the problem that you're dealing with the Windows scheduler? I think that Linux (or Unix kernel boxes) doesn't really care about an app's multi-threaded capability, does it? The code is either written (or at least compiled) to take advantage or it isn't.

Windows is a whole 'nother beast. Unless you explicitly tell the OS not to handle the threads, it's going to do it the way it's written to. I'm no programmer, but these issues come up all the time on the technical side of graphic arts work. One of the big beefs for a long time (post OS X) was that Mac code had to be rewritten to take advantage of multiple processors, where Windows (post SP2, and maybe even the XP release) did not, since threading was handled by the OS. For example, pre-dual-core Photoshop is a dog compared to post-dual-core on the Mac, but there really isn't much difference on the PC side. If anything, PS7 and 8 run faster on a dual-core Windows box than CS2, since they lack some of the newer bells and whistles.
#25
05/11/2007 (4:54 pm)
Yeah I'm not sure on the windows side of things. I've read a lot of complaints on the internet from Windows programmers who can't get the threads to run where they are supposed to on that OS.

When I finish the code later, the guys with Macs can try it and see if they have better luck.
#26
05/11/2007 (5:12 pm)
Yeah i have a dual-dual core (mac pro) mac if you need a tester :)
#28
05/18/2007 (4:11 pm)
That's exciting !
#29
05/18/2007 (4:45 pm)
I'm seeing about 10-15% lower processor usage running the MP version in starter.racing, unmodded. This was run in windowed mode, watching the processor usage while playing.

Standard torque 50-60% usage
MP version 35-45% usage

Not seeing much framerate improvement really (then again I'm getting close to 175 fps), but I'd love to test this with an MK/afx modded engine. I think the numbers would really shine using the advanced shaders and the added datablocks. Plus, if this drops the dedicated server mode overhead that would be dandy. Maybe I'll check that out now...

Eh, didn't help the server at all. Didn't think it would.
#30
05/18/2007 (5:05 pm)
This is exciting work.
It'd be great to try it against a mission with a crapload of AIPlayers.

Here are some screenshots from a test mission I use.
elenzil.com/gg/images/ikes1.jpg
elenzil.com/gg/images/ikes2.jpg
elenzil.com/gg/images/ikes3.jpg
#31
05/18/2007 (5:44 pm)
Haha, that's pretty good fps for that many players, but far from playable I'm sure..
#32
05/18/2007 (6:04 pm)
Hmm, interesting! I'm going to play with this later. I downloaded your precompile, and with the camera over Kork looking down I was getting about 38-40% CPU usage on an AMD Athlon X2 dual 4200, though I didn't check FPS. Definitely need to look into adding support for this in other areas like physics, collisions, etc...

*edit*

OK, an update: I downloaded the newest 1.5.2, installed it and ran the precompiled exe first. With a scene with nothing but Kork I was getting between 25-33% CPU. When I used the threaded exe and went to the same spot it jumped around a lot more, but went as low as 17% CPU; it would vary from 17-30% and would obviously go up when AI Kork ran by. BUT while testing this I had AIM, Winamp, IE and several folders open, and my McAfee virus scan was running in the background, so take it with a grain of salt I suppose.
#33
05/18/2007 (7:29 pm)
Thanks for the testing guys.

Actually Casey, the pre-compiled binary is still 1.5.1 based, which is slower than 1.5.2, so maybe we will get even better figures with this mod in 1.5.2.

Frame-rate is not a good indicator of performance. CPU usage is more accurate in this instance I think.

I am going to move this to 1.5.2 and add some console commands to select single or threaded mode and run a profile test timer in each. That way we can get much more accurate figures of the actual performance boost.
#34
05/18/2007 (7:38 pm)
Yeah, Jason, your server test would not show anything because updateSkin only really gets called by clients prior to rendering.

Collision, physics, AI etc would be good for server mode.
#35
05/18/2007 (11:08 pm)
Here is the latest 1.5.2 Windows version with the two new console commands:

MP_setThreaded();
MP_setSingle();

so that you can compare the performance of the two modes without swapping EXEs.

#36
05/23/2007 (7:12 pm)
Hi Duncan, I just tried this out in our app (the one Orion has pictures of).

I've got a laptop with only 1 reported processor (I forced it to 2 threads, just to make sure it was working) and things work fine. I notice a speedup when I switch to MP_startSingle(), as expected.

I bring this same exe over to my older desktop machine, which has a single older hyperthreading processor and reports two processors. Here it ran dog slow. Switch to startSingle and things are fast again.

Has anyone else tried this out on a single hyperthreading processor? Any ideas of what to look for? I do see that the context switching goes through the roof. Maybe I'm just running into super cache thrashing on this machine?

I also did this in our own code base, and it's possible I botched something in the addition of it. I'll go over things again and make sure I set things up properly, although when debugging it looks like all is working.
#37
05/23/2007 (7:44 pm)
Hi Clint, yeah I see no advantage to running this in threaded mode on a single hyperthreaded CPU. It's still one CPU even though Windows reports it as two, so I would expect poor performance like that. There are many complaints on the Internet about hyperthreading's poor performance; just check the SQL Server forums, where they recommend turning off hyperthreading with SQL Server.

Ideally we need to detect hyperthreading and only create a thread pool according to the actual number of CPUs or cores present, ignoring the false count that hyperthreading reports.

There probably is some code out there for doing such detection.

Or perhaps create a benchmark method which you can run and time in both single and threaded mode, and then select whichever mode performed best in the test. This can be automated at game start and will solve the 1 hyperthreaded CPU problem..... but what if the user has a dual core in hyperthread mode? I hate Intel for ever introducing that 'solution'.
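
A minimal sketch of that startup benchmark idea follows. The names LoopMode, secondsFor and chooseLoopMode are hypothetical and are not part of TGE or the resource; the two callables stand in for running the same workload once in single mode and once in threaded mode.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

enum class LoopMode { Single, Threaded };

// Wall-clock seconds taken to call work() `repeats` times.
double secondsFor(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Times the same workload in both modes and keeps the faster one.
// Ties go to Single, since that mode has no synchronisation overhead.
LoopMode chooseLoopMode(const std::function<void()>& singleRun,
                        const std::function<void()>& threadedRun,
                        int repeats = 5) {
    const double single = secondsFor(singleRun, repeats);
    const double threaded = secondsFor(threadedRun, repeats);
    return threaded < single ? LoopMode::Threaded : LoopMode::Single;
}
```

Because the choice is based on measured time rather than on the reported CPU count, it does not need to know why threaded mode is slow on a given machine, so a hyperthreaded dual core would simply get whichever mode wins the timing. The result could then drive the same switch that the MP_setThreaded() and MP_setSingle() console commands flip. A short benchmark is noisy, so in practice the workload should be long enough, and repeated often enough, to outweigh background activity.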
#38
05/23/2007 (7:47 pm)
Oh, I also updated the resource (5 minutes ago) with a slight performance increase change

I also updated the pre-compiled link above with the latest changes
#39
05/24/2007 (8:53 am)
This code here:
    // Set process affinity to all available CPUs.
    // `processors` is the CPU count from the surrounding code.
    DWORD_PTR processAffinity = 0;   // SetProcessAffinityMask takes a DWORD_PTR mask
    DWORD_PTR bits = 1;
    for (int i = 0; i < processors; i++)
    {
        processAffinity |= bits;     // enable CPU i
        bits <<= 1;
    }
    SetProcessAffinityMask(GetCurrentProcess(), processAffinity);

Completely destroys performance on nvidia 7x00 systems here at Max Gaming. It kills a Pentium D setup, and a Dual processor, Dual Core AMD Godly box.

:/
#40
05/24/2007 (9:12 am)
Oh, and I wanted to add, it kills performance before you're even in a mission. And even in a blank mission with no SkinMesh or wheeled vehicles, it's a drop from 100 fps to under 10 fps. (Just terrain and water and a single interior.)

It works fine with the Intel Core Duo laptops with AMD 1400 mobilities.