Multi-processor parallel programming in TGE
by Duncan Gray · in Torque Game Engine · 05/10/2007 (12:17 am) · 83 replies
I just tried running 4 threads on a single CPU and I did not get the above predicted clash in the assembly code.
In fact I get no problems at all.
Although I only adapted the updateSkin method for this test, this multiprocess approach can also be used to increase performance in collision, physics, animation calculations, AI, etc.
If you want to try the demo, please post here or email me.
#62
05/28/2007 (6:22 pm)
Here's what I got on my Core Duo with lots-a-korks:
Parallel Processing Actvated on 2 CPU's
Setting single thread modet
Benchmark test took 16 milliseconds in single thread mode
Parallel Processing activated
Benchmark test took 93 milliseconds with 2 threads
Best performance [16] was with 1 threads setting thread pool accordingly
#63
05/28/2007 (6:25 pm)
BTW, with your latest binary, when I try MP_setThreaded(), I get this insulting message:
Parallel Processing not possible on 1 CPU
Well, I never...!
#64
05/28/2007 (6:28 pm)
Some more data with the latest binary on my Core Duo:
==>MP_doBenchMark();
Setting single thread modet
Benchmark test took 15 milliseconds in single thread mode
==>MP_forceThreads(2);
Parallel Processing activated
==>MP_doBenchMark();
Setting single thread modet
Benchmark test took 16 milliseconds in single thread mode
Parallel Processing activated
Benchmark test took 109 milliseconds with 2 threads
Best performance [16] was with 1 threads setting thread pool accordingly
==>MP_forceThreads(4);
Parallel Processing activated
==>MP_doBenchMark();
Setting single thread modet
Benchmark test took 16 milliseconds in single thread mode
Parallel Processing activated
Benchmark test took 94 milliseconds with 2 threads
Benchmark test took 110 milliseconds with 3 threads
Benchmark test took 109 milliseconds with 4 threads
Best performance [16] was with 1 threads setting thread pool accordingly
Presumably there is a problem with the test?
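For anyone who wants to cross-check numbers like these outside the engine, here is a minimal timing sketch. It is not the code behind MP_doBenchMark (that source isn't shown in this thread); sumRange, sumParallel and averageMicros are hypothetical names, and the workload is a stand-in. One possible reading of results like the above is that the work per run is small enough for thread start-up and synchronization to dominate, and averaging over many runs with a monotonic clock is one way to test that guess.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical workload: sum the integers in [begin, end).
static std::uint64_t sumRange(std::uint64_t begin, std::uint64_t end) {
    std::uint64_t total = 0;
    for (std::uint64_t i = begin; i < end; ++i) total += i;
    return total;
}

// Split [0, n) into contiguous slices, one per worker thread,
// and return the combined sum.
static std::uint64_t sumParallel(std::uint64_t n, unsigned threads) {
    assert(threads > 0);
    std::vector<std::uint64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        std::uint64_t begin = n * t / threads;
        std::uint64_t end = n * (t + 1) / threads;
        // Each worker writes only to its own slot, so no lock is needed.
        pool.emplace_back([&partial, t, begin, end] {
            partial[t] = sumRange(begin, end);
        });
    }
    for (std::thread& th : pool) th.join();
    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}

// Average wall-clock microseconds per run over `repeats` runs.
// Thread creation is deliberately included in the measurement.
static double averageMicros(std::uint64_t n, unsigned threads, int repeats) {
    assert(repeats > 0);
    using Clock = std::chrono::steady_clock;
    volatile std::uint64_t sink = 0;  // keeps the work from being optimized away
    Clock::time_point start = Clock::now();
    for (int r = 0; r < repeats; ++r) sink = sink + sumParallel(n, threads);
    Clock::time_point stop = Clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count() / repeats;
}
```

Comparing averageMicros(n, 1, 100) against averageMicros(n, 2, 100) for a small and a large n should show whether the threaded path only loses when the workload is tiny.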
#65
05/28/2007 (7:42 pm)
Lol, you got the insulting message because the benchmark test set your CPU count back to one after its performance test.
You will have to use MP_forcethreads(2).
Quote: Presumably there is a problem with the test?
Could be, we really need a Linux or Mac comparison at some point to figure this out.
#66
05/28/2007 (8:10 pm)
Well, I have Ubuntu. Sadly, the d/l link is down and I'm having issues compiling. If anyone can point me in the right direction with compilers that work in Ubuntu, I'd have the test done. Also, I'll throw this one on the quad Mac tomorrow. I could even test it on the quad running Windows, but it looks like you already have a quad test.
BTW, what's up with the lack of xlibs in Ubuntu? I can't even get the ATI drivers to load 0_o..
#67
05/28/2007 (9:34 pm)
@Jason: if you've got some specific error messages, I'd be glad to help. I've been through that hell, and recently, too.
You should have xlibs, or you wouldn't be running X Windows. Error message?
I had trouble getting the damned closed-blob Nvidia driver working on Ubuntu 7.04; it might be a related issue. Currently I'm running Zenwalk Linux, because its boot CD supports my SATA DVD drive.
I'll say this about Zenwalk... I've never had so little trouble getting my display set up. EVERY other distro does some kind of voodoo which ignores what you do to /etc/X11/xorg.conf. And I've played with all the major distros.
Now that I think about it, I never did get the Nvidia closed driver to work on 7.04. It was OK on 6.x....
#68
05/28/2007 (10:14 pm)
Yeah, I can't get the display drivers working, and right now I'm tired of jumping through Linux hoops :)
Pretty much, I have to set up the server in a few more days, and then I can for sure run those Mac tests, but I really have no idea when I'll get around to trying to compile anything on Linux. For the most part, Lee, I don't even have the compiler prerequisites installed, let alone a project that builds or the source changes needed to get it close to compiling. If someone could get me an executable with this in it, I can run the tests once I get the ATI drivers to load. I'm running on a Conroe E6420 and CrossFire ATI 1800s. The Mac is 2 dual-core Xeons and an Nvidia 7300 GT.
#69
05/28/2007 (11:37 pm)
Sorry, but it core-dumps currently (at least mine does). I believe Duncan built a SUSE 10.2 binary above.
#70
05/28/2007 (11:50 pm)
Mine also core-dumped, but I assumed it was due to not having a 3D graphics card on that box. Perhaps there is a bug in platformX86Unix/X86UnixMultiprocessor.cc.
Try to debug it; it's only about 20 lines of code in there, unless the bug is elsewhere in TGE (doubt it).
Google for DDD; it's a good debugger.
#71
05/29/2007 (11:40 am)
Mac errors...
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:32: error: '_SC_NPROCESSORS_ONLN' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:45: error: 'cpu_set_t' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:45: error: parse error before ';' token
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:46: error: 'cpuAffinity' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:46: error: 'CPU_ZERO' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:49: error: 'CPU_SET' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:51: error: 'sched_setaffinity' undeclared (first use this function)
Can't seem to find these functions in any of the headers...
#72
05/29/2007 (1:39 pm)
Yeah, macCarbMultiprocess.cc is going to have to be rewritten against the standard Mac headers. I can make the define for processors = sysconf(_SC_NPROCESSORS_ONLN); that's no big deal. But createThreadPool is out of my league...
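A minimal sketch of the processor-count part, assuming the constant is reachable through <unistd.h> on the target platform (the compile error above suggests it was not visible in that file, possibly because of a missing include or a strict feature-test macro, but that is a guess). onlineProcessorCount is a hypothetical helper name, not something in TGE:

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical helper: number of processors currently online,
// falling back to 1 when the query is unavailable or fails.
static int onlineProcessorCount() {
    long count = -1;
#if defined(_SC_NPROCESSORS_ONLN)
    // Widely available sysconf name, but not required by POSIX,
    // hence the preprocessor guard.
    count = sysconf(_SC_NPROCESSORS_ONLN);
#endif
    int result = (count > 0) ? static_cast<int>(count) : 1;
    assert(result >= 1);
    return result;
}
```

Falling back to 1 keeps the engine on its single-threaded path on any platform where the query cannot be made.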
#73
05/29/2007 (2:02 pm)
@Jason, look in macCarbThreads.cc to see what headers they use for the thread code on a Mac.
The threadpool code 'should' compile on a Mac, because they use pthreads as well according to macCarbThreads.cc.
#74
05/29/2007 (2:19 pm)
I'll have to google how to set thread affinity on Mac OS X, unless a Mac programmer can chime in here.
For now, I think, use the following to create a thread pool that's not bound to a particular CPU, and let's see what happens:
void MultiProcess::createThreadPool()
{
   /* Affinity setup, commented out because cpu_set_t and
      sched_setaffinity do not compile on the Mac:
   int cpusetsize = 32;
   //set process affinity to all available CPU's
   cpu_set_t cpuAffinity;
   CPU_ZERO(&cpuAffinity);
   for(int i = 0; i < processors; i++)
   {
      CPU_SET(i, &cpuAffinity);
   }
   sched_setaffinity( getpid(), cpusetsize, &cpuAffinity);
   */
   pthread_t threadID;
   // create a thread pool
   for(int i = 0; i < processors; i++)
   {
      pthread_create(&threadID, NULL, MultiProcessRunHandler, (void*)this);
      pthread_detach(threadID);
      //assign thread to a CPU
      //CPU_ZERO(&cpuAffinity);
      //CPU_SET(i, &cpuAffinity);
      //sched_setaffinity( (pid_t)threadID, cpusetsize, &cpuAffinity);
      // store the thread id (the cast truncates wherever pthread_t is wider than S32)
      threadHandle[i] = (S32)threadID;
   }
}
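For comparison, here is a standalone sketch of the same idea that keeps the pthread_t values, joins the workers instead of detaching them, and only attempts pinning where the glibc affinity extension is available. It is an illustration under assumptions, not the TGE code: workerMain stands in for MultiProcessRunHandler, and runPinnedPool is a hypothetical name. Note that sched_setaffinity expects a process or kernel thread id, so casting a pthread_t to pid_t (as in the commented-out line above) is unlikely to do what was intended; the sketch sets affinity through the thread attributes instead.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes the affinity extensions on glibc
#endif
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical worker: stands in for MultiProcessRunHandler.
static void* workerMain(void* arg) {
    static_cast<std::atomic<int>*>(arg)->fetch_add(1);
    return nullptr;
}

// Starts threadCount workers, pinning each to a CPU where supported,
// waits for all of them, and returns how many were started.
static int runPinnedPool(int threadCount, std::atomic<int>* counter) {
    assert(threadCount > 0);
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (cpus < 1) cpus = 1;
    std::vector<pthread_t> threads;
    for (int i = 0; i < threadCount; ++i) {
        pthread_attr_t attr;
        pthread_attr_init(&attr);
#if defined(__linux__) && defined(__GLIBC__)
        // glibc-only extension: request a CPU before the thread starts.
        cpu_set_t cpuSet;
        CPU_ZERO(&cpuSet);
        CPU_SET(i % cpus, &cpuSet);
        pthread_attr_setaffinity_np(&attr, sizeof(cpuSet), &cpuSet);
#endif
        pthread_t tid;
        int rc = pthread_create(&tid, &attr, workerMain, counter);
        pthread_attr_destroy(&attr);
        // If the pinned create is rejected, retry without affinity.
        if (rc != 0) rc = pthread_create(&tid, nullptr, workerMain, counter);
        if (rc != 0) break;
        threads.push_back(tid);
    }
    for (pthread_t tid : threads) pthread_join(tid, nullptr);
    return static_cast<int>(threads.size());
}
```

On the Mac the preprocessor guard simply drops the pinning, which matches the "not bound to a particular CPU" experiment above.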
#75
05/29/2007 (2:43 pm)
cpu_set_t: no reference, not even in the ADC docs, and the same goes for the rest listed above. macCarbThreads.cc doesn't use those calls either. I think these would best be handled by the Multiprocessor API. Unless they are found in some other header, they don't seem to exist in the Mac libs.
Here's a CreateThreadPool function declared in threads.h:
OSErr CreateThreadPool ( ThreadStyle threadStyle, SInt16 numToCreate, Size stackSize );
and the parameters...
threadStyle
The type of thread to create for this set of threads in the pool. Cooperative is the only type that you can specify. Historically, the Thread Manager supported two types of threads, preemptive and cooperative. However, due to severe limitations on their use, the Thread Manager no longer supports preemptive threads.
numToCreate
The number of threads to create for the pool.
stackSize
The stack size for this set of threads in the pool. This stack must be large enough to handle saved thread context, normal application stack usage, interrupt handling functions, and CPU exceptions. Specify a stack size of 0 to request the Thread Manager's default stack size for the specified type of thread.
I'm not a programmer, but I did the research and that's what I came up with, since I can't find those anywhere in the ADC documentation.
#76
05/29/2007 (2:57 pm)
Ack, OK, I'll comment them out.
#77
05/29/2007 (3:30 pm)
OK, bad news: mp_force is bad stuff on the Mac, at least with this code... I can't get dobenchmark to run without crashing either. Although I did manage this...
This isn't forced for MP:
[screenshot not preserved]
This is forced, and you can see the frame rate hit, but...
[screenshot not preserved]
It does force it to MP thread...
#78
05/29/2007 (4:29 pm)
I see your thread count went from 6 to 10, so we know the thread creation worked.
CPU usage went from 100.9% to 201%, which looks like the threads are only running on two of your 4 cores.
Perhaps try mp_forcethreads(2) and see how it compares.
#79
05/29/2007 (5:06 pm)
The threads actually went up to 16 at a later point, but it crashed. At the bottom you can see the hyperactivity of the CPU usage across all 4 cores, and you can also see the system kicked up to about 42% from its normal 12-15% or so. I did run it at 2 as well, which yielded similar results with the FPS dip, and it only looked hyperactive across 2 of the core monitors. Again, it doesn't seem like the way to handle MP threads on the Mac. I wish it was better news; maybe some Mac guru can come along and show us the way?
Oh, and to tell: the red bars indicate the system threads going crazy and killing the engine threads.
#80
07/18/2007 (8:30 am)
Came across this article today...
Quote:
Q. How will Unreal Tournament 3 use multiple cores on a CPU? Does it take advantage of Quad Core CPU's? If so, how/what task is assigned to each core?
A. Unreal Engine 3 is a transitional multithreaded architecture. It runs two heavyweight threads, and a pool of helper threads.
The primary thread is responsible for running UnrealScript AI and gameplay logic and networking. The secondary thread is responsible for all rendering work. The pool of helper threads accelerate additional modular tasks such as physics, data decompression, and streaming.
Splitting scene traversal and rendering into its own thread may be very difficult in TGE, but maybe not so much with the TGEA RenderManager.
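The "pool of helper threads" idea in the quoted answer can be sketched roughly as below. This is a generic task-queue pool in standard C++, not Unreal's or Torque's implementation; HelperPool and runDemoTasks are hypothetical names, and a real engine would need a way to wait on individual tasks each frame.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper pool: a fixed set of threads pulling tasks
// (physics, decompression, streaming...) from a shared queue.
class HelperPool {
public:
    explicit HelperPool(unsigned workers) {
        assert(workers > 0);
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    // Drains any queued tasks, then stops and joins the workers.
    ~HelperPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Usage example: submit `taskCount` trivial tasks and report how many ran.
static int runDemoTasks(int taskCount) {
    std::atomic<int> done(0);
    {
        HelperPool pool(3);
        for (int i = 0; i < taskCount; ++i)
            pool.submit([&done] { done.fetch_add(1); });
    }  // the destructor waits for the queue to drain
    return done.load();
}
```

Unlike the per-benchmark thread creation discussed earlier in the thread, the workers here are created once and reused, so the start-up cost is paid only at pool construction.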
Torque 3D Owner Sebastien Bourgon