Game Development Community

Multi-processor parallel programming in TGE

by Duncan Gray · in Torque Game Engine · 05/10/2007 (12:17 am) · 83 replies

I just tried running 4 threads on a single CPU and did not get the clash in the assembly code predicted above.
In fact, I get no problems at all.

Although I only adapted the updateSkin method for this test, this multiprocess approach can also be used to increase performance in collision, physics, animation calculations, AI, etc.
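As a rough illustration of that approach (this is not Duncan's actual updateSkin change, which isn't posted in the thread), the sketch below splits a skinning loop into contiguous vertex ranges, one per thread. Each thread writes only its own slice of the output array, so no locking is needed. The types `Vec3`, `Affine`, `SkinnedVertex` and the functions `skinRange` and `parallelSkin` are hypothetical, and one bone per vertex is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical, simplified types; TGE's real skinning data differs.
struct Vec3 { float x, y, z; };
struct Affine { float m[3][4]; };  // row-major 3x4: rotation/scale + translation
struct SkinnedVertex {
    Vec3 pos;  // bind-pose position
    int bone;  // single bone influence (rigid skinning, for brevity)
};

static Vec3 transform(const Affine& a, const Vec3& p) {
    return { a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z + a.m[0][3],
             a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z + a.m[1][3],
             a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z + a.m[2][3] };
}

// Skins vertices [begin, end) into out; callers pass disjoint ranges.
static void skinRange(const std::vector<SkinnedVertex>& in,
                      const std::vector<Affine>& bones,
                      std::vector<Vec3>& out, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        out[i] = transform(bones[in[i].bone], in[i].pos);
}

// Splits the vertices into numThreads contiguous chunks. Helper threads take
// the first chunks, the calling thread skins the last one, then joins them.
std::vector<Vec3> parallelSkin(const std::vector<SkinnedVertex>& in,
                               const std::vector<Affine>& bones,
                               unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = in.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;
    std::vector<Vec3> out(n);
    std::vector<std::thread> helpers;
    for (unsigned t = 0; t + 1 < numThreads; ++t) {
        std::size_t b = std::min(n, t * chunk), e = std::min(n, b + chunk);
        helpers.emplace_back(skinRange, std::cref(in), std::cref(bones),
                             std::ref(out), b, e);
    }
    skinRange(in, bones, out, std::min(n, (numThreads - 1) * chunk), n);
    for (auto& h : helpers) h.join();
    return out;
}
```

Comparing `parallelSkin(in, bones, 1)` against `parallelSkin(in, bones, 4)` is a cheap sanity check when converting a serial loop: because each vertex is computed the same way regardless of which thread handles it, the results should match exactly.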

If you want to try the demo, please post here or email me.
#61
05/28/2007 (2:59 pm)
So do I, but I don't have an NVIDIA 7x00 card, and the only machines that have problems aren't mine to sit down with and see what's going on.
#62
05/28/2007 (6:22 pm)
Here's what I got on my Core Duo with lots-a-korks:
Parallel Processing Actvated on 2 CPU's
Setting single thread modet
Benchmark test took 16 milliseconds in single thread mode
Parallel Processing activated
Benchmark test took 93 milliseconds with 2 threads
Best performance [16] was with 1 threads
setting thread pool accordingly
#63
05/28/2007 (6:25 pm)
BTW, with your latest binary, when I try MP_setThreaded() I get this insult:

Parallel Processing not possible on 1 CPU

Well, I never...!
#64
05/28/2007 (6:28 pm)
Some more data with the latest binary on my Core Duo:

==>MP_doBenchMark();
Setting single thread modet
Benchmark test took 15 milliseconds in single thread mode
==>MP_forceThreads(2);
Parallel Processing activated
==>MP_doBenchMark();
Setting single thread modet
Benchmark test took 16 milliseconds in single thread mode
Parallel Processing activated
Benchmark test took 109 milliseconds with 2 threads
Best performance [16] was with 1 threads
setting thread pool accordingly
==>MP_forceThreads(4);
Parallel Processing activated
==>MP_doBenchMark();
Setting single thread modet
Benchmark test took 16 milliseconds in single thread mode
Parallel Processing activated
Benchmark test took 94 milliseconds with 2 threads
Benchmark test took 110 milliseconds with 3 threads
Benchmark test took 109 milliseconds with 4 threads
Best performance [16] was with 1 threads
setting thread pool accordingly

Presumably there is a problem with the test?
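The logs suggest that MP_doBenchMark times the same job at 1..N threads and keeps the fastest count. A minimal sketch of that selection logic, assuming a workload callback (`pickBestThreadCount`, `sumWorkload` and `g_sink` are hypothetical names, not TGE functions):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs runWork(n) for n = 1..maxThreads, times each run, and returns the
// thread count with the lowest time. Optionally records each timing in ms.
unsigned pickBestThreadCount(unsigned maxThreads,
                             const std::function<void(unsigned)>& runWork,
                             std::vector<double>* msPerCount = nullptr) {
    assert(maxThreads > 0);
    using Clock = std::chrono::steady_clock;
    unsigned best = 1;
    double bestMs = 0.0;
    for (unsigned n = 1; n <= maxThreads; ++n) {
        auto start = Clock::now();
        runWork(n);
        double ms = std::chrono::duration<double, std::milli>(Clock::now() - start).count();
        if (msPerCount) msPerCount->push_back(ms);
        if (n == 1 || ms < bestMs) { best = n; bestMs = ms; }
    }
    return best;
}

static volatile double g_sink = 0.0;  // keeps the optimizer from dropping the work

// Example workload: sums 2^20 doubles split across n freshly created threads.
// Thread creation and joining fall inside the timed region.
void sumWorkload(unsigned n) {
    static std::vector<double> data(1 << 20, 1.0);
    const std::size_t chunk = (data.size() + n - 1) / n;
    std::vector<double> partial(n, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        workers.emplace_back([&, t] {
            std::size_t b = std::min(data.size(), t * chunk);
            std::size_t e = std::min(data.size(), b + chunk);
            double s = 0.0;
            for (std::size_t i = b; i < e; ++i) s += data[i];
            partial[t] = s;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;
    g_sink = total;
}
```

Because a measurement like this includes the cost of dispatching work to threads and waiting for them, a small enough job can legitimately favor one thread. That is one possible reading of numbers like 16 ms versus 93 ms, not a diagnosis of the actual MP_doBenchMark code, which uses a persistent pool rather than creating threads per run.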
#65
05/28/2007 (7:42 pm)
Lol, you got the insulting message because the benchmark test set your CPU count back to one after its performance test.

You will have to use MP_forceThreads(2).

Quote: Presumably there is a problem with the test?

Could be, we really need a Linux or Mac comparison at some point to figure this out.
#66
05/28/2007 (8:10 pm)
Well, I have Ubuntu, but sadly the download link is down and I'm having issues compiling. If anyone can point me in the right direction with compilers that work on Ubuntu, I'll get the test done. Also, I'll throw this one on the quad Mac tomorrow. I could even test it on the quad running Windows, but it looks like you already have a quad test.

BTW, what's up with the lack of xlibs in Ubuntu? I can't even get the ATI drivers to load 0_o..
#67
05/28/2007 (9:34 pm)
@Jason: if you've got some specific error messages, I'd be glad to help. I've been through that hell, and recently, too.

You should have xlibs, or you wouldn't be running X windows. Error message?

I had trouble getting the damned closed-source Nvidia driver blob working on Ubuntu 7.04; it might be a related issue. Currently I'm running Zenwalk Linux, because its boot CD supports my SATA DVD drive.

I'll say this about Zenwalk: I've never had so little trouble getting my display set up. EVERY other distro does some kind of voodoo that ignores what you do to /etc/X11/xorg.conf, and I've played with all the major distros.

Now that I think about it, I never did get the Nvidia closed driver to work on 7.04. It was ok on 6.x....
#68
05/28/2007 (10:14 pm)
Yeah, I can't get the display drivers working, and right now I'm tired of jumping through Linux hoops :)

Basically, I have to set up the server in a few more days, and I can definitely run those Mac tests then, but I really have no idea when I'll get around to trying to compile anything on Linux. For the most part, Lee, I don't even have the needed compilers installed, let alone the project built or the source changed enough to get anywhere close to compiling. If someone could get me an executable with this in it, I can run the tests once I get the ATI drivers to load. I'm running a Conroe E6420 with CrossFire ATI 1800s. The Mac has two dual-core Xeons and an NVIDIA 7300 GT.
#69
05/28/2007 (11:37 pm)
Sorry, but it currently core-dumps (at least mine does). I believe Duncan built a SUSE 10.2 binary above.
#70
05/28/2007 (11:50 pm)
Mine also core-dumped, but I assumed it was due to not having a 3D graphics card on that box. Perhaps there is a bug in platformX86Unix/X86UnixMultiprocessor.cc.

Try to debug it; it's only about 20 lines of code, unless the bug is elsewhere in TGE (I doubt it).

Google for DDD; it's a good debugger.
#71
05/29/2007 (11:40 am)
Mac errors...
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:32: error: '_SC_NPROCESSORS_ONLN' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:45: error: 'cpu_set_t' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:45: error: parse error before ';' token
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:46: error: 'cpuAffinity' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:46: error: 'CPU_ZERO' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:49: error: 'CPU_SET' undeclared (first use this function)
Torque SDK/engine/platformMacCarb/macCarbMultiProcess.cc:51: error: 'sched_setaffinity' undeclared (first use this function)

Can't seem to find these functions in any of the headers...
#72
05/29/2007 (1:39 pm)
Yeah, macCarbMultiProcess.cc is going to have to be rewritten against the standard Mac headers. I can make the define for processors = sysconf(_SC_NPROCESSORS_ONLN); that's no big deal, but createThreadPool is out of my league...
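For the processor count, `sysconf(_SC_NPROCESSORS_ONLN)` is declared in `<unistd.h>`, which Mac OS X provides as well as Linux, so the "undeclared" error at line 32 may simply mean that header isn't included in macCarbMultiProcess.cc (an assumption, since the file isn't posted). A small helper with a fallback to C++11's `std::thread::hardware_concurrency()` (much newer than TGE; `countOnlineProcessors` is a hypothetical name):

```cpp
#include <cassert>
#include <thread>
#include <unistd.h>  // sysconf, _SC_NPROCESSORS_ONLN on Linux and Mac OS X

// Hypothetical helper: number of processors currently online, never less
// than 1. The posted createThreadPool() reads this count from 'processors'.
int countOnlineProcessors() {
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);  // returns -1 on error
    if (n > 0)
        return static_cast<int>(n);
#endif
    // Fallback: a hint that may be 0 if the count cannot be determined.
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? static_cast<int>(hw) : 1;
}
```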
#73
05/29/2007 (2:02 pm)
@Jason, look in macCarbThreads.cc to see which headers they use for the thread code on a Mac.

The thread pool code 'should' compile on a Mac because, according to macCarbThreads.cc, the Mac port uses pthreads as well.
#74
05/29/2007 (2:19 pm)
I'll have to google how to set thread affinity on Mac OS X, unless a Mac programmer can chime in here.
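Mac OS X has no `cpu_set_t`. Starting with 10.5 it offers `thread_policy_set()` with `THREAD_AFFINITY_POLICY`, which takes an affinity tag and treats it as a scheduling hint rather than a hard binding to a core. A best-effort wrapper, assuming the Linux path uses `pthread_setaffinity_np` (the helper name `setThreadAffinityHint` is hypothetical, and the Mac branch can only be checked on a Mac):

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // pthread_setaffinity_np / CPU_SET on glibc
#endif
#include <cassert>
#include <pthread.h>
#if defined(__APPLE__)
#include <mach/mach.h>           // thread_policy_set, KERN_SUCCESS
#include <mach/thread_policy.h>  // THREAD_AFFINITY_POLICY (Mac OS X 10.5+)
#elif defined(__linux__)
#include <sched.h>               // cpu_set_t, CPU_ZERO, CPU_SET
#endif

// Hypothetical helper: ask the OS to keep 'thread' on (or near) CPU 'cpu'.
// Returns true if the OS accepted the request.
bool setThreadAffinityHint(pthread_t thread, int cpu) {
#if defined(__APPLE__)
    // Tag 0 means "no affinity", so shift by one. Threads with different
    // tags are hinted to run apart; the kernel may still ignore the hint.
    thread_affinity_policy_data_t policy = { cpu + 1 };
    kern_return_t kr = thread_policy_set(pthread_mach_thread_np(thread),
                                         THREAD_AFFINITY_POLICY,
                                         (thread_policy_t)&policy,
                                         THREAD_AFFINITY_POLICY_COUNT);
    return kr == KERN_SUCCESS;
#elif defined(__linux__)
    // A pthread_t is not a pid_t, so use the pthread call rather than
    // sched_setaffinity() when binding a specific thread.
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set) == 0;
#else
    (void)thread;
    (void)cpu;
    return false;  // no affinity support wired up for this platform
#endif
}
```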

For now, I think we should use the following to create a thread pool that isn't bound to a particular CPU, and see what happens:
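void MultiProcess::createThreadPool()
{
/*
	// Linux-only: cpu_set_t, CPU_ZERO, CPU_SET and sched_setaffinity are
	// glibc extensions that do not exist on Mac OS X, so this stays disabled.
	//set process affinity to all available CPU's
	cpu_set_t cpuAffinity;
	CPU_ZERO(&cpuAffinity);
	for(int i=0;i < processors; i++)
	{	
		 CPU_SET(i, &cpuAffinity);
	}
	sched_setaffinity( getpid(), sizeof(cpuAffinity), &cpuAffinity);
*/
	pthread_t threadID;
	// create a thread pool: one detached worker per processor, each running
	// MultiProcessRunHandler with this object as its argument. The workers are
	// not bound to a CPU, so the OS scheduler decides where they run.
	for(int i=0;i < processors; i++)
	{	
		pthread_create(&threadID, NULL, MultiProcessRunHandler, (void*)this);
		//assign thread to a CPU (Linux only, and before detaching). A pthread_t
		//is not a pid_t, so use pthread_setaffinity_np, not sched_setaffinity:
		//CPU_ZERO(&cpuAffinity);
		//CPU_SET(i, &cpuAffinity);
		//pthread_setaffinity_np(threadID, sizeof(cpuAffinity), &cpuAffinity);
		pthread_detach(threadID);
		// store the thread id
		threadHandle[i] = (S32)threadID;
	}
}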
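MultiProcessRunHandler itself isn't posted in the thread. One common way to structure such a worker is to have it sleep on a condition variable until a batch of work arrives, so idle threads don't spin. The sketch below assumes that design; `WorkerPool`, `run()` and the `Task` signature are hypothetical, and unlike createThreadPool() above it joins its threads on shutdown instead of detaching them.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Hypothetical minimal worker pool (not TGE code). Workers block in
// pthread_cond_wait until run() posts a batch, so idle threads use no CPU.
class WorkerPool {
public:
    typedef void (*Task)(int index, void* userData);

    explicit WorkerPool(int numThreads) {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&workReady_, nullptr);
        pthread_cond_init(&workDone_, nullptr);
        for (int i = 0; i < numThreads; ++i) {
            pthread_t t;
            if (pthread_create(&t, nullptr, &WorkerPool::runHandler, this) == 0)
                threads_.push_back(t);  // only keep threads that really started
        }
        assert(!threads_.empty());
    }
    WorkerPool(const WorkerPool&) = delete;
    WorkerPool& operator=(const WorkerPool&) = delete;

    ~WorkerPool() {
        pthread_mutex_lock(&mutex_);
        quit_ = true;
        pthread_cond_broadcast(&workReady_);
        pthread_mutex_unlock(&mutex_);
        for (pthread_t t : threads_) pthread_join(t, nullptr);
        pthread_cond_destroy(&workDone_);
        pthread_cond_destroy(&workReady_);
        pthread_mutex_destroy(&mutex_);
    }

    // Runs task(i, userData) for i = 0..count-1 on the pool; blocks until done.
    void run(Task task, int count, void* userData) {
        pthread_mutex_lock(&mutex_);
        task_ = task;
        userData_ = userData;
        next_ = 0;
        count_ = count;
        remaining_ = count;
        pthread_cond_broadcast(&workReady_);
        while (remaining_ > 0) pthread_cond_wait(&workDone_, &mutex_);
        pthread_mutex_unlock(&mutex_);
    }

private:
    static void* runHandler(void* arg) {
        WorkerPool* self = static_cast<WorkerPool*>(arg);
        pthread_mutex_lock(&self->mutex_);
        for (;;) {
            // Sleep until there is an unclaimed index or we are told to quit.
            while (!self->quit_ && self->next_ >= self->count_)
                pthread_cond_wait(&self->workReady_, &self->mutex_);
            if (self->quit_) break;
            int index = self->next_++;
            Task task = self->task_;
            void* data = self->userData_;
            pthread_mutex_unlock(&self->mutex_);
            task(index, data);  // do the work without holding the lock
            pthread_mutex_lock(&self->mutex_);
            if (--self->remaining_ == 0) pthread_cond_signal(&self->workDone_);
        }
        pthread_mutex_unlock(&self->mutex_);
        return nullptr;
    }

    pthread_mutex_t mutex_;
    pthread_cond_t workReady_, workDone_;
    std::vector<pthread_t> threads_;
    Task task_ = nullptr;
    void* userData_ = nullptr;
    int next_ = 0, count_ = 0, remaining_ = 0;
    bool quit_ = false;
};
```

Because idle workers block inside `pthread_cond_wait`, a pool like this should show close to zero CPU use between batches; a worker that instead polls a flag in a loop keeps its core busy even when there is nothing to do.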
#75
05/29/2007 (2:43 pm)
cpu_set_t: no reference to it, not even in the ADC (Apple Developer Connection) docs, and the same goes for the rest listed above. macCarbThreads.cc doesn't use those calls either. I think these would best be handled by the Multiprocessor API. Unless they are found in some other header, they don't seem to exist in the Mac libraries.
Here's a CreateThreadPool function declared in threads.h:
OSErr CreateThreadPool (
   ThreadStyle threadStyle,
   SInt16 numToCreate,
   Size stackSize
);

and the parameters...

threadStyle
The type of thread to create for this set of threads in the pool. Cooperative is the only type that you can specify. Historically, the Thread Manager supported two types of threads, preemptive and cooperative. However, due to severe limitations on their use, the Thread Manager no longer supports preemptive threads.

numToCreate
The number of threads to create for the pool.

stackSize
The stack size for this set of threads in the pool. This stack must be large enough to handle saved thread context, normal application stack usage, interrupt handling functions, and CPU exceptions. Specify a stack size of 0 to request the Thread Manager's default stack size for the specified type of thread.

I'm not a programmer, but I've been doing the research and that's what I came up with, since I can't find those anywhere in the ADC documentation.
#76
05/29/2007 (2:57 pm)
Ack, OK, I'll comment them out.
#77
05/29/2007 (3:30 pm)
OK, bad news: MP_forceThreads is bad stuff on the Mac, at least with this code... I can't get MP_doBenchMark to run without crashing either. I did manage this, though.
This one isn't forced to MP:
beautifulpeoplesclub.net/phpBBdev/files/not_forced_132.jpg
This one is forced, and you can see the frame-rate hit, but...
beautifulpeoplesclub.net/phpBBdev/files/mp_running_111.jpg
It does force it onto MP threads...
beautifulpeoplesclub.net/phpBBdev/files/mp_active_170.jpg
#78
05/29/2007 (4:29 pm)
I see your thread count went from 6 to 10, so we know the thread creation worked.
CPU usage went from 100.9% to 201%, which looks like the threads are only running on two of your 4 cores.

Perhaps try MP_forceThreads(2) and see how it compares.
#79
05/29/2007 (5:06 pm)
The thread count actually went up to 16 at a later point, but then it crashed. At the bottom you can see the hyperactive CPU usage across all 4 cores, and you can also see the system usage kick up to about 42% from its normal 12-15% or so. I ran it at 2 threads as well, which gave similar results with the FPS dip, and it only looked hyperactive on 2 of the core monitors. Again, this doesn't seem like the way to handle MP threads on the Mac. I wish it were better news; maybe some Mac guru can come along and show us the way?

Oh, and for reference, the red bars indicate the system threads going crazy and killing the engine threads.
#80
07/18/2007 (8:30 am)
Came across this article today...

Quote: Q. How will Unreal Tournament 3 use multiple cores on a CPU? Does it take advantage of quad-core CPUs? If so, how/what task is assigned to each core?

A. Unreal Engine 3 is a transitional multithreaded architecture. It runs two heavyweight threads, and a pool of helper threads.

The primary thread is responsible for running UnrealScript AI and gameplay logic and networking. The secondary thread is responsible for all rendering work. The pool of helper threads accelerate additional modular tasks such as physics, data decompression, and streaming.
Splitting scene traversal and rendering into its own thread may be very difficult in TGE, but maybe not so much with the TGEA RenderManager.
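The game-thread/render-thread split described in the quote can be sketched as a one-frame handoff: the game thread fills a snapshot of what the renderer needs, publishes it, and starts simulating the next frame while the render thread draws the previous one. This is a generic illustration, not how UE3, TGE or TGEA's RenderManager actually work; `FrameSnapshot`, `FrameMailbox` and `runTwoThreadLoop` are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical copy of the per-frame state the renderer needs.
struct FrameSnapshot {
    int frameNumber = 0;
    std::vector<float> objectPositions;
};

// Single-slot mailbox: at most one finished frame waits for the renderer,
// so the render thread lags the game thread by at most one frame.
class FrameMailbox {
public:
    void publish(FrameSnapshot snap) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !full_; });  // renderer took the last one
        slot_ = std::move(snap);
        full_ = true;
        cv_.notify_all();
    }
    FrameSnapshot take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_; });
        full_ = false;
        cv_.notify_all();
        return std::move(slot_);
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    FrameSnapshot slot_;
    bool full_ = false;
};

// Simulates 'frames' frames on the calling (game) thread and "renders" them on
// a second thread. Returns the sum of frame numbers the renderer consumed.
int runTwoThreadLoop(int frames) {
    FrameMailbox mailbox;
    int renderedSum = 0;
    std::thread renderThread([&] {
        for (int i = 0; i < frames; ++i) {
            FrameSnapshot snap = mailbox.take();
            renderedSum += snap.frameNumber;  // stand-in for traversal + draw calls
        }
    });
    for (int f = 1; f <= frames; ++f) {
        FrameSnapshot snap;
        snap.frameNumber = f;
        snap.objectPositions.assign(3, static_cast<float>(f));  // stand-in for game logic
        mailbox.publish(std::move(snap));
    }
    renderThread.join();  // renderedSum is only read after the join
    return renderedSum;
}
```

The snapshot is the important design choice here: the render thread only reads data the game thread has handed over, so the two never touch the same live scene objects at the same time. Producing such a snapshot is likely where most of the effort would go in TGE.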