Mac PPC, Mac Intel, and PC Profiler Dumps
by Rubes · in Torque Game Engine · 12/31/2006 (4:54 pm) · 27 replies
Hey folks,
Although I've been working with TGE for about a year now, most of my effort has been on the scripting side and I don't have a tremendous amount of experience with the engine or, more specifically, with optimization. So now that I have made some simple modifications to the TGE 1.5 engine and have compiled it for both Mac and Windows, I've had the chance to play around with my project on three platforms (Mac PowerPC, Mac Intel, and WinXP). And, like others, I've noticed some significant differences between them.
I have followed some of the recent threads on performance of TGE 1.5 on Windows and Mac, but I still don't fully understand why there are significant performance differences between those builds. So I went ahead and tested my game on all three platforms, and ran the TGE profiler for a similar period of time under the same conditions, with interesting results. I'd like to post the results of the profiler dump here to see if I could get at least some basic input from those smarter than me on this stuff, and hopefully I'll understand it better. If nothing else, at least people will get to see some profiler results of the same TGE mission running on three different platforms.
The conditions at the time of the profiler dump are:
- screen size 1024 x 768, 32-bit, windowed mode
When the mission starts, the player is standing inside a room of an interior DIF, which has a single window to the outside terrain. The profiler is run for approximately the same amount of time for each right after loading the mission, and while doing the following:
- spinning around 360 degrees in the starting room,
- moving into an adjacent room, and
- spinning around 360 degrees again.
The machines tested on were:
- PowerMac G5 Dual 2GHz w/ATI Radeon X800 (PowerPC)
- MacBook Pro 2.18GHz w/ATI Radeon X1600 (Mac Intel and WinXP)
The best performance was, by far, on the WinXP platform. The speed was great, and there was absolutely no lag as the player performed the 360 degree spins. The worst performance was on the Intel Mac (same machine). Really terrible lag while spinning. The PowerPC performance was in between, but far closer to the Intel Mac performance with significant lag while spinning. Below are the top results from the profiler dump:
(continued)
#2
01/01/2007 (4:23 am)
I bet the Intel Mac is not using the MMX-optimized terrain blender.
#3
01/01/2007 (12:31 pm)
Is that something that is fixable without major rewrites? Also, any thoughts about the PowerPC code, which is just about as laggy and seems to also spend excess effort on terrain rendering?
#4
01/01/2007 (4:29 pm)
Yes - you should be able to reuse the x86-optimized blender code on Intel Macs. Details of making that happen aren't something I can speak to off the top of my head.
The PPC code should be using the altivec blender, which ought to be of comparable speed. I'd suggest looking into that with Shark if you want more in-depth data, though.
#5
01/02/2007 (2:52 am)
*pokes*
From console.log:
Unknown Getstalt value for processor type: 0x69353836
platform layer should be changed to use sysctl() instead of Gestalt().
Unknown Processor, assuming x86 Compatible, 2147 Mhz
And this is the code that spits it out:
Platform::SystemInfo.processor.type = CPU_PowerPC_Unknown;
err = Gestalt(gestaltNativeCPUtype, &raw);
switch(raw)
{
<snippage>
case gestaltCPUX86:
Platform::SystemInfo.processor.type = CPU_X86Compatible;
Platform::SystemInfo.processor.name = StringTable->insert("x86 Compatible");
break;
case gestaltCPUPentium4:
Platform::SystemInfo.processor.type = CPU_Intel_Pentium4;
Platform::SystemInfo.processor.name = StringTable->insert("Intel Pentium 4");
break;
default:
// explain why we can't get the processor type.
Con::warnf("Unknown Getstalt value for processor type: 0x%x",raw);
Con::warnf("platform layer should be changed to use sysctl() instead of Gestalt() .");
// for now, identify it as an x86 processor, because Apple is moving to Intel chips...
Platform::SystemInfo.processor.type = CPU_X86Compatible;
Platform::SystemInfo.processor.name = StringTable->insert("Unknown Processor, assuming x86 Compatible");
break;
}
This is a Core2Duo MacBook Pro. After a little more poking, it turns out that Gestalt is actually returning that my MacBook is a gestaltCPUPentium. So add in an extra case for that by the Pentium 4 [dirty hack] and we're partway there. There's a few more potential Pentium things returnable by Gestalt; this MacBook doesn't return them, so I've no idea when they may or may not be returned. There's even a 486 option there...
Next up we scroll down to this:
Platform::SystemInfo.processor.properties = CPU_PROP_PPCMIN;
err = Gestalt(gestaltPowerPCProcessorFeatures, &raw);
#if defined(__VEC__)
if ((1 << gestaltPowerPCHasVectorInstructions) & (raw)) {
Platform::SystemInfo.processor.properties |= CPU_PROP_ALTIVEC; // OR it in as they are flags...
}
#endif
Con::printf(" %s, %d Mhz", Platform::SystemInfo.processor.name, Platform::SystemInfo.processor.mhz);
if (Platform::SystemInfo.processor.properties & CPU_PROP_PPCMIN)
Con::printf(" FPU detected");
Well, that's pretty presumptuous. I tried this, since I'm dirty:
Platform::SystemInfo.processor.properties = CPU_PROP_MMX | CPU_PROP_FPU | CPU_PROP_SSE;
But the whole thing still feels a little on the slow side. So, uh.
Unfortunately, I'm still on xmas vacation. Gimme a couple days and if no-one else has poked at this again, I'll probably do it. I really don't know much about how torque decides whether or not to do stuff with processors, so I may have just missed something obvious...
Gary (-;
#6
01/02/2007 (3:34 am)
Interesting thread. I just bought a Mac Mini and will step into the same trap. I'm very interested in your results. :-)
#7
01/02/2007 (7:04 am)
@Gary: I just started to migrate macCarbCPUInfo.cc to sysctl() calls yesterday in order to set it up for x86 code and I'm almost done. There are a few things I haven't figured out how to access through sysctl() yet, and the processor types from Gestalt() don't quite match up with sysctl(), but I think it'll be cleaner and will allow the addition of x86-optimized code.
After that I'll have to give it to someone with an Intel Mac :-)
#9
01/02/2007 (9:46 am)
Quote: After that I'll have to give it to someone with an Intel Mac :-)
*grabby motions*
Gary (-;
#10
01/02/2007 (10:25 am)
Quote: I just started to migrate macCarbCPUInfo.cc to sysctl() calls yesterday in order to set it up for x86 code and I'm almost done. There are a few things I haven't figured out how to access through sysctl() yet and the processor types from Gestalt() don't quite match up with sysctl(), but I think it'll be cleaner and will allow the addition of x86 optimized code.
Doesn't or shouldn't matter. You probably just want to rip out all of that gestalt junk and start again from scratch. Reading through macCarbCPUInfo.cc, there's an awful lot of awful silly pointless stuff going on. Reading through the other platforms, most of this crap is mostly ignored anyways.
Flicking through various bits of winCPUInfo.cc [let's be pragmatic; your goal in torque code is to keep up with the windows version :-)], you find it only really cares about FPU, 3DNOW, MMX, and SSE:
The prelude that sets properties on windows is... faintly silly and no small amount of work. So our shopping list thus appears in platform.h:
enum x86Properties
{ // x86 properties
CPU_PROP_C = (1<<0),
CPU_PROP_FPU = (1<<1),
CPU_PROP_MMX = (1<<2), // Integer-SIMD
CPU_PROP_3DNOW = (1<<3), // AMD Float-SIMD
CPU_PROP_SSE = (1<<4), // PentiumIII SIMD
CPU_PROP_RDTSC = (1<<5) // Read Time Stamp Counter
// CPU_PROP_SSE2 = (1<<6), // Pentium4 SIMD
// CPU_PROP_MP = (1<<7) // Multi-processor system
};
Quick poke in the manpage for sysctl(8) and reading the headers mach/machine.h and sys/sysctl.h and bam:
/*
 * CPU families (sysctl hw.cpufamily)
 *
 * NB: the encodings of the CPU families are intentionally arbitrary.
 * There is no ordering, and you should never try to deduce whether
 * or not some feature is available based on the family.
 * Use feature flags (eg, hw.optional.altivec) to test for optional
 * functionality.
 */
c0a8000c:~ chunky$ sysctl hw.optional
hw.optional.floatingpoint: 1
hw.optional.mmx: 1
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.x86_64: 1
hw.optional.supplementalsse3: 1
second level name optional in hw.optional is invalid
c0a8000c:~ chunky$
Woop, no 3DNow. In fact, reading the headers, 3DNow can never make an appearance here. That leaves the three keys hw.optional.floatingpoint, hw.optional.sse, and hw.optional.mmx.
By my count the other keys you actually need, from torque's platform.h again:
static struct SystemInfo_struct
{
struct Processor
{
ProcessorType type;
const char *name;
U32 mhz;
U32 properties; // CPU type specific enum
} processor;
} SystemInfo;
Tra la la, remaining keys:
c0a8000c:~ chunky$ sysctl hw.cpufrequency machdep.cpu.brand_string
hw.cpufrequency: 2330000000
machdep.cpu.brand_string: Intel(R) Core(TM)2 CPU T7600 @ 2.33GHz
c0a8000c:~ chunky$
Finally, there's the actual processor type:
c0a8000c:~ chunky$ sysctl hw.cputype hw.cpusubtype
hw.cputype: 7
hw.cpusubtype: 4
c0a8000c:~ chunky$
You *could* go into a switch/case frenzy with these puppies, except currently torque only supports G3 and above, or any intel mac; from mach/machine.h, salient snippage:
#define CPU_TYPE_X86        ((cpu_type_t) 7)
#define CPU_TYPE_I386       CPU_TYPE_X86    /* compatibility */
#define CPU_TYPE_X86_64     (CPU_TYPE_X86 | CPU_ARCH_ABI64)
#define CPU_TYPE_POWERPC    ((cpu_type_t) 18)
#define CPU_TYPE_POWERPC64  (CPU_TYPE_POWERPC | CPU_ARCH_ABI64)
There's many, many other types... none of them matter for us. And then there's the subtypes. There's a bunch of them; if I were you I'd just use the torque enums CPU_X86Compatible or CPU_PowerPC_Unknown; if they're good enough for windows, they're good enough for us.
So, uh, with minimal creativity, you could probably replace the whole of void Processor::init() in macCarbCPUInfo with about twenty lines.
Gary (-;
EDIT: Fixed some markup
#11
01/02/2007 (10:35 am)
FYI, in case it helps, I did get basically the same results for my MacBook Pro (Core Duo, not Core 2):
hw.cputype: 7
hw.cpusubtype: 4
hw.optional.floatingpoint: 1
hw.optional.mmx: 1
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
second level name optional in hw.optional is invalid
Of course, no hw.optional.x86_64 or hw.optional.supplementalsse3 results.
*Edit for silly typing.
#12
01/03/2007 (1:45 pm)
@All: yep, the processor detection code should be using sysctl(); it's just not a huge priority because, currently, it doesn't really affect a lot of stuff.
As stated on some other threads, the Intel Mac version of torque doesn't use the MMX blender because that blender outputs the wrong texture format, and I've yet to untangle the asm code to find out how it works. Previous attempts were unfruitful.
Since it seems to be a stumbling block for a lot of people, I'll take another pass at it this week.
/Paul
#13
01/03/2007 (3:01 pm)
@Paul: I sent an email to your gmail about this.
#14
01/03/2007 (3:03 pm)
That would be awesome, Paul. Much appreciated.
#15
01/04/2007 (9:43 am)
Well, I got this far on the airplane:
#include <sys/sysctl.h>
#include <mach/machine.h>
#define SYSCTLBUFLEN 128
void Processor::init()
{
OSErr err;
long raw, mhzSpeed = BASE_MHZ_SPEED;
Con::printf("System & Processor Information:");
err = Gestalt(gestaltSystemVersion, &raw);
Con::printf(" MacOS version: %x.%x.%x", (raw>>8), (raw&0xFF)>>4, (raw&0x0F));
err = Gestalt(gestaltCarbonVersion, &raw);
if (err)
Con::printf(" No CarbonLib support.");
else
Con::printf(" CarbonLib version: %x.%x.%x", (raw>>8), (raw&0xFF)>>4, (raw&0x0F));
int sysctli;
size_t sysctli_s = sizeof(sysctli);
char sysctls[SYSCTLBUFLEN];
size_t sysctls_s = sizeof(sysctls);
unsigned int twoints[2];
size_t twoints_s = sizeof(twoints);
if(0 == sysctlbyname("hw.memsize", (void *)&twoints, &twoints_s, NULL, 0)) {
Con::printf("System memory size: %iMB", twoints[0]/(1024*1024));
}
if(0 != sysctlbyname("hw.cpufrequency", (void *)&twoints, &twoints_s, NULL, 0)) {
Platform::SystemInfo.processor.mhz = 1000;
Con::errorf("Couldn't detect CPU Frequency, assuming 1GHz");
} else {
Platform::SystemInfo.processor.mhz = twoints[0]/(1000*1000);
Con::printf("CPU Frequency: %2.3fGHz", (float)twoints[0]/(1000*1000*1000));
}
if(0 != sysctlbyname("hw.cputype", (void *)&sysctli, &sysctli_s, NULL, 0)) {
Con::errorf("Couldn't detect CPUType. Assuming X86");
Platform::SystemInfo.processor.type = CPU_X86Compatible;
} else {
switch(sysctli) {
case CPU_TYPE_X86:
case CPU_TYPE_X86_64:
Con::printf("Detected X86 CPU");
Platform::SystemInfo.processor.type = CPU_X86Compatible;
break;
case CPU_TYPE_POWERPC64:
case CPU_TYPE_POWERPC:
Con::printf("Detected PPC CPU");
Platform::SystemInfo.processor.type = CPU_PowerPC_Unknown;
break;
default:
Con::errorf("It's not a X86 and it's not a PPC. Assuming it's X86. cputype: %i", sysctli);
Platform::SystemInfo.processor.type = CPU_X86Compatible;
break;
}
}
if(0 != sysctlbyname("machdep.cpu.brand_string", (void *)sysctls, &sysctls_s, NULL, 0)) {
Con::errorf("Couldn't get the CPU brand string");
Platform::SystemInfo.processor.name = StringTable->insert("Processor");
} else {
Con::printf("CPU: %s", sysctls);
Platform::SystemInfo.processor.name = StringTable->insert(sysctls);
}
Con::printf("Getting CPU Features and options");
switch(Platform::SystemInfo.processor.type) {
case CPU_X86Compatible:
Platform::SystemInfo.processor.properties = CPU_PROP_C;
if(0 == sysctlbyname("hw.optional.floatingpoint", (void *)&sysctli, &sysctli_s, NULL, 0)) {
if(sysctli) {
Platform::SystemInfo.processor.properties |= CPU_PROP_FPU;
Con::printf(" Have FPU");
}
}
if(0 == sysctlbyname("hw.optional.mmx", (void *)&sysctli, &sysctli_s, NULL, 0)) {
if(sysctli) {
Platform::SystemInfo.processor.properties |= CPU_PROP_MMX;
Con::printf(" Have MMX");
}
}
if(0 == sysctlbyname("hw.optional.sse", (void *)&sysctli, &sysctli_s, NULL, 0)) {
if(sysctli) {
Platform::SystemInfo.processor.properties |= CPU_PROP_SSE;
Con::printf(" Have SSE");
}
}
break;
case CPU_PowerPC_Unknown:
Platform::SystemInfo.processor.properties = CPU_PROP_PPCMIN;
if(0 == sysctlbyname("hw.optional.altivec", (void *)&sysctli, &sysctli_s, NULL, 0)) {
if(sysctli) {
Platform::SystemInfo.processor.properties |= CPU_PROP_ALTIVEC;
Con::printf(" Have Altivec");
}
}
if(0 == sysctlbyname("hw.ncpu", (void *)&sysctli, &sysctli_s, NULL, 0)) {
if(sysctli > 1) {
Platform::SystemInfo.processor.properties |= CPU_PROP_PPCMP;
Con::printf(" Have Multiprocessor");
}
}
break;
default:
Con::warnf("Shouldn't have got here. Torque's CPU settings are in an unknown state.");
break;
}
Con::printf(" ");
}
Of course, now that we've detected it all... turns out that actually enabling that stuff is in the platform layer also. So go into your math dir in XCode, make sure that mMath_ASM.asm, mMathSSE_ASM.asm are in your project, and add them to your target.
Additionally, you'll need to rightclick on them, click get info, and in the general tab, set the filetype to sourcecode.nasm instead of sourcecode.asm. [There's another way to do the same thing, I just like this way more]
Then you'll need to modify macCarbMath to enable these things... my macCarbMath is all messed up at the moment, so I'm not going to paste it here, but it looks an awful lot like the windows version combined with the current mac version.
Of course, the next problem that this causes is that now your PPC builds won't build properly because they have x86 assembly files in the target. I didn't bother to look into this one yet... I'm not sure how to go about only including certain files for a certain platform.
Finally... none of this actually fixed what's wrong here, which is this in blender.cc:
#if defined(TORQUE_SUPPORTS_NASM) && !defined(TORQUE_OS_MAC)
#  define BLENDER_USE_ASM
#endif
If you try taking that little mac check out, then you get a bunch of other errors. And Paul has already said this won't work. So... yeah
Gary (-;
EDIT: Preprocessor stuffs
#16
01/04/2007 (3:46 pm)
Oh, here's something...
garagegames.com/mg/forums/result.thread.php?qt=55315
I tried enabling asm terrain blending in that build [platformX86UNIX on OSX], and the performance difference is... significant.
You know how rapidly turning around in torque/OSX, even on a rippingly fast machine, drops to one or two frames per second all the while you're turning? No such slowdown here...
Gary (-;
#17
01/04/2007 (10:47 pm)
Well, since it didn't really make sense to not at least try it, I tried enabling terrain blending in asm in this MacCarb build, and... maybe I'm imagining it, but I'm not seeing those slowdowns when I'm rapidly turning around anymore. It seems like most of my speed problems are gone.
Additionally, this is me doing a few laps of the new stronghold [with the weather effects cut by 90% because they're the worst slowdown of all...]:

The shark data is here: osxasm.mshark
Now it looks like torque is spending most of its time futzing with GL instead of futzing with itself.
Gary (-;
#18
01/04/2007 (11:17 pm)
And because today is clearly a day for me to spam in this thread: TorqueOSXASM.dmg. Just the binary. Shouldn't work on anything but Intel.
Gary (-;
#19
01/13/2007 (1:22 am)
Hi all,
I'm thinking there's more to this problem than the terrain blender. Going back to the original post, Rubes specifically mentions spinning around (turning) is what's bringing the framerate to a crawl. The framerates aren't so bad when moving forward, backwards, or side-stepping. The turns are what kill. Interestingly, if you change your FOV you get results that seem counter-intuitive to me. If you do a setFOV(120); you're rendering more area, more terrain, more interiors, etc., yet this framerate drop that was caused by spinning goes away. If you do a setFOV(90); to take you back to the original FOV, framerates get hit again when spinning. I've found on Intel Macs at least (haven't tested on PPC yet) that this is consistent. The lower you set the FOV, the worse the framerates get when you turn. I had created a thread about this but only got one response...
http://www.garagegames.com/mg/forums/result.thread.php?qt=54672
Anyway, Gary's work is very cool indeed but even in testing the ASM additions, spinning kills so I think while any performance tweaks for the Mac are GREAT, we might not be looking in the right place. I'm still familiarizing myself with the SDK so I'm not ready to guess where the problem might be and I can't explain why the problem would get worse when you're (AFAIK) rendering fewer objects.
Thoughts?
Thanks,
Ben
#20
01/13/2007 (2:20 pm)
The bad turning performance when zoomed is due to how TGE's terrain caches texture data. As you may already know, texture data is blended on the CPU, then kept in a texture cache (a set of GL textures, hopefully stored on-card) for rendering. The amount of detail put on each square of terrain is based on its pixel size.
So when you zoom in a lot, a ton more detail has to be generated for far-away geometry. This has a bit of overhead because the granularity of texturing is finite, so when detail is generated for something that might be partially visible (say you have an FOV of one degree), one or more whole terrain squares of detail have to be populated by the blender and uploaded.
When you spin with this narrow FOV, you cause the blender to create a TON of detail, and because the texture cache is of limited size, most of this generated data has to be discarded to make room for the next frame's textures.
Contrariwise, when you set the FOV to 120, no piece of terrain geometry needs much data, since it's relatively small by the pixel metric, and most of the data for a given frame will get reused on the next one (since even at quite a high rate of turn, at least part of the last frame's view will remain visible with a 120deg FOV). So there's a lot less work to do all around, and the texture cache is well-utilized.
What are the solutions to this issue?
Well, mostly, don't zoom way in (a low FOV) and spin. :P This is actually a reasonable place to have a higher cost, because it's likely fewer objects are being drawn, and also the user will have difficulty discerning a reduced framerate: if they're spinning around, the view is changing entirely anyway, so there are fewer visual cues that might give it away.
You could increase the texture cache size - doing so will result in fewer cache misses or flushes and potentially save you some framerate, but it might have to become quite large to store enough data to make a high zoom not cause flush/miss behavior.
Finally, you can optimize the blender, which we find, when done, does in fact largely address our framerate issues. Yay!
Torque 3D Owner Rubes
Intel Mac:
Intel WinXP:
What jumps out at me is that the Mac build, on both PPC and Intel, seems to spend a whole lot more time on terrain rendering than the WinXP build does. I'm not sure if this is the cause of the severe lag on the Mac, but I guess I wouldn't be surprised, given that the difference in performance between the WinXP build and the Mac build (on both PPC and Intel) is pretty huge.
My intention is not to dredge up another "Mac performance sucks compared to Windows" thread; I'd just like to know whether this information clarifies what's going on on these three platforms and what might be done to optimize and improve things. Thanks...