Mac PPC, Mac Intel, and PC Profiler Dumps
by Rubes · in Torque Game Engine · 12/31/2006 (4:54 pm) · 27 replies
Hey folks,
Although I've been working with TGE for about a year now, most of my effort has been on the scripting side and I don't have a tremendous amount of experience with the engine or, more specifically, with optimization. So now that I have made some simple modifications to the TGE 1.5 engine and have compiled it for both Mac and Windows, I've had the chance to play around with my project on three platforms (Mac PowerPC, Mac Intel, and WinXP). And, like others, I've noticed some significant differences between them.
I have followed some of the recent threads on performance of TGE 1.5 on Windows and Mac, but I still don't fully understand why there are significant performance differences between those builds. So I went ahead and tested my game on all three platforms, and ran the TGE profiler for a similar period of time under the same conditions, with interesting results. I'd like to post the results of the profiler dump here to see if I can get at least some basic input from those smarter than me on this stuff, and hopefully I'll understand it better. If nothing else, at least people will get to see some profiler results of the same TGE mission running on three different platforms.
The conditions at the time of the profiler dump are:
- screen size 1024 x 768, 32-bit, windowed mode
When the mission starts, the player is standing inside a room of an interior DIF, which has a single window to the outside terrain. The profiler is run for approximately the same amount of time for each right after loading the mission, and while doing the following:
- spinning around 360 degrees in the starting room,
- moving into an adjacent room, and
- spinning around 360 degrees again.
The machines tested on were:
- PowerMac G5 Dual 2GHz w/ATI Radeon X800 (PowerPC)
- MacBook Pro 2.18GHz w/ATI Radeon X1600 (Mac Intel and WinXP)
The best performance was, by far, on the WinXP platform. The speed was great, and there was absolutely no lag as the player performed the 360 degree spins. The worst performance was on the Intel Mac (same machine). Really terrible lag while spinning. The PowerPC performance was in between, but far closer to the Intel Mac performance with significant lag while spinning. Below are the top results from the profiler dump:
(continued)
#22
01/13/2007 (2:58 pm)
Hi Ben,
Thanks so much for the detailed posts. Very good info.
As a quick test, I did as you suggested and increased the texture cache size. HUGE improvement!!! Yay! As I mentioned in my earlier post, the game was crawling at the default FOV of 90 when turning rapidly. This simple change significantly alleviates that problem. Thanks a bunch for pointing it out. Okay, here's the question: How large a texture cache is reasonable? I initially increased it to 1024 (from the default 220), but I'm not sure at this point what units the size is in. Is it just a data cache size in bytes or kilobytes?
I'll definitely check for the false bottlenecks as suggested.
Another question. Is it safe to assume that I could conditionally re-enable the intel ASM? Gary suggested we'd be breaking the PPC builds by doing so. I've never dealt with assembly inclusion before so I'm unsure if it's as simple as using compiler directives, etc.
Thanks,
Ben
#23
01/13/2007 (4:10 pm)
Quote: Another question. Is it safe to assume that I could conditionally re-enable the intel ASM? Gary suggested we'd be breaking the PPC builds by doing so. I've never dealt with assembly inclusion before so I'm unsure if it's as simple as using compiler directives, etc.
Yes. Easily.
See, my way of enabling asm was the lazy way, because I am currently doing all my work on an Intel Mac: just adding the .asm files to the build.
To do this in a PPC-friendly way, use inline asm and gcc's assembler, along with some #ifdef __i386__ blocks, instead of out-of-line .asm files. I think Torque even has appropriate blocks in place that just need some fiddling to enable. [they're called TORQUE_GCC_INLINE_ASM or something... you'd just need to #if defined(TORQUE_OS_MAC) && defined(__i386__) or similar..]
Gary (-;
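For readers who have not mixed assembly into a cross-platform build before, here is a minimal sketch of the pattern Gary describes: an inline-asm path compiled only for 32-bit x86, with a portable fallback for PowerPC and everything else. The function `mulHigh32` is a hypothetical illustration, not a Torque function, and the sketch tests `__i386__` (the macro GCC predefines for 32-bit x86 targets) directly rather than any Torque-specific define, since the exact Torque macro name is not confirmed in this thread.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Torque): returns the high 32 bits of the
// 64-bit product a * b.
static inline std::uint32_t mulHigh32(std::uint32_t a, std::uint32_t b)
{
#if defined(__GNUC__) && defined(__i386__)
    // Compiled only by GCC-compatible compilers targeting 32-bit x86.
    // MUL multiplies EAX by the operand and leaves the product in EDX:EAX.
    std::uint32_t lo, hi;
    __asm__("mull %3"
            : "=a"(lo), "=d"(hi)   // outputs: low half in EAX, high half in EDX
            : "0"(a), "rm"(b)      // inputs: a starts in EAX, b in a register or memory
            : "cc");               // MUL modifies the flags
    (void)lo;                      // low half is not needed here
    return hi;
#else
    // PowerPC and every other target get plain C++, so the build never breaks.
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
#endif
}
```

Because both branches live in the same source file and the preprocessor picks one per architecture, a universal (PPC plus Intel) build compiles the file twice and each slice gets the right code, which separate .asm files added to the project cannot do on their own.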
#24
01/13/2007 (4:28 pm)
#if defined(TORQUE_OS_LINUX) || defined(TORQUE_OS_OPENBSD)
// Texture slop isn't necessary on Linux
U32 TerrainRender::mTextureSlopSize = 512;
#else
U32 TerrainRender::mTextureSlopSize = 220;
#endif
Begs the question... why?
Gary (-;
#25
01/13/2007 (5:21 pm)
@Benjamin: It's in number of allocated textures. So having a very large cache means you're willing to burn potentially quite a lot of GPU memory on the terrain. For a size of 1024, you're looking at around 64 megs of RAM _just_ for terrain textures, although some of that can get swapped to system memory by the driver (which might or might not be fast).
This might be acceptable in your situation, or it might not be.
@Gary: I have no idea why. Probably something obscure and historical. :)
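To put the other cache sizes mentioned in this thread on the same scale, here is a rough back-of-the-envelope sketch. It assumes every cached terrain texture costs the same amount, and it back-derives that cost from the "1024 textures is around 64 megs" figure above (64 MB / 1024 = 64 KB each). That per-texture size and the names `kAssumedBytesPerTexture`, `terrainCacheBytes`, and `toMegabytes` are assumptions for illustration, not values or functions taken from the engine source.

```cpp
#include <cstdint>

// ASSUMPTION: ~64 KB per cached terrain texture, derived from the
// "1024 textures ~= 64 MB" estimate in the post above, not from engine code.
constexpr std::uint64_t kAssumedBytesPerTexture = 64 * 1024;

// Approximate memory consumed by a terrain texture cache holding
// `textureCount` textures (the unit of mTextureSlopSize per the post above).
constexpr std::uint64_t terrainCacheBytes(std::uint32_t textureCount)
{
    return textureCount * kAssumedBytesPerTexture;
}

// Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes).
constexpr double toMegabytes(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Under that assumption, the default of 220 works out to 13.75 MB, the Linux value of 512 to 32 MB, and 1024 to the 64 MB quoted above, so an intermediate value may be worth trying before settling on the largest one.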
#26
01/13/2007 (5:23 pm)
@Ben: Fair 'nuff! Thank you so much for all your help in this thread, it's cast a whole new light on my previously limited terrain-in-Torque knowledge :-)
Gary (-;
#27
01/13/2007 (5:35 pm)
@Ben & @Gary: Thanks so much! One thing I really appreciate about working with Torque is the community: both "citizens" and GG employees are VERY helpful. Thanks again.
Ben
Associate Kyle Carter