Torque3D OpenGL Status
by Luis Anton Rebollo · in Torque 3D Professional · 02/24/2013 (1:33 pm) · 175 replies
I am working with BeamNG on porting Torque3D to OpenGL. When the port is finished, it will be merged into the GG repository.
I will be updating this thread at least once a week.
Repository: Github.com/BeamNG/Torque3D/tree/dev_linux_opengl
Status:
- This is a research and learning branch, with lots of unfinished and ugly code. All code will be checked and cleaned before the repo is merged with GG.
- Basic Lighting and Advanced Lighting render all effects correctly (more or less).
- There may be errors or differences with other implementations of OpenGL. They will be corrected later.
- No Oculus Rift port to GLSL.
How to compile:
* Use this manual.
* On Windows you need to check the TORQUE_OPENGL option.
Reporting bugs:
If possible, post issues on Github; it is a better place for them than a forum.
Torque 3D Version:
This branch is based on Torque3D 3.5 development.
OpenGL performance:
I checked the code on a new GeForce 750 Ti and T3D runs just as fast in OpenGL as in DirectX 9.
The speed problems of a few months ago were due to drivers. It seems that good OpenGL performance is linked to new cards/drivers: NVidia cards of the Kepler architecture (GeForce 600-700); not sure about AMD.
I appreciate any donation to help the project.



About the author
I'm working on a port of Torque3D to OpenGL and Linux/SteamOS
#82
02/10/2014 (4:12 pm)
Let me know if you need any assistance or additional testing. This is the same issue I'm plagued by and I'm also running an AMD card.
#83
02/10/2014 (4:24 pm)
@Andrew: What AMD card have you got there mate?
@Luis:
Email sent. About this AMD driver thing: I honestly have no proof, and I have never actually spoken to an AMD driver engineer about it, but I suspect they changed something with the 7900 series. It makes no sense that the exact same Catalyst version works perfectly on one family model and not on another.
#84
02/10/2014 (4:32 pm)
Oh yeah Andrew, can you see if setting the texture filtering to lowest also solves the huge loading time for you? I just want to make sure what I'm seeing is not an anomaly.
#85
02/10/2014 (5:05 pm)
AMD Radeon HD 5700. I actually just did a reformat yesterday (tip: don't install AMD's profiling tools. They ruined all my Visual Studio installations after I uninstalled them.) and I haven't reinstalled Linux yet. I should be able to test that later tonight or tomorrow.
Hmm... are there any popular distros that haven't been tested? I'm always up for a challenge.
#86
02/10/2014 (5:40 pm)
Has Slackware been tested before?
#87
02/10/2014 (6:17 pm)
I just scanned through the Linux page and I don't see any Slackware builds. I assume SteamOS is Debian based, so it doesn't look like anyone has strayed too far from that family. I've got Slackware 14.1 downloading now, will report back.
#88
02/10/2014 (7:40 pm)
OK, with this AMD problem here is what is happening:
Uploading compressed texture data to the GPU with glCompressedTexSubImage2D is abysmally slow, it's just shocking. It's actually faster to decompress with squish and then upload via glTexSubImage2D; not much faster, but faster nonetheless. The reason changing the texture quality speeds things up is that it causes T3D to reduce the texture size and also the number of mip map levels, so the amount of data uploaded is dramatically reduced.
My laptop actually has one of those switchable graphics setups, so when I switch to the Intel card, glCompressedTexSubImage2D uploads like lightning, it's super fast. The AMD D3D driver doesn't suffer from this problem, only its dodgy OpenGL driver. I'll keep playing around and see if I can find a workaround for this; maybe it's really picky and wants a specific texture parameter set or something along those lines.
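To put rough numbers on why a lower texture quality means much less data to upload, here is a minimal sketch that adds up the bytes in a DXT mip chain. It assumes the standard S3TC block layout (4x4 texel blocks, 8 bytes per block for DXT1, 16 for DXT3/DXT5). The function names are made up for illustration and are not part of Torque3D.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes per 4x4 block in the standard S3TC layout.
constexpr std::size_t kDxt1BlockBytes = 8;   // DXT1
constexpr std::size_t kDxt5BlockBytes = 16;  // DXT3 / DXT5

// Size in bytes of a single compressed mip level (hypothetical helper).
std::size_t dxtLevelBytes(std::uint32_t width, std::uint32_t height,
                          std::size_t blockBytes)
{
    assert(width > 0 && height > 0);
    const std::size_t blocksWide = (width + 3) / 4;   // round up to whole blocks
    const std::size_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * blockBytes;
}

// Total bytes of a full mip chain, from (width, height) down to 1x1.
std::size_t dxtChainBytes(std::uint32_t width, std::uint32_t height,
                          std::size_t blockBytes)
{
    std::size_t total = 0;
    for (;;)
    {
        total += dxtLevelBytes(width, height, blockBytes);
        if (width == 1 && height == 1)
            break;
        width = std::max<std::uint32_t>(1u, width / 2);
        height = std::max<std::uint32_t>(1u, height / 2);
    }
    return total;
}
```

For a 1024x1024 DXT1 texture the top level alone is 524288 bytes, so dropping just that one level removes about three quarters of the data that has to be uploaded.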
#89
02/11/2014 (2:57 am)
I confirm the problem of loading times on ATI HD5450 with the latest drivers. No errors on log, no render errors.
#90
02/11/2014 (4:33 am)
I have tried with Catalyst 13.1, 13.4 and 14.1 (beta) and all suffer the same problem with dreadfully slow glCompressedTexSubImage2D uploads. There is no way this problem has been present in AMD drivers this long without someone else mentioning it; it's got to be some state (or some previous GL call) that is causing this. Just for testing I tried the samples that ship with Ogre3D 1.9: I enabled the OpenGL plugin and loaded 30 large DDS textures, with no sign of this problem. I'll have to keep searching to get to the bottom of this.
I have enabled AMD_debug_output too, as KHR_debug is not supported on mine. This also reports no errors or anything else.
#91
02/11/2014 (5:06 am)
OK, good news, I managed to find the problem :-) :-)
What is happening is that the AMD drivers show a massive slowdown when you pre-allocate with glTexImage2D(blah,blah,NULL) using any DXTx format and then later use glCompressedTexSubImage2D to fill in the pixel data (NVidia and Intel have no problems with this). If you skip that initial NULL glTexImage2D when using a DXTx format and send the pixel data using glCompressedTexImage2D instead of glCompressedTexSubImage2D, hey presto, problem gone.
I'll play around more tomorrow and send a pull request to Luis, I need sleep ;-)
*Edit:
I think the main point is that it appears you can't mix glTexImage* with glCompressedTex* on AMD hardware, even though this works perfectly on NVidia and Intel hardware. Will confirm this for sure tomorrow.
#92
02/12/2014 (2:50 am)
@Luis: I've noticed that the GFXGLVertexBuffer vertex buffer code makes use of glNamedBufferDataEXT, but despite this being mentioned everywhere I can't seem to find any documentation on it. Do you happen to know of any source, or is it just one of those cryptic functions which everyone uses? =/
Also, I'm curious as to why you didn't make use of the Frame Allocator when allocating the vertex buffer data, since with your current solution there is always going to be a bunch of memory allocated in system memory, which can be a problem with high poly static assets... though I guess a FrameAllocator solution might not be the best idea in this case either, as depending on the size of the buffer it might exceed the allocator size.
#93
02/12/2014 (4:08 am)
@James: glNamedBuffer* belongs to the Direct State Access extension: www.opengl.org/registry/specs/EXT/direct_state_access.txt
@Luis:
I have found the problem once and for all (the AMD slow load), and it's so bloody simple it's annoying that I went to the effort of creating a workaround and overlooked this :-( .... The TextureManager's innerCreateTexture function enables GL_GENERATE_MIPMAP_SGIS without first checking if the texture is compressed and has its own mip maps. Obviously this is a huge no-no; make sure GL_GENERATE_MIPMAP_SGIS is disabled for this particular case and loading is fine for AMD users. Most likely this will speed up NVidia loading too (unless they cheat and detect that this state is set and ignore it???). So all that crap I mentioned above about glTexImage* and glCompressed* is false.
#94
02/12/2014 (5:00 am)
@Andrew: If I send you the fix, can you test it on your machine just to make doubly sure the problem is solved?
#95
02/12/2014 (5:28 am)
@Timmy: By all means, you have my email. I was distracted yesterday by Occlusion Culling, but I'm going to put the Slackware installer on while I'm in class so when I get back I can jump right in.
#96
02/12/2014 (5:57 am)
@Timmy: Ah thanks, I forgot to look for the name minus the prefix. So it's the direct equivalent of glBufferData/glBufferSubData. Was just wondering if it did anything else mysterious... I know some engines (e.g. Ogre) take a different approach and use a combination of glBufferData and glMapBuffer once a threshold has been passed. Was just trying to understand the implications behind the current implementation ;)
#97
02/12/2014 (6:58 am)
@James: In theory it should be slightly faster using the DSA stuff, because it's designed to bypass the state selector and you are also combining multiple function calls into one. Obviously the GL device is not yet finished, so it's missing a lot of DSA functions in other parts of the app, like texture loading.
I actually did some testing the other day on an NVidia card. I noticed glMapBuffer was far slower than the glBufferSubData method, but glMapBufferRange together with the invalidate flag was the fastest method overall. This could well be different on other GPUs. What you mention about Ogre does make sense; I guess the tricky part is finding that nice threshold balance from card to card. Then again, I guess that's what the card profile stuff is for.
The vertex buffer class is heavily used and any improvements there would certainly be a good addition. I was going to implement GL_NV_vertex_buffer_unified_memory (part of the NVidia bindless stuff) and see if any improvements come from that (obviously NVidia only though).
#98
02/12/2014 (7:02 am)
@Andrew: Find the function innerCreateTexture in gfxGLTextureManager and replace the appropriate code with this:

    if(forceMips && !retTex->mIsNPoT2)
    {
        // Mips are forced and the texture is power-of-two: let GL generate the chain.
        glTexParameteri(binding, GL_GENERATE_MIPMAP_SGIS, GL_TRUE);
        retTex->mMipLevels = 0;
    }
    else if(profile->testFlag(GFXTextureProfile::NoMipmap) || profile->testFlag(GFXTextureProfile::RenderTarget) || numMipLevels == 1 || retTex->mIsNPoT2)
    {
        // No mips wanted or available: use a single level.
        retTex->mMipLevels = 1;
    }
    else if(format != GFXFormatDXT1)
    {
        // Not DXT1: let GL generate the mip chain.
        glTexParameteri(binding, GL_GENERATE_MIPMAP_SGIS, GL_TRUE);
        retTex->mMipLevels = 0;
    }
    else
    {
        // DXT1 with its own mip maps: disable generation and use the supplied levels.
        glTexParameteri(binding, GL_GENERATE_MIPMAP_SGIS, GL_FALSE);
        retTex->mMipLevels = numMipLevels;
    }

Note this needs further modifying because it only checks the DXT1 format, but it will suffice for testing because I'm pretty sure only DXT1 compression is used in the Full/Empty templates' DDS files ;-)
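As a rough sketch of the "further modifying" mentioned above, the DXT1-only test could be replaced by a check that covers every DXT variant. The enum, struct and function names below are hypothetical stand-ins for illustration and are not Torque3D's real GFXFormat or texture manager API. The branch order simply mirrors the snippet in the post, with the decision returned as data so it can be checked without a GL context.

```cpp
#include <cassert>

// Hypothetical stand-in for the engine's format enum (NOT Torque3D's GFXFormat).
enum class TexFormat { RGBA8, DXT1, DXT2, DXT3, DXT4, DXT5 };

// True for any block-compressed DXT format.
bool isCompressed(TexFormat format)
{
    switch (format)
    {
    case TexFormat::DXT1:
    case TexFormat::DXT2:
    case TexFormat::DXT3:
    case TexFormat::DXT4:
    case TexFormat::DXT5:
        return true;
    default:
        return false;
    }
}

struct MipDecision
{
    bool generate;    // would become GL_GENERATE_MIPMAP_SGIS = GL_TRUE / GL_FALSE
    unsigned levels;  // 0 means "let GL build the chain"
};

// Mirrors the branch order of the posted snippet, but tests all DXT formats.
MipDecision decideMips(TexFormat format, bool forceMips, bool isNPoT,
                       bool noMipProfile, unsigned numMipLevels)
{
    if (forceMips && !isNPoT)
        return {true, 0};
    if (noMipProfile || numMipLevels == 1 || isNPoT)
        return {false, 1};
    if (!isCompressed(format))
        return {true, 0};
    // Compressed texture that ships its own mip chain: never auto-generate.
    return {false, numMipLevels};
}
```

With this, a DXT5 texture carrying ten mip levels keeps its own chain and generation stays off, while an uncompressed texture in the same situation still asks GL to build the mips.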
#99
02/12/2014 (5:30 pm)
New update on Github: github.com/LuisAntonRebollo/Torque3D/tree/dev_linux_opengl
- OpenGL Fix: Incomplete and wrong texture enums T3D->GL in gfxGLEnumTranslate.cpp.
Thanks to all for reviewing and commenting on the code; your reviews will be very useful.
@Timmy, good work with the AMD problem :D The NVidia bindless extensions are very promising, you will have fun.
@James, you are right. Currently the vertex buffer consumes memory unnecessarily. Maybe it would be a good idea to use DataChunker or something. Added to my ToDo.
My plan for OpenGL:
- Make it work.
- Make it clean.
- Make it fast.
I would like people to check different ways of managing the FBOs for RTs. Currently one FBO is used for each RT (a recommendation from Valve), but on my card it is faster to change the texture attachments.
Ideas for the test/benchmark?
#100
02/12/2014 (5:35 pm)
@Luis: Just finishing up a few small changes and I'll send a pull request your way for this AMD problem.
Torque 3D Owner Luis Anton Rebollo
Send me an email and we will talk about how to coordinate the work. Thank you very much.