Torque3D OpenGL Status
by Luis Anton Rebollo · in Torque 3D Professional · 02/24/2013 (1:33 pm) · 175 replies
I am working with BeamNG to port Torque3D to OpenGL. When finished, it will be merged into the GG repository.
I will be updating this thread at least once a week.
Repository: Github.com/BeamNG/Torque3D/tree/dev_linux_opengl
Status:
- This is a research and learning branch with lots of unfinished and ugly code. All code will be checked and cleaned up before the repo is merged with GG.
- Basic Lighting & Advanced Lighting render all effects correctly (more or less).
- There may be errors or differences with other implementations of OpenGL. They will be corrected later.
- No Oculus Rift port to GLSL.
How to compile:
* Use this manual.
* On Windows you need to check the TORQUE_OPENGL option.
Reporting bugs:
If possible, post issues on Github; it is a great place for them.
Torque 3D Version:
This branch is based on Torque3D 3.5 development.
OpenGL performance:
I checked the code on a new GeForce 750 Ti and T3D runs just as fast in OpenGL as in DirectX9.
The speed problems from a few months ago were due to drivers. It seems good OpenGL performance is linked to new cards/drivers: NVIDIA Kepler architecture (GeForce 600-700); not sure about AMD.
I appreciate any donation to help the project.
About the author
I'm working on a port of Torque3D to OpenGL and Linux/SteamOS
#142
03/30/2014 (3:54 pm)
@Stephan.
Soon I will work on Linux until I achieve a stable release.
I guess once T3D works properly on SteamOS, I will start working on the OSX version.
#143
04/05/2014 (1:56 pm)
Hey Luis,
Have you tried using HLSL2GLSL? You could try directly converting DirectX9 to OpenGL. Or maybe you could use this when you port it to OpenGL ES? Just a thought, it could help a lot.
#144
04/06/2014 (12:24 pm)
@raa
Funny you should mention that, I tried using HLSL2GLSL myself but found it pretty awful TBH as it failed with most of the stock t3d shaders from what I could tell. In the end it was much better to just use GLSL, despite the duplication.
In endzone we ended up merging the shadergen features so the duplication wasn't as much of a problem.
#145
04/06/2014 (1:07 pm)
Well, I was thinking about using it for the iOS and Android port. They use OpenGL ES and it would help a lot to get those shaders to OpenGL ES.
#146
04/06/2014 (4:57 pm)
It probably isn't worth the time anyway. Converting the GLSL shaders to OpenGL ES 2.0 should take less time, considering most of Torque's shaders are created with shadergen. So you have to add an OpenGL ES 2.0 layer to shadergen anyway.
#147
04/15/2014 (10:35 pm)
Interesting read here, Luis, if you haven't already read it: www.geeks3d.com/20140321/opengl-approaching-zero-driver-overhead/
#148
04/16/2014 (12:11 pm)
I had already read it; lots of interesting information.
I intend to review the GFX layer to integrate these ideas.
#149
04/23/2014 (5:30 pm)
Just thought I'd mention a few observations I've noted when working on the OpenGL code for endzone, in case anyone is curious...
1) Vertex Array Objects
Implementing these seems to help framerate SLIGHTLY. In most cases with static geometry you can reduce 2-3 glVertexPointer calls to a single glBindVertexArray call.
NOTE since we don't use instancing (yet) the VAO code doesn't support it. Shouldn't be too hard though.
2) FrameBuffer Objects
When rendering PostFX, the GFX device was retaining textures which were no longer being used. This causes unnecessary reconfiguration of associated FrameBuffer objects. To solve this, unused texture units had to be set to NULL.
3) Texture Units
Since I cannot use sampler objects, I had to add some extra code when setting the sampler in order to eliminate unnecessary sampler map changes on the associated texture object (e.g. if you set the sampler THEN set the texture, it will usually change things twice).
4) Dynamic vertex buffers
I have been experimenting with refactoring the volatile buffer implementation to act more like the Direct3D version (i.e. using a single buffer and mapping parts of it), but I have yet to get it to work properly. Likely I will be revisiting this at some point in the near future. From what I have read though this could give a nice performance boost in some areas (e.g. particle effects, gui rendering).
5) State errors
Some things such as the blend mode are set to invalid values when blending is disabled.
6) Triangle Strips vs Triangle Fans
I noticed a drop in the time spent in the drawPrimitives call when changing the PostFX render to use a triangle fan. This was on an iMac with an NVIDIA chipset, so other drivers may vary.
7) Rendering Order
One big problem in Torque3D currently is the material system sorts materials very poorly. This means that often you can get a rendering order such as:
MESH 1 (Shader 5)
MESH 2 (Shader 3)
MESH 3 (Shader 5)
Which results in OpenGL switching from shader 5 to 3, and back to 5 again. From what I've read and observed, switching shaders when rendering is one of the worst things you can do, at least in your typical OpenGL driver. The current sorting code sorts by a hash derived from the material id and combined states, which is about as good as a random sort most of the time unless all of your meshes use the same material settings.
If you thought that was bad, let's consider what happens when you factor in material passes. Every pass in a material has the potential to use a different shader, and all of these passes are rendered at the same time. So for instance this can happen:
MESH 1 PASS 1 (Shader 5)
MESH 2 PASS 1 (Shader 8)
MESH 2 PASS 2 (Shader 3)
MESH 3 PASS 1 (Shader 5)
MESH 3 PASS 2 (Shader 8)
In essence... even more shader changes! As you can imagine, this is not a good thing for performance. e.g. in endzone we use a second pass for glow extensively - cutting out the second pass practically doubled the frame rate!
I've developed a solution for the first problem which incorporates a shader id into the sort hint which seems to be showing promising results (at least I can't see as many stupid switches anymore).
I'm also currently experimenting with changing the render order so it defers rendering later passes until it has finished rendering everything in the current pass, which seems to be helping in some cases. Unfortunately it's not a universal solution though as sometimes a pass will use the same shader as a previous pass, so deferring will result in more switches. Another possible solution would be to just duplicate the render data for each pass, but that would be a bit more tricky and would require more changes to the sceneobject code.
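To make the sort-hint idea concrete, here is a minimal sketch, assuming a hypothetical `DrawItem` record (the real T3D render instance types are different) and assuming a shader id is available when the key is built. The shader id goes into the high bits of the key so that items sharing a shader end up next to each other, and the existing material hash only breaks ties.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record for illustration; not the engine's actual type.
struct DrawItem {
    int mesh;
    std::uint32_t shaderId;      // assumed to be known when the key is built
    std::uint32_t materialHash;  // stands in for the existing material/state hash
};

// Shader id in the high 32 bits, material hash in the low 32 bits.
inline std::uint64_t makeSortKey(const DrawItem& d) {
    return (static_cast<std::uint64_t>(d.shaderId) << 32) | d.materialHash;
}

// Number of shader changes needed when drawing the items in the given order.
inline int countShaderSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].shaderId != items[i - 1].shaderId) ++switches;
    }
    return switches;
}

// Stable sort so items with identical keys keep their submission order.
inline void sortByShader(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return makeSortKey(a) < makeSortKey(b);
                     });
}
```

With the three-mesh example above (shaders 5, 3, 5), the unsorted order needs two switches and the sorted order needs one. This only applies to draws that are free to be reordered; anything that has to be drawn back to front cannot be sorted this way.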
In any case there are plenty of other problems to solve! I will see if I can contribute some of this stuff when I am finished.
#150
04/28/2014 (11:27 am)
When you get a chance, might want to plug in:
new Path() {
   isLooping = "1";
   canSave = "1";
   canSaveDynamicFields = "1";
   new Marker() {
      seqNum = "0";
      type = "Normal";
      msToNext = "1000";
      smoothingType = "Spline";
      position = "-2.09203 -4.85555 244.933";
      rotation = "1 0 0 0";
      scale = "1 1 1";
      canSave = "1";
      canSaveDynamicFields = "1";
   };
   new Marker() {
      seqNum = "1";
      type = "Normal";
      msToNext = "1000";
      smoothingType = "Spline";
      position = "1.83076 11.7385 246.184";
      rotation = "1 0 0 0";
      scale = "1 1 1";
      canSave = "1";
      canSaveDynamicFields = "1";
   };
   new Marker() {
      seqNum = "2";
      type = "Normal";
      msToNext = "1000";
      smoothingType = "Spline";
      position = "-13.0555 13.4994 247.052";
      rotation = "1 0 0 0";
      scale = "1 1 1";
      canSave = "1";
      canSaveDynamicFields = "1";
   };
   new Marker() {
      seqNum = "3";
      type = "Normal";
      msToNext = "1000";
      smoothingType = "Spline";
      position = "-2.98541 -5.11887 260.504";
      rotation = "1 0 0 0";
      scale = "1 1 1";
      canSave = "1";
      canSaveDynamicFields = "1";
   };
};
To Empty Terrain.mis. (Looks like more of the same matrix malformations at a glance at least, given that the 'ghost nodes' show up fine directx side.)
At present, also seem to have lost ctrl+C/V and delete functionality, as well as an app crash in directx, and hang in opengl. Presumably from the sdl2 integration, but I have fallen behind on my reviews, so can't be 100% sure when that crept in.
Clean Build of: https://github.com/BeamNG/Torque3D/commits/dev_linux_opengl Apr 21, 2014, win8.1, vs2008.
For crosschecking purposes, what point do you consider stable prior to the sdl step?
#151
04/30/2014 (9:41 am)
I'll be a few days without internet connection.
@James
Great post; I will respond when I return in a few days. I've been a little off these days, sorry.
@Azaezel
I will check the problems you mention.
#152
05/04/2014 (12:34 pm)
@James
3) Texture Units
I am in the process of adding support for texture arrays and bindless textures to reduce texture changes and merge draw calls. Multi-draw indirect will come after that.
4) Dynamic vertex buffers
I have to finish the volatile ring/circular buffer implementation that I have started. I expect to upload it soon.
7) Rendering Order
Yes, it is giving me a lot of trouble too. We need to sort materials more intelligently. This week we'll look into this topic a little. I'll tell you my findings later.
@Azaezel
I have fixed the missing path nodes (on Github).
I was unable to reproduce your problem with ctrl+v, ctrl+c or delete. I have tested on Windows, Windows + SDL2, and Linux.
Can you give me some more information about the problem and the crash/hang?
Sorry, there is nothing that can be considered stable. But if you report problems to me, I will try to solve them without delay. Thank you very much for your reports.
#153
05/04/2014 (12:45 pm)
@luis: I'll have a followup on that within the week. The only additional data I can provide at present is that after regenerating the project with SDL flipped off in CMake, it resolved itself. Quite likely I flipped an additional SDL2-specific option on as well (which I do realize is not terribly helpful).
#154
05/06/2014 (4:31 pm)
@Anton
I've made significant gains (i.e. doubled the framerate) by batching together by shader instances and deferring passes where shader switching isn't optimal. Though TBH I think the multipass layer approach the material system uses is a bit of a train wreck when it comes to performance. Would be better if the system was redesigned so layers could either be rendered in a single sweep or through multiple calls using the same shader, which would also help with instancing.
Of course it still doesn't help when transparent materials need to be sorted. Very annoying.
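As a rough illustration of the trade-off between deferring passes and batching by shader, here is a small sketch. `PassDraw` and both ordering functions are hypothetical names made up for this example, not engine code, and the sketch assumes opaque draws whose order can be changed freely.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical record: one entry per (mesh, material pass) with its shader.
struct PassDraw {
    int mesh;
    int pass;
    int shader;
};

// Number of shader changes when drawing in the given order.
inline int countSwitches(const std::vector<PassDraw>& draws) {
    int switches = 0;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].shader != draws[i - 1].shader) ++switches;
    }
    return switches;
}

// Deferred passes: finish every pass 1 before starting pass 2,
// grouping by shader inside each pass.
inline std::vector<PassDraw> deferPasses(std::vector<PassDraw> draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const PassDraw& a, const PassDraw& b) {
                         return std::tie(a.pass, a.shader) <
                                std::tie(b.pass, b.shader);
                     });
    return draws;
}

// Batching by shader: ignore pass boundaries and group purely by shader.
inline std::vector<PassDraw> batchByShader(std::vector<PassDraw> draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const PassDraw& a, const PassDraw& b) {
                         return a.shader < b.shader;
                     });
    return draws;
}
```

Fed the five-entry mesh/pass example from the earlier post (shaders 5, 8, 3, 5, 8), the submission order needs four switches, deferring passes needs three, and batching purely by shader needs two. That matches the observation that deferral alone is not a universal fix when a later pass reuses a shader from an earlier one. Batching across passes is only valid when a later pass does not depend on the earlier pass already being drawn.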
#155
05/09/2014 (7:01 pm)
followup, since I said I'd do one this week (though not as thorough as I'd like still in terms of a proper report)
crash on exit application:
stack-trace:
> Test.exe!SignalBaseT<void __cdecl(void)>::remove(fastdelegate::FastDelegate<void __cdecl(void)> dlg={...}) Line 136 C++
Test.exe!PlatformWindowManagerSDL::~PlatformWindowManagerSDL() Line 69 C++
Test.exe!PlatformWindowManagerSDL::`scalar deleting destructor'() + 0x8 bytes C++
Test.exe!`dynamic atexit destructor for 'smWindowManager''() + 0x12 bytes C++
Test.exe!doexit(int code=0, int quick=0, int retcaller=0) Line 591 C
Test.exe!exit(int code=0) Line 412 + 0xc bytes C
Test.exe!__tmainCRTStartup() Line 272 C
crash on exit mission via editor:
stack-trace:
> Test.exe!GuiMenuBar::findMenu(const char * menu=0x0cdc4c58) Line 828 C++
Test.exe!MenuBar::updateMenuBar(PopupMenu * popupMenu=0x00000000) Line 117 + 0xb bytes C++
Test.exe!MenuBar::removeObject(SimObject * obj=0x0d7d3ad0) Line 67 + 0x1f bytes C++
Test.exe!cm_SimSet_remove(SimSet * object=0x12e84c08, int argc=3, const char * * argv=0x01f6f5c0) Line 936 C++
alterations this end:
SDL_windows_main.c:
temporarily altered
#if defined(_MSC_VER) /* The VC++ compiler needs main defined */
#define console_main main
#endif
to
#if defined(_MSC_VER) /* The VC++ compiler needs main defined */
#define console_main _main
#endif
till I can dig out a proper fix.
gfxD3D9PCDeviceProfiler.cpp:
#include <d3d.h> to #include <d3d9.h> for compilation purposes with the older compiler and directx combo.
debug build, 05May2014, vs2008, win8.1
Want the branch-specific bug in the branch, and the directx reference one tagged on the old repo report-wise, to keep which bugs are what type straight?
#156
05/14/2014 (11:16 am)
@James
circular volatile buffer:
I have completed the circular volatile buffer. It works on vertex and index buffers. It uses GL_ARB_buffer_storage when possible, or glBufferSubData otherwise. I want to add the option of using glMapBufferRange on Intel hardware. I hope to upload the code this weekend.
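For readers following along, the bookkeeping behind a circular volatile buffer can be sketched on the CPU side as below. This is not the code from the branch; `VolatileRing` and `RingAlloc` are hypothetical names, and no GL calls are made. The sketch only shows how offsets into one large buffer are handed out and when the allocator wraps back to the start.

```cpp
#include <cassert>
#include <cstddef>

// Result of one allocation: where to write, and whether the ring wrapped.
struct RingAlloc {
    std::size_t offset;
    bool wrapped;
};

// CPU-side offset bookkeeping only (hypothetical class, no GL calls).
class VolatileRing {
public:
    explicit VolatileRing(std::size_t capacity)
        : mCapacity(capacity), mHead(0) {}

    // Reserve `size` bytes starting at a multiple of `align`.
    RingAlloc allocate(std::size_t size, std::size_t align = 16) {
        assert(align > 0 && size <= mCapacity);
        // Round the current head up to the requested alignment.
        std::size_t start = (mHead + align - 1) / align * align;
        bool wrapped = false;
        if (start + size > mCapacity) {
            // Not enough room at the end: restart from the beginning.
            start = 0;
            wrapped = true;
        }
        mHead = start + size;
        return {start, wrapped};
    }

private:
    std::size_t mCapacity;
    std::size_t mHead;
};
```

When `wrapped` comes back true, the caller has to make sure the GPU is no longer reading the old contents before overwriting them, for example by orphaning the buffer or waiting on a fence. With glMapBufferRange the usual pattern, as far as I know, is GL_MAP_UNSYNCHRONIZED_BIT for appended ranges and GL_MAP_INVALIDATE_BUFFER_BIT on wrap; treat that as an assumption to check against your driver rather than a description of what the branch does.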
Sort Material:
I have also started working on sorting materials for rendering; I will try to upload it soon so we can compare.
Quote:
Would be better if the system was redesigned so layers could either be rendered in a single sweep or through multiple calls using the same shader, which would also help with instancing.
I think the same.
@Azaezel
Thanks for the reports, you can find the fixes in Github: github.com/BeamNG/Torque3D/tree/dev_linux_opengl
About SDL_windows_main.c: This problem is more complicated, I do not want to modify external libraries. I'll try to find some way to fix it.
#157
05/14/2014 (12:19 pm)
@Anton sounds good! I was working on a volatile buffer myself but I'm currently caught up in some other areas, so it will be interesting to take a peek at what you cooked up. I might need to use a slightly more expansive solution though, as I am currently stuck targeting OpenGL 2.x on Apple platforms (uugh).
I'll see if I can get any of my useful OpenGL changes out, as they are currently mostly written for endzone.
#158
05/14/2014 (1:29 pm)
@Luis: yeah, that's why I explicitly mentioned that one as a temp.
As far as it goes, I *think* at this point the only thing left from the baseline list that doesn't at least have a commit showing the steps for folks to convert would be normalmaps. Still not quite sure whether that's lurking in the advancedLighting, or materialfeatures code, though quick insertions of "error" in the processpix shader command insertion entries did show corruption at say, after detecting SM3 and the like, so they are getting to that point.
#159
05/22/2014 (11:33 am)
Just an update on my observations.
Not sure where @Luis's circular buffer implementation went to, but I've been writing my own for now. It seems at least on ATI cards using glMapBufferRange can really boost frame rate - for instance I was getting ~30 fps in our main menu before, while now we're getting more like 143. This is similar to what I was getting on an Nvidia card on OSX just using glMapBuffer & discarding the buffer.
Next step for me at least is to see how GL_ARB_buffer_storage & pinned memory compare. Of course all this still largely depends on what driver you are using, which is frustrating to say the least!
#160
05/23/2014 (1:10 am)
@James, I'm sorry, I've been so busy these days that I could not finish/upload it before :(
I have read that Intel does not like glBufferSubData; I use it only with Nvidia.
New update on Github: github.com/BeamNG/Torque3D/tree/dev_linux_opengl
Next in roadmap:
- Merge some Anis changes.
- Fix render errors.
- Fix GUI errors.
Torque Owner Stephen
GearedMind Studio
Not sure where to ask so I figured I would ask here since this is the most active. How's the cross platform of Torque 3D going?