Threads and atomic reading of data
by Demolishun · in Torque 3D Professional · 12/13/2012 (7:46 pm) · 11 replies
Okay, so I am working on adding a threaded feature to T3D. It will allow the engine to grab audio data from the loopback feature enabled by Vista and Windows 7. Earlier versions might support this through hardware drivers.
Anyway, I want the actual acquisition of the audio data to happen in a thread. I have figured out how to create a Thread class, control it, etc. What I want to do is set variables that are accessible to the main thread through a function call. However, if possible I want to keep the main thread from having to wait to acquire a semaphore. I would like to be able to read the values (analog values as F32) of the filtered data. I know that if each read were atomic, it would not matter if the other thread updated the data while the main thread was reading each value in sequence: every read would be atomic, and there would never be a partial update of any specific variable. Would this involve special compiler flags, or the use of assembler, to guarantee this?
The reason I want to not have the main thread wait for a semaphore? This should eliminate any possibility of a deadlock condition.
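For what it's worth, the non-blocking, tear-free read described above is exactly what C++11's std::atomic provides (VS2012 ships &lt;atomic&gt;). A minimal sketch; the variable and function names are illustrative, not from T3D:

```cpp
#include <atomic>

// Hypothetical shared filtered-audio value (illustrative name, not T3D's).
static std::atomic<float> g_filteredLevel{0.0f};

// Audio thread publishes a new value; this is a single atomic store.
void publishLevel(float v) {
    g_filteredLevel.store(v, std::memory_order_release);
}

// Main thread reads without ever blocking; it may see the old or the new
// value, but never a torn (partially written) one.
float readLevel() {
    return g_filteredLevel.load(std::memory_order_acquire);
}
```

The release/acquire pairing also handles the cache-visibility concern raised later in this thread, without any semaphore on the read path.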
About the author
I love programming, I love programming things that go click, whirr, boom. For organized T3D Links visit: http://demolishun.com/?page_id=67
#2
12/14/2012 (10:45 am)
@Scott, I am using the Thread class from T3D. It is a multipurpose base class for threading, and a bunch of other thread types inherit from it. The code is written to be platform independent, so thankfully I don't have to implement that. I also saw it mentioned that there would need to be "significant" changes to T3D for it to be a 64-bit program.
I am only asking about assembler to keep the read operation on a value atomic. However, I think I found some literature on MSDN that basically says any volatile read/write of a 32-bit value is atomic on both 32-bit and 64-bit systems, while a volatile read/write of a 64-bit value is atomic only on 64-bit systems. Both guarantees hold only if the memory is aligned on the proper byte boundaries. As far as I know, you have to do something special for data to not be aligned properly; if I remember right there is a compiler option to change the default packing, but it is typically not used.
Yeah, I will make it pretty solid before I release any code for this. I will make some good comments as it could serve as a good/bad example of threading in T3D. It could be a good reference. Thanks :)
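The alignment caveat above can be made concrete: the MSDN guarantee relies on natural alignment, and packing pragmas are the usual way to lose it. A sketch, assuming F32 stands in for Torque's 32-bit float typedef:

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

typedef float F32;  // stand-in for Torque's F32 typedef (assumption)

// A shared sample struct whose fields are naturally aligned. On x86/x64 a
// plain load or store of a 4-byte-aligned 32-bit value is one instruction
// and cannot tear.
struct SharedSamples {
    alignas(4) F32 level;
    alignas(4) F32 peak;
};

// If the compiler packed these (e.g. #pragma pack(1) or /Zp1), a field could
// straddle a cache line and the hardware guarantee would no longer apply.
static_assert(alignof(F32) == 4, "F32 should be naturally aligned");
static_assert(offsetof(SharedSamples, peak) % 4 == 0, "fields stay aligned");
```

The static_asserts cost nothing at runtime; they just document (and enforce) the assumption the atomicity argument depends on.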
#3
12/14/2012 (11:01 am)
I like how you think, Frank.

A suggestion is to make a DLL, since it's in C++ and that may be easier for you to demonstrate, unless you're just as literate in assembler, but most people are not.

A DLL is pretty fast I think, but if that's not enough then I'd go the extra mile and do assembler... heavily commented, of course.

I don't think your concept, or the way you plan to implement it, is bad at all.

I can see many uses for it, such as triggers for player interaction, animation choices based on the music, or schedules that change depending on the music's length or pitch. I could even imagine your concept being used for cut scenes... the list goes on for just about anything I can imagine in T3D.

The whole concept is exciting; the possibilities are fun to think about.
#4
12/14/2012 (6:28 pm)
Yes, it sounds like a lot of fun. I am thinking it will be mainly client side. If this is for a multiplayer game then each player would be listening to different music. So in multiplayer it would be for decoration. However, for single player it could reasonably affect game play. Lots of possibilities here!
#5
12/15/2012 (12:31 pm)
I think that in theory, reading shared data should be atomic. Though from what I've read, CPU caching and stuff makes that not necessarily the case.

Quote:
The reason I want to not have the main thread wait for a semaphore? This should eliminate any possibility for a deadlock condition.

Good code design eliminates deadlock ;).
#6
12/15/2012 (4:52 pm)
Can you explain how CPU caching could cause a problem? I have never heard of that in regard to threads. If it were true, then semaphores would have the potential to fail, since reading/setting a semaphore must itself be an atomic operation for it to work.

I am not sure about that last bit on good code design. I read some articles about experts whose apps run for long periods of time and supposedly have good design behind them; they are still seeing deadlocks crop up. I don't know the context behind their work, though.

I agree that good design can help eliminate them. Thus the question about atomic operations. :)
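The point that a lock primitive's own read/set must be atomic can be illustrated with a minimal test-and-set spinlock built on std::atomic_flag. This is a sketch only, not how T3D's Semaphore works; real code should prefer the engine's primitives or std::mutex:

```cpp
#include <atomic>

// A minimal spinlock: test_and_set is a single atomic read-modify-write.
// If the "read flag, then set flag" steps were two separate non-atomic
// operations, two threads could both observe the lock as free and both
// enter the critical section.
class SpinLock {
public:
    void lock() {
        // Spin until we are the thread that flips the flag from clear to set.
        while (mFlag.test_and_set(std::memory_order_acquire)) { /* busy-wait */ }
    }
    void unlock() {
        mFlag.clear(std::memory_order_release);
    }
private:
    std::atomic_flag mFlag = ATOMIC_FLAG_INIT;
};
```

The acquire/release orderings also make writes done inside the lock visible to the next thread that takes it, which is the caching question in a nutshell.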
#7
12/16/2012 (3:01 am)
I don't understand CPU caches, but this question and this one indicated they may be the cause of problems. To hazard a guess, I'd say that you can't verify that a cache is unique to each thread, so threads may be sharing and polluting each other's caches.

I could hardly find any info specifically on atomic reads... apparently that's not something people usually need to do :P. For what it's worth, the worst that could happen is that you'd read a funny value; you won't see deadlocks or crashes from a non-atomic read.
#8
12/16/2012 (12:18 pm)
I looked through the Stack Overflow link and found a link to this: msdn.microsoft.com/en-us/library/12a04hfd%28v=vs.80%29.aspx
It turns out that Microsoft's "volatile" keyword acts as a full-fledged memory barrier, and that is why they claim reads/writes of values declared volatile are atomic. However, the C++ standard does NOT treat volatile this way, so it is not cross-platform to declare a variable volatile and expect its reads/writes to be atomic. In this case I am writing a Windows-specific feature, so I will mark this as Windows-only for now. It would have to be rewritten to get the same feature on a *nix system anyway.
You are absolutely right about the consequences, too. It really depends on what I use the values for. I could put in a semaphore with a short timeout, or have it skip the read until it can grab the semaphore. The code that updates the values is IO-bound and will be waiting most of the time, so it will not be an issue to miss an update and just grab the old values.
I will try a few variations of the above and see what happens. The semaphores have the "no wait" option so maybe that is better.
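The "skip the read until it can grab the semaphore" idea can be sketched with std::mutex::try_lock standing in for Torque's Semaphore's no-wait acquire; the class and member names here are illustrative:

```cpp
#include <mutex>
#include <vector>

// Sketch: the reader never blocks. On contention it simply returns the last
// snapshot it managed to copy, which is fine when stale data is acceptable.
class FilteredData {
public:
    // Audio thread: blocks briefly to publish a fresh set of values.
    void update(const std::vector<float>& fresh) {
        std::lock_guard<std::mutex> lock(mMutex);
        mValues = fresh;
    }

    // Main thread: try_lock returns immediately instead of waiting. If the
    // lock is busy we keep the previous snapshot rather than stall the frame.
    std::vector<float> readOrLast() {
        if (mMutex.try_lock()) {
            mLastSnapshot = mValues;
            mMutex.unlock();
        }
        return mLastSnapshot;
    }

private:
    std::mutex mMutex;
    std::vector<float> mValues;
    std::vector<float> mLastSnapshot;
};
```

Because the writer is IO-bound and holds the lock only for a copy, the reader will almost always succeed on the first try.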
#9
12/16/2012 (10:22 pm)
Wow, I found this in T3D:

/// Performs an atomic read operation.
inline U32 dAtomicRead( volatile U32 &ref )
{
    return _InterlockedExchangeAdd( ( volatile long* )&ref, 0 );
}

So T3D is doing an atomic add of zero to perform an atomic read. Sounds like a bit of overkill. Hmmm, there is a 64-bit version of the add too, called _InterlockedExchangeAdd64. So apparently this would be another way to do it.
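The same trick can be written portably with C++11 atomics: fetch_add(0) is an atomic read-modify-write that returns the current value, just like the _InterlockedExchangeAdd call above, while a plain atomic load is the cheaper modern equivalent (no exclusive ownership of the cache line is needed):

```cpp
#include <atomic>
#include <cstdint>

typedef std::uint32_t U32;  // stand-in for Torque's U32 typedef (assumption)

// T3D's approach, rewritten on std::atomic: add zero, get the value back.
inline U32 atomicReadViaAdd(std::atomic<U32>& ref) {
    return ref.fetch_add(0);  // atomic read-modify-write, returns old value
}

// The straightforward equivalent: an atomic load.
inline U32 atomicReadViaLoad(std::atomic<U32>& ref) {
    return ref.load(std::memory_order_seq_cst);
}
```

Both return the same value; the load version is what you would write today.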
#10
12/16/2012 (10:47 pm)
Yeah, I did see that as a suggested solution in one of the SO threads actually. I doubt it would hurt your performance that much... might as well go for it!
#11
12/27/2012 (2:47 am)
My rewrite has gotten more complex, and I was unsure how to use volatile. So I searched some more and found that mutexes and semaphores should already take care of any issues with caching, and that volatile may not even prevent the right optimizations to be useful for data access. So I am going to try removing the volatile modifier and rely only on thread mutexes and semaphores.
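That final approach can be sketched as follows: the mutex supplies both mutual exclusion and memory visibility (lock and unlock act as acquire/release barriers), so no volatile is needed. Names here are illustrative, not from the actual rewrite:

```cpp
#include <mutex>

// Sketch: shared loopback levels guarded by a mutex. The fields are plain
// floats; the mutex, not volatile, is what makes writes from the audio
// thread visible to the main thread.
class LoopbackLevels {
public:
    void set(float left, float right) {
        std::lock_guard<std::mutex> lock(mMutex);
        mLeft = left;    // ordered by the mutex for any reader that
        mRight = right;  // later takes the same lock
    }
    void get(float& left, float& right) {
        std::lock_guard<std::mutex> lock(mMutex);
        left = mLeft;
        right = mRight;
    }
private:
    std::mutex mMutex;
    float mLeft = 0.0f;
    float mRight = 0.0f;
};
```

A bonus of the mutex over per-field atomics: the two values are always read as a consistent pair, never one old and one new.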
Torque 3D Owner Scott Warren
Are you considering a separate thread on a single core, time-sliced the way Windows does it, or are you implementing the use of multi-core CPUs?

The difference will mean the use of assembler for the former or a DLL for the latter.

Actually, the use of assembler would be the ultimate way to reduce program bloat and increase speed, but the implementation of the assembler differs depending on which VC++ compiler you're using.

For VC++ 2010 Express the assembler files are linked as separate files, while VC++ 2008 can use inline assembler code.

There are minor changes in the compiler depending on whether you're making the thread 32-bit or 64-bit. By this I mean "name mangling".

Just something to consider if you're releasing the code for public consumption. Or disregard what I said, since I haven't programmed anything in quite a long time.