Game Development Community

Integrated Voice Recording / Communication

by James Couzens · in Technical Issues · 02/15/2003 (8:03 pm) · 15 replies

I recall that Tribes 2 had in-game voice comm, and yet, looking through the code, I can't for the life of me find it, so I'm under the assumption that for whatever reason it has been pulled. Probably the entire audio layer was pulled and replaced with OpenAL.

I would like to integrate in-game voice recording with a couple of options:

1) Ability to speak to Entire Team
2) Ability to speak to a group, or individual members of your Team
3) Ability to speak to entire server (admin)
4) Ability to speak privately to a group or individual members (admin)
5) Ability to have a vocal radius where you can speak and individuals within a radius of x can hear you
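
To make the five options concrete, here is a minimal C++ sketch of how a server might pick the listeners for one voice packet. Every name in it (Player, VoiceTarget, VoiceRequest, selectListeners) is hypothetical and not part of Torque or OpenAL, and the rule that non-admins can only address their own team is an assumption read into options 2 and 4.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical player record; a real engine would use its own connection types.
struct Player {
    std::uint32_t id;
    int team;
    bool isAdmin;
    float x, y, z;
};

enum class VoiceTarget { Team, Group, Server, Proximity };

struct VoiceRequest {
    VoiceTarget target;
    std::set<std::uint32_t> group; // used by Group only
    float radius;                  // used by Proximity only
};

// Returns the ids of the players who should receive the speaker's audio.
std::vector<std::uint32_t> selectListeners(const Player& speaker,
                                           const std::vector<Player>& players,
                                           const VoiceRequest& req)
{
    std::vector<std::uint32_t> out;
    // Option 3: only an admin may address the entire server.
    if (req.target == VoiceTarget::Server && !speaker.isAdmin)
        return out;

    for (const Player& p : players) {
        if (p.id == speaker.id)
            continue; // never echo audio back to the speaker
        bool hears = false;
        switch (req.target) {
        case VoiceTarget::Team: // option 1
            hears = (p.team == speaker.team);
            break;
        case VoiceTarget::Group: // options 2 and 4
            hears = req.group.count(p.id) > 0 &&
                    (speaker.isAdmin || p.team == speaker.team);
            break;
        case VoiceTarget::Server: // option 3
            hears = true;
            break;
        case VoiceTarget::Proximity: { // option 5
            const float dx = p.x - speaker.x;
            const float dy = p.y - speaker.y;
            const float dz = p.z - speaker.z;
            // Compare squared distances to avoid a square root per player.
            hears = dx * dx + dy * dy + dz * dz <= req.radius * req.radius;
            break;
        }
        }
        if (hears)
            out.push_back(p.id);
    }
    return out;
}
```

Doing this selection on the server keeps the rules out of the client's hands, at the cost of one pass over the player list per voice packet.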

Unfortunately, from what I have read, OpenAL is tragically incomplete.

http://www.openal.org/snapshots/openal/docs/oalspecs-full/oal-operation.html

Quote: OpenAL (henceforth, the "AL") is concerned only with rendering audio into an output buffer, and primarily meant for spatialized audio. There is no support for reading audio input from buffers at this time, and no support for MIDI and other components usually associated with audio hardware. Programmers must rely on other mechanisms to obtain audio (e.g. voice) input or generate music.

This is from back in 1999, and this _STILL_ hasn't been implemented?!

Let's say there was the ability to read audio input from buffers; what would I do? I plan to integrate Speex (www.speex.org), which is a completely license/royalty-free, high-compression speech encoder/decoder from the same Xiph.org family as Ogg Vorbis.

Obviously, if OpenAL is incomplete I'm SOL. So is there an alternative for me here? Are you currently integrating this into your project? Would you like some help? Have you started and given up, but would like to breathe new life into this "resource"?!

If you have information regarding this, I would greatly appreciate it, as I fully plan on creating a resource for everyone to make use of. I honestly can't place a high enough value on in-game voice comm for FPS/RTS team-based gameplay, and there are more uses than just that.

I hope to hear from someone soon!

Cheers

James

#1
02/16/2003 (3:31 am)
The in-game voice comms were part of the audio code that was removed when the Tribes 2 elements had to be removed from the TGE. I think it may have been part of the Miles Sound System (which was originally used for the audio).

The only time I've ever seen voice comms used the way they should be is in clan matches for the usual team-based games, like Counter-Strike. It has virtually no use at all on a public server. And even there I've never seen them use the in-game system (even if the game has one), as they can't use it to bypass game rules (dead people spotting etc.). But this seems to be the accepted norm :(

But to try and be useful:

1. Usual Client / Server approach, I guess this would be a minimum.
2. Probably less useful. Without the attached context that we're used to, how would you tell a global team chat from a 'local' team chat? Maybe you could use different tones at the beginning or something, but having to think about where the chat is coming from probably lessens its effectiveness.
3. Virtually the same as 1.
4. As with 2, these would probably be better implemented as a peer-to-peer system, something that's outside of the normal Tribes system. As above, I'd also consider its usefulness.
5. This would be very easy, but I can't see any way of doing it server side without significant load. Which means if someone plays around with their 3D sound settings it could be abused.

From my experience with OpenAL, it really doesn't do much apart from play sounds, and it supports 3D sound only where the implementation does. Capture etc. really has to be handled by your own mechanisms. Playback should be fine, though using the same system as the more time-critical game packets might cause a significant overhead.

The hardest part seems to be management and transport as opposed to playback etc. I have no idea how standard audio capture is across systems, but it's fairly easy on a Windows system ;)

This has been discussed before in far greater detail but I can't find the thread now, hopefully it won't be long until the full text search is up :)

Out of interest, wasn't OpenAL a Loki Games thing originally? If it was, I guess that might explain its lesser progress :(
#2
02/16/2003 (6:26 am)
Maybe OpenAL is just being focused, i.e., sort of like how PDF files don't support animation. Happily, since it doesn't touch the issue of sound capture, you're free to implement your own system.

Have you looked at www.hawksoft.com/hawkvoice/?

Also, there are a few vestiges of the voice system left in Torque; you might try to start building off of those.
#3
02/16/2003 (7:33 am)
Well, I had envisioned further functionality than simply being able to record your voice and have it played back to be implemented as desired by the individual developer. Provided the base functionality is complete, the programmer can implement it as he sees fit.

Things like being able to select a particular member of your team to voice can be handled by a gui. This could then be further extended for use by an admin who could have the ability to select an individual or individuals for speech.

I run game servers at redphive.org, and for those of you who have administrated servers before it can be a nightmare.. having to TYPE or private message a user is too slow, but if with the click of a button I could yell at that individual, a much happier admin I would be!

Anyway, getting back on topic. My friend Dean Wadsworth has compiled Speex successfully for OS X, and I've got it working under Windows and Linux. Some progress was made since last night, since (thanks Michael Vance from OAL/ML) the remnants of the T2 sound implementation still live in the OAL code. There exists a capture extension for use under Linux.

For Windows I have found this: www.portaudio.com/, and I'm going to give it a shot and see how it turns out. This looks quite promising as it supports a great variety of platforms, and my needs are fairly specific (sound capture only), so hopefully I can get somewhere.

HawkVoice is the same as speex (www.speex.org), only speex is superior from what I've read so far, and much more recent and still under development.

There are yet further applications for integrating voice comm into TGE as well... the possibility of a 3D chat client comes to mind...

I've looked at a few Windows sound systems (MILES/FMOD/BASS), but all have licensing requirements which would bankrupt an indie developer... ($4,000/$1,250/$925). So if you know of something well suited for this other than the PortAudio I plan on trying, drop me a line in here :)
#4
02/16/2003 (2:15 pm)
I wish I had more knowledge of audio.

I've planned on trying to do something along the lines of actual conversation between players for use in a MMORPG, that would take away the need for text chat (but not remove it completely for program reasons). I know it sounds a way off, but it would be a great thing to take a crack at:)

Oh well, guess it will happen soon enough...

Hope you succeed:)

- Chris
#5
02/17/2003 (9:17 am)
OpenAL was designed for playback of 3D spatial audio. So I doubt it will ever have capture support.

The best place to start is to add an audio capture interface into the Torque platform layer. That way all the platform specific code gets hidden in the platform layer and the rest of the voice comms code only relies on the platform layer.
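
As a rough illustration of that idea, a platform-layer capture interface could look something like the C++ sketch below. None of these names (AudioCapture, FakeAudioCapture, readFrames) exist in Torque; they are hypothetical, and the fake backend only stands in for a real per-platform implementation so the calling code can be exercised without a microphone.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical platform-layer interface: each OS provides its own subclass.
class AudioCapture {
public:
    virtual ~AudioCapture() = default;
    virtual bool open(unsigned sampleRate, unsigned channels) = 0;
    // Copies up to maxSamples captured 16-bit samples into dst and returns
    // how many were written (0 when nothing is available).
    virtual std::size_t read(std::int16_t* dst, std::size_t maxSamples) = 0;
    virtual void close() = 0;
};

// Stand-in backend that replays a fixed buffer instead of a microphone.
class FakeAudioCapture : public AudioCapture {
public:
    explicit FakeAudioCapture(std::vector<std::int16_t> samples)
        : samples_(std::move(samples)) {}

    bool open(unsigned, unsigned) override { pos_ = 0; open_ = true; return true; }

    std::size_t read(std::int16_t* dst, std::size_t maxSamples) override {
        if (!open_) return 0;
        const std::size_t n = std::min(maxSamples, samples_.size() - pos_);
        std::copy_n(samples_.begin() + static_cast<std::ptrdiff_t>(pos_), n, dst);
        pos_ += n;
        return n;
    }

    void close() override { open_ = false; }

private:
    std::vector<std::int16_t> samples_;
    std::size_t pos_ = 0;
    bool open_ = false;
};

// Platform-independent code: gathers whole frames for the encoder.
// A trailing partial frame is dropped here for brevity.
std::vector<std::vector<std::int16_t>> readFrames(AudioCapture& cap,
                                                  std::size_t frameSamples)
{
    std::vector<std::vector<std::int16_t>> frames;
    std::vector<std::int16_t> frame(frameSamples);
    std::size_t filled = 0;
    for (;;) {
        const std::size_t got = cap.read(frame.data() + filled, frameSamples - filled);
        if (got == 0) break;
        filled += got;
        if (filled == frameSamples) {
            frames.push_back(frame);
            filled = 0;
        }
    }
    return frames;
}
```

The voice code would only ever see AudioCapture, so swapping the fake backend for a Windows, Mac or Linux one would not touch readFrames or anything built on it.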

Playback shouldn't be hard. I tried to design my audio streaming code to allow for easy addition of audio decoders. You can create another subclass of AudioStreamSource that will decode whatever stream format you use for voice.
#6
09/22/2003 (7:27 am)
I know it may seem off topic, but I would *strongly* recommend using Teamspeak 2. It is freeware, low impact, and completely transparent to your game...

It supports channels, and you can run multiple clients. Granted, all players have to install Teamspeak for it to work, but I find this less intrusive than a game forcing you to turn off the intercom because a player is being rude.

I play an MMORPG in which Teamspeak is used for my guild... it enhances play and allows for voluntary inclusion... some guilds require your connection to teamspeak to be allowed to "raid" with them.

To achieve group, squad, and server levels of voice communication, you'd have to utilize multiple connections to the teamspeak server using a different trigger button for each... for instance:

Middle Mouse Button = Squad Only (one copy of teamspeak client in Channel #DeathSquad 45)

Right ctrl button = Serverwide (another copy of teamspeak in adminned room in which only people with the password have voice... sitting in pub room)

To speak with individual groups, you'd have to tell them to join a room, and use your third copy that you use to move between rooms.

This does not cover the distance x/y communication or the player->player direct communication... I don't think teamspeak has this functionality, but I'm relatively sure you can run multiple copies.

If not, even still, one copy seems sufficient enough...

I would really like to see something that let you talk and really projected your voice at the volume which you spoke... this would make an MMORPG very, very interesting.
#7
01/17/2004 (6:58 am)
I created Roger Wilco, which was the first widely successful voice-app that ran "behind" games.

While tools like RW (such as Teamspeak) can be very nice, there are phenomenal advantages in building voice into the game: everyone has it, they are spared the need to communicate server/channel/etc. data manually and the bother of then taking that data and entering it into some other application, and the "channel model" can simply mimic the groups already defined by the game (game, team, squad, whisper to a given player). Thus, you can be assured that when you talk on a channel, the other guy is going to hear you. Plus, it opens opportunities for proximity-based voice where you simply speak and those near you hear you -- that is, effecting live speech as opposed to radio communications.

I am looking into adding speech to Torque at the moment. Does the lack of audio input code in OpenAL in any way restrict my ability to simply code up a version of audio input for each platform (well, for those which support such)?

And, in an a la carte fashion, I'll say that speech data (and indeed, Torque's generic model for unguaranteed data) could really go down easily in a bandwidth budget if it were to benefit from a networking model such as Roger Wilco employed -- that being a communistic use of the dissimilar push capacities brought to the table by the various nodes wishing to exchange streaming voice data. Truly, it was revolutionary: we could do the impossible and put upwards of 20 dial-up users on a channel without sending data through a server and without using more bandwidth than any node could spare for the purpose. AFAIK, PalTalk owns the patent we were awarded for this technology, and it is one that can really liberate you from the bottlenecks imposed by strict adherence to client-server or traditional peer-to-peer models of audio streaming topology. I am going to inquire as to whether this can be licensed "on the cheap".
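
For a sense of why a plain peer-to-peer mesh breaks down at that scale, here is a small back-of-the-envelope sketch in C++. The figures are assumptions chosen for illustration only (an 8 kbit/s voice codec, packet overhead ignored, 33 kbit/s of spare dial-up uplink), and the code does not describe Roger Wilco's actual scheme; it only shows the bandwidth budget such a scheme has to respect.

```cpp
// Upstream bandwidth a speaker needs when it uploads one copy per listener.
constexpr unsigned fullMeshUploadKbps(unsigned codecKbps, unsigned listeners)
{
    return codecKbps * listeners;
}

// How many copies of the stream a node can afford to send with the uplink
// it is able to spare.
constexpr unsigned affordableCopies(unsigned spareUplinkKbps, unsigned codecKbps)
{
    return codecKbps == 0 ? 0 : spareUplinkKbps / codecKbps;
}

// 20 users on a channel means 19 listeners per speaker.
static_assert(fullMeshUploadKbps(8, 19) == 152, "naive mesh needs 152 kbit/s up");
// An assumed 33 kbit/s dial-up uplink covers only 4 copies of the stream.
static_assert(affordableCopies(33, 8) == 4, "dial-up can feed about 4 peers");
```

Under these assumptions a dial-up speaker can feed about four peers directly, so the remaining listeners have to be reached through other nodes relaying the stream within their own budgets.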

tone
#8
01/17/2004 (9:33 pm)
Anthony, I first tried RW back in 99 when I started playing Tribes. It was awesome to talk with my tribes-mates and relay enemy info back, it really brought out the fun in the game. Thanks for the great program and if there is anything I can do to help you along in this process just drop me an email. I think at one point in time I sent an email to you about getting a developers copy of RW but that was years ago.

Thanks again and like I said just holler if you need some assistance. I would be willing to put some money into this if they will licence it out at a reasonable dollar amount.

Sam G
#9
01/17/2004 (10:11 pm)
Anthony, yeah, like Sam said, it was a great program. I like it a lot more than Teamspeak etc. I used it with the Delta Force series; started using it in 2000, I think...
#10
01/20/2004 (6:39 am)
Thanks for the appreciative remarks! The bad news to the above is that I did not personally code RW and am fairly weak at some of the tech required to access sound kit in a mature, robust manner.

But I am tinkering at the moment to add audio input on the mac (first). I have made faltering headway with the exceedingly cumbersome CoreAudio API -- and am shelving that for a moment to see if PortAudio (www.portaudio.org) might not be a better API to use for recording.

Pertinent to Garagegames folks... portaudio is cross platform and offered under LGPL-like licensing terms. I wonder if it may be a more actively supported and vibrant cross-platform audio layer than OpenAL (over the mid- to long-term). It includes audio input. On OS X, anyhow, it is based on CoreAudio and I'm hoping I can use its audio input facilities without somehow interfering with OpenAL for all existing Torque sound. If I make it work on the Mac, I will next try portaudio on Windows.

I intend to first offer channel-based tuning and then try to add a proximity-based, "just talking to each other" capability. I'll not do voice-activated input yet as experience tells me that too many people try to use it when they have no headset and they are not the ones who suffer from their oafish imposition... it winds up being a net loss that drags down the appeal remarkably. :( But I'll try to ensure that I leave a place where it could be hooked in.
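
One way to leave that hook, sketched in C++ under my own assumptions: the sender asks a pluggable "gate" whether a captured frame should go out. Push-to-talk is one gate; a voice-activated gate could be dropped in later. All names here (TransmitGate, VoiceSender, makePushToTalkGate, makeEnergyGate) are hypothetical, and the mean-amplitude test is only a crude placeholder for real voice detection.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <memory>

// Decides whether a captured frame should be transmitted.
using TransmitGate = std::function<bool(const std::int16_t* samples, std::size_t count)>;

// Push-to-talk: ignores the audio and reports the state of a key flag
// that the input code owns and updates.
TransmitGate makePushToTalkGate(std::shared_ptr<bool> keyDown)
{
    return [keyDown](const std::int16_t*, std::size_t) {
        return keyDown && *keyDown;
    };
}

// Placeholder voice activation: passes frames whose mean absolute
// amplitude exceeds a threshold.
TransmitGate makeEnergyGate(int threshold)
{
    return [threshold](const std::int16_t* s, std::size_t n) {
        if (s == nullptr || n == 0) return false;
        long long sum = 0;
        for (std::size_t i = 0; i < n; ++i)
            sum += std::abs(static_cast<int>(s[i]));
        return sum / static_cast<long long>(n) > threshold;
    };
}

// The sender only knows about the gate, not about how it decides.
struct VoiceSender {
    TransmitGate gate;
    std::size_t framesSent = 0;

    void submit(const std::int16_t* samples, std::size_t count) {
        if (gate && gate(samples, count))
            ++framesSent; // a real sender would encode and transmit here
    }
};
```

Shipping with only the push-to-talk gate wired up keeps the open-mic problem away while costing nothing if voice activation is added later.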

Oh, and I'll also just work with Torque's unreliable network send as the method of sending audio. Adding in something like the RW network layer introduces IPR issues (and, need I mention... a lot of coding).

tone
#11
04/23/2005 (6:17 am)
Maybe I'm missing the hard part here. These are the challenges/steps to integration as I see it:

1. maintain a list of active/connected nodes, those wishing to communicate
2. capture sound from mic
3. stream to respective clients on capture using TGE's unguaranteed net transfer.
4. do what you have to do to pass it back through OpenAL
5. develop appropriate gui for easy management in game of sound settings
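
For step 3, sending over an unguaranteed channel means packets can arrive late, duplicated or out of order, so each encoded frame usually carries a sequence number. The C++ sketch below shows one possible framing; the 8-byte header layout and the names VoicePacket, packVoice, unpackVoice, isNewer and SequenceFilter are my own assumptions, not anything defined by TGE.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct VoicePacket {
    std::uint32_t speakerId;
    std::uint16_t sequence;            // wraps around after 65535
    std::vector<std::uint8_t> payload; // one encoded voice frame
};

// Layout (big-endian): 4 bytes speaker id, 2 bytes sequence,
// 2 bytes payload length, then the payload.
std::vector<std::uint8_t> packVoice(const VoicePacket& p)
{
    std::vector<std::uint8_t> out;
    if (p.payload.size() > 0xFFFF) return out; // does not fit the length field
    out.reserve(8 + p.payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(p.speakerId >> shift));
    out.push_back(static_cast<std::uint8_t>(p.sequence >> 8));
    out.push_back(static_cast<std::uint8_t>(p.sequence & 0xFF));
    const std::size_t len = p.payload.size();
    out.push_back(static_cast<std::uint8_t>(len >> 8));
    out.push_back(static_cast<std::uint8_t>(len & 0xFF));
    out.insert(out.end(), p.payload.begin(), p.payload.end());
    return out;
}

bool unpackVoice(const std::vector<std::uint8_t>& in, VoicePacket& out)
{
    if (in.size() < 8) return false;
    const std::size_t len = (static_cast<std::size_t>(in[6]) << 8) | in[7];
    if (in.size() != 8 + len) return false; // truncated or padded packet
    out.speakerId = (static_cast<std::uint32_t>(in[0]) << 24) |
                    (static_cast<std::uint32_t>(in[1]) << 16) |
                    (static_cast<std::uint32_t>(in[2]) << 8) | in[3];
    out.sequence = static_cast<std::uint16_t>((in[4] << 8) | in[5]);
    out.payload.assign(in.begin() + 8, in.end());
    return true;
}

// True if sequence a comes after b, allowing for 16-bit wraparound.
bool isNewer(std::uint16_t a, std::uint16_t b)
{
    const std::uint16_t diff = static_cast<std::uint16_t>(a - b);
    return diff != 0 && diff < 0x8000;
}

// Per-speaker filter: stale or duplicate frames are dropped rather than
// played late, which suits live speech better than waiting for resends.
class SequenceFilter {
public:
    bool accept(std::uint16_t seq) {
        if (started_ && !isNewer(seq, last_)) return false;
        started_ = true;
        last_ = seq;
        return true;
    }
private:
    bool started_ = false;
    std::uint16_t last_ = 0;
};
```

A receiver would keep one SequenceFilter per speaker id and hand accepted payloads to the decoder feeding step 4.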

#2 can't be THAT hard to implement: some OS-specific sections that hopefully have already been solved many, many times before, and just require pasting into the engine...
#3 Anthony implies is where it gets hairy. I can see this being the most difficult part, as you think of potentially different schemes to secure the bandwidth from different locations. And when that happens you have to think of a clever distribution mechanism lest you start playing telephone unintentionally :)

I loved Roger Wilco when it first came out; it blew my mind :) This is something I'll be looking into also. I have to prioritize my own learning for TGE, as I just got my license 2 nights ago :)
#12
04/23/2005 (6:58 am)
#2 is easy if one uses portaudio. It seemed to be working fine back when I was plugging away at this.

#3 is probably easy to a TGE veteran. I was clueless.

#4 requires some work, I think. It is important, I feel, to be able to have streaming audio play from a point source (a radio... or a speaking player's mouth), and the 3D audio streaming object in TGE is not written very generally, but has the file-reading logic bundled right in with the block handling and sequencing. One should take time to generalize that to have callback hooks for handing in the next block to be played, and then write a file-reading and a VOIP version of this.
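
A minimal C++ sketch of that generalization, with the caveat that it is not based on the actual TGE streaming classes: the source asks a callback for its next block and neither knows nor cares whether the data comes from a file or from the network. The names AudioBlock, BlockProvider, StreamingSource, VoiceQueue and makeBufferProvider are all hypothetical, and a preloaded buffer stands in for real file reading.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

using AudioBlock = std::vector<std::int16_t>;
// Fills 'block' and returns true when data is ready, false otherwise.
using BlockProvider = std::function<bool(AudioBlock& block)>;

// The playback side: sequencing only, no knowledge of where blocks come from.
class StreamingSource {
public:
    explicit StreamingSource(BlockProvider provider)
        : provider_(std::move(provider)) {}

    // Called whenever the mixer needs more audio; plays silence on underrun
    // so a late voice packet does not stall the source.
    AudioBlock nextBlock(std::size_t samples) {
        AudioBlock block;
        if (!provider_ || !provider_(block))
            block.assign(samples, 0);
        return block;
    }

private:
    BlockProvider provider_;
};

// VOIP version: decoded frames are pushed by the network code.
class VoiceQueue {
public:
    void push(AudioBlock block) { queue_.push_back(std::move(block)); }
    bool pop(AudioBlock& out) {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }
private:
    std::deque<AudioBlock> queue_;
};

// File-like version: hands out slices of already loaded PCM data.
BlockProvider makeBufferProvider(AudioBlock pcm, std::size_t blockSamples)
{
    std::size_t offset = 0;
    return [pcm = std::move(pcm), blockSamples, offset](AudioBlock& out) mutable {
        if (blockSamples == 0 || offset >= pcm.size()) return false;
        const std::size_t n = std::min(blockSamples, pcm.size() - offset);
        const auto first = pcm.begin() + static_cast<std::ptrdiff_t>(offset);
        out.assign(first, first + static_cast<std::ptrdiff_t>(n));
        offset += n;
        return true;
    };
}
```

With the hook in place, a point source for a radio and one for a speaking player differ only in which provider they are constructed with.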

I don't see GUI as being an essential part of the initial offering, but a nice touch. Personally, I'd hope to steer clear of "channel-based" tuning models (broadcast, red team/blue team, etc) in favor of proximity-based audibility, but I would want telephones and view channel-based squad radios as essential for many game concepts.

Email me if you'd like to try this and could help do the portions I am not so good at. I'd commit to spending a week to try to get audio-in working for Windows and Mac.

I am email-addressable as username tone through dreadnoughtproject.org

tone
#13
06/03/2005 (5:22 am)
Don't forget www.speex.org/
#14
08/11/2005 (11:05 am)
As an update, I launched a renewed effort at this. Currently, I am stalled waiting for improvements in handling for streaming audio in TGE 1.4RC2. I have busied myself by hooking in a speech recognizer and can vouch that indeed PortAudio is capturing audio from the microphone just fine in Windows.

I hope that I will not fail in suturing together my VOIP side of things. It will rely on Speex and networking will be through TGE's Event system. My first "tuning model" is that the server will relay your speech to all other players within a max radius of your position.

tone
#15
08/19/2005 (2:34 am)
Very cool. Keep us informed on your progress. VOIP would fit great in our upcoming title. We will be watching your progress eagerly.