radioAe6rt

Developing podcast audio from RTP VoIP packets


This story is about as nerd-like as anything gets. I can’t help it. Here goes. If you think this is useful stuff, I have a question for you at the end that perhaps you can help me with.

Q. How to develop podcast audio from a phone call when all you have is VoIP, and no ability or inclination to install audio capture shims and wedges in whatever consumer OS you have lying around?

A. Capture both RTP audio streams with a PC-based packet sniffer, and save each track as raw mu-law. This is actually more expensive (because it took time to figure out) than the already free Gabcast audio-capture service, but it keeps me in possession of my own mp3s (an issue surrounded by some legalese I don’t quite understand in the Gabcast Terms of Service). And I only pay this price once: the first time, when learning.

My first inclination was to use tcpdump and libpcap to capture the audio and render it to a disk file. That was more work than I wanted to take on if I could help it, but it would do nicely if it came to that. This blogger had the same basic idea, although he didn’t go into detail on how he captured and rendered audio packets.
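For the curious, here is roughly what the do-it-yourself route involves once libpcap has handed over the UDP payloads. This is only a minimal Python sketch under stated assumptions, not the method used in this post: it assumes each payload is a single RTP packet laid out as in RFC 3550, and that the audio is payload type 0 (PCMU, i.e. G.711 mu-law). The function names parse_rtp and collect_pcmu are made up for illustration.

```python
import struct


def parse_rtp(packet: bytes) -> dict:
    """Split one RTP packet (a UDP payload) into header fields and audio payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = first >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F
    offset = 12 + 4 * csrc_count  # skip any CSRC identifiers
    if has_extension:
        # Extension header: 16-bit profile id, then length in 32-bit words.
        _, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if has_padding:
        end -= packet[-1]  # last byte holds the padding length
    return {
        "payload_type": second & 0x7F,
        "marker": bool(second & 0x80),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:end],
    }


def collect_pcmu(packets) -> dict:
    """Group PCMU payloads by SSRC and join them in sequence-number order.

    Simplification: ignores 16-bit sequence wraparound and lost packets.
    """
    streams = {}
    for raw in packets:
        rtp = parse_rtp(raw)
        if rtp["payload_type"] != 0:  # 0 = PCMU (G.711 mu-law)
            continue
        streams.setdefault(rtp["ssrc"], []).append(rtp)
    return {
        ssrc: b"".join(p["payload"] for p in sorted(pkts, key=lambda p: p["sequence"]))
        for ssrc, pkts in streams.items()
    }
```

Each direction of the call carries its own SSRC, so the dictionary that comes back would normally hold two entries, one per speaker. A real capture would also have to cope with lost or reordered packets, which this sketch does not attempt.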

Some poking around led me to Ethereal, the packet sniffer written by my old friend from Kansas City, Gerald Combs. Ethereal captures the packets and can save the RTP streams to disk in .au mu-law format. What a time saver this was.

To proceed, make sure your packet-sniffing host can see the VoIP terminal’s packet flow. That means putting the sniffer and the VoIP terminal (I use PhoneGnome through an ITSP gateway to force the audio over IP) on the same unswitched network segment (use a dumb hub). This obviously rules out Skype: the audio has to be observable in the clear, not encrypted. So we’re talking unencrypted SIP-based VoIP here.

So start the sniffer, make the call, hang up, stop the sniffer. Then save the RTP streams to disk. Here is Ethereal’s audio output file:

$ file whitehouse.au
whitehouse.au: Sun/NeXT audio data: 8-bit ISDN mu-law, mono, 8000 Hz
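
What file reports here comes straight from the fixed header at the front of the .au file. As a rough illustration, the following Python sketch reads that header by hand. It assumes the standard Sun/NeXT layout (six big-endian 32-bit words), and the name read_au_header and the small encoding table are my own, covering only a few common encoding codes.

```python
import struct

AU_MAGIC = 0x2E736E64  # the four bytes ".snd"

# Partial table of AU encoding codes; only a few common ones are listed.
AU_ENCODINGS = {
    1: "8-bit mu-law",
    2: "8-bit linear PCM",
    3: "16-bit linear PCM",
    27: "8-bit A-law",
}


def read_au_header(data: bytes) -> dict:
    """Decode the 24-byte fixed header of a Sun/NeXT .au file."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes of header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun/NeXT .au file")
    return {
        "data_offset": offset,  # where the audio bytes begin
        "data_size": None if size == 0xFFFFFFFF else size,  # None = unknown
        "encoding": AU_ENCODINGS.get(encoding, f"code {encoding}"),
        "sample_rate": rate,
        "channels": channels,
    }
```

Calling read_au_header(open("whitehouse.au", "rb").read(24)) should agree with the file output above if the capture was saved as expected.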

Next, compile SoX with mp3 support, which means you need /usr/lib/libmp3lame.so (LAME). Or, make sure your version of sox has mp3 support (type “sox -h” and look for “mp3” in the supported file formats output).

Next, convert the .au file to mp3:

$ sox whitehouse.au whitehouse.mp3

And you’re done. What’s cool about using Ethereal, or really bidirectional VoIP, is that whatever you say is captured in one audio track, and whatever your interviewee says is captured in another, separate track. Bring both into something like Audacity or GarageBand and you can edit the tracks independently of each other. Very cool.
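
If you would rather have both speakers in a single file before opening an editor, the two mono tracks can be decoded and interleaved into one stereo WAV. The Python sketch below is one way that might look, not something Ethereal or SoX does for you. It assumes each input is raw mu-law audio with the .au header already stripped, that both tracks start at the same instant (which a real capture may not guarantee), and the names ulaw_to_pcm16 and merge_to_stereo are hypothetical.

```python
import struct
import wave


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def merge_to_stereo(left_ulaw: bytes, right_ulaw: bytes, wav_path: str,
                    rate: int = 8000) -> int:
    """Write two mono mu-law tracks as one 16-bit stereo WAV; return frame count."""
    n = max(len(left_ulaw), len(right_ulaw))
    # Pad the shorter track with 0xFF, the mu-law code for silence.
    left = left_ulaw.ljust(n, b"\xff")
    right = right_ulaw.ljust(n, b"\xff")
    frames = bytearray()
    for l, r in zip(left, right):
        frames += struct.pack("<hh", ulaw_to_pcm16(l), ulaw_to_pcm16(r))
    with wave.open(wav_path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(rate)
        out.writeframes(bytes(frames))
    return n
```

With your voice on the left channel and the interviewee on the right, an editor that can split stereo into two mono tracks gets you back to independent editing.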

So here’s the question: the mp3 audio sounds ok (and just ok, frankly), albeit a bit hollow compared to the rich, sonorous original, when I use play (which now also has mp3 support, because its parent sox has it) or realplay. But it has an annoying “chirp” or hi-hat sound when I play the mp3 file with xmms or mpg123. This smacks of an mp3 sampling problem, but I don’t know on whose part. I can render the .au file to Ogg Vorbis (.ogg) and play that with xmms, and it sounds ok, too.

More good fundamental information on audio capture over the phone was provided to me by Jon Udell.

[tags]podcasting,voip,gabcast[/tags]

Written by radioae6rt

July 12, 2006 at 8:08 pm

Posted in Internet
