By Marc Lee | Senior Member
Since most technical communicators work primarily with text, they may not be aware of the many possibilities for using sound or audio in their work. In this column, I’ll cover some basics about the types and structure of audio files, and I’ll discuss how these files can be used in familiar multimedia tools such as Captivate and Flash. Today, even Acrobat and PowerPoint offer the option of embedding audio files, so audio may well have a place in your own technical communication tasks. Of course, if you already work with audio, you know that audio files have a long history in the computer industry and that the technology has been stable for many years. If you’re an audio guru already, and I know many of you are, you’ll likely still find a tidbit or two here that’s useful.
Let’s look at an audio waveform:
Figure 1. Audio Wave Form Graph (text callouts added)
This waveform image is a capture from an audio editing program called Audacity (free, open-source software hosted on SourceForge at audacity.sourceforge.net). It’s a snippet of audio from an Adobe Captivate project explaining the meaning of interface objects to a web application user.
As you can see, the waveform display makes it easy to see where each word falls in the recording and how the words relate to one another. If, for example, you needed to edit out “at a glance,” you could easily see where those words were in the sequence. The vertical dimension of the waveform represents the audio volume; the greater the vertical spread of a word in the graph, the greater the volume or energy content of the sound. You can even tell something about the intonation in Figure 1: “this bar lets you see at a glance….”
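If you want to see for yourself what the waveform is measuring, here is a minimal sketch in Python that reads a narration file and prints a crude volume reading for each 100-millisecond slice, so the loud words stand out from the silences. The file name is hypothetical, and the sketch assumes the usual 16-bit PCM wave format.

```python
# Print a rough "volume profile" of a wave file, slice by slice.
# Assumes 16-bit PCM samples; "narration.wav" is an example file name.
import wave
import array

with wave.open("narration.wav", "rb") as wav:
    rate = wav.getframerate()          # samples per second
    frames_per_slice = rate // 10      # 100 ms worth of frames
    t = 0.0
    while True:
        raw = wav.readframes(frames_per_slice)
        if not raw:
            break
        samples = array.array("h", raw)          # 16-bit signed samples
        peak = max(abs(s) for s in samples)      # loudest sample in this slice
        print(f"{t:5.1f} s  peak {peak:6d}  {'#' * (peak // 2000)}")
        t += frames_per_slice / rate
```

The rows with the longest bars are the spoken words; the near-empty rows are the pauses between them, which is exactly what the Audacity display shows graphically.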
As with image files, if you’re going to use audio, you should know the basic technical specifications. The most popular, best-known audio file types are .wav (called “wave” files) and .mp3 (MP3). There are many other formats, but for introductory purposes I will limit the discussion to those two. The .wav format has been around at least as long as I have been doing multimedia (since the mid-1990s); MP3 is more recent and enjoys wider support on the Web than the wave format does. Programs like Flash and Captivate can use both formats. Generally, wave files are larger in byte size for the same length and quality. In a recent example, I used Audacity to convert a 3.8 MB voice-over file to MP3; the resulting file was 683 K, a compression of more than 5x.
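If you prefer to script that conversion rather than do it in Audacity’s export dialog, here is a rough sketch using the third-party pydub library, which hands the actual encoding off to an ffmpeg installation. The file names and the 64 kbps setting are example values, not the settings I used for the file above.

```python
# Convert a voice-over wave file to MP3 and report the size savings.
# Requires the third-party pydub package and ffmpeg; names are examples.
import os
from pydub import AudioSegment

clip = AudioSegment.from_wav("voiceover.wav")
clip.export("voiceover.mp3", format="mp3", bitrate="64k")

wav_size = os.path.getsize("voiceover.wav")
mp3_size = os.path.getsize("voiceover.mp3")
print(f"WAV: {wav_size / 1024:.0f} K, MP3: {mp3_size / 1024:.0f} K, "
      f"compression: {wav_size / mp3_size:.1f}x")
```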
If you’ve read my earlier columns about graphic files, you know that file size and quality often are a trade-off. The same holds true for corresponding audio file specifications:
File length. The longer the audio track, the bigger the file’s footprint on disk and the longer its download time. As a rule of thumb, estimate about 90 K per second of audio for wave files and about 15 K per second for MP3s. Mileage may vary based on the audio content of the file.
Encoding frequency or sampling rate. I’ll eschew the bits and bytes theory; just think of this as the “resolution” of the file. Remember, it’s a digital file (made up of 1s and 0s). So the sampling rate is the number of times the recording takes a sound sample of what’s being recorded. Typical rates are 11 KHz, 22 KHz, and 44 KHz. KHz stands for kilohertz, where kilo is just 1,000 and hertz is cycles or samples per second. Think of the Hz part as standing for “per second.” So in the three examples listed previously, we’re taking 11,000, 22,000, and 44,000 samples per second, respectively. The higher the sampling rate/encoding frequency, the better the sound quality but—there’s that trade-off—the larger the file size. Think of sampling rate as the “dpi” (dots per inch) of audio files: the amount of information per second that’s encoded in the file.
Bitrate. Bitrate is the other half of the quality (or fidelity, as it is called in the audio world) equation. Think of it this way: every second you are taking a certain number of samples (the sampling rate), and the bitrate is the number of bits per second available to describe those samples. Divide the bitrate by the sampling rate and you get the average number of bits per sample. In our analogy to image files, bits per sample corresponds to bits per pixel (the color and transparency information in a photograph): the more bits in each sample, the finer the distinctions the digital encoding can make about the incoming sound, just as more colors can be distinguished with more bits per pixel. Typical bitrates are 48, 64, 96, and 128 kbps. Note that the analogy with color data breaks down somewhat, because bitrates are per second, not per sample. Remember, our typical sampling rates are 11 KHz and so on. So if the sampling rate is 11,000 samples per second and the bitrate is 64,000 bits per second, simple division gives you a little under 6 bits per sample; at 96 kbps, it’s just under 9 bits per sample. Think of sampling rate and bitrate as two “dials” you can twist to increase or decrease audio fidelity and to trade off file size. File size responds quite linearly to a change in sampling rate: a file that’s 600 KB when recorded at 22 KHz will be about twice as big as the same recording made at 11 KHz, which would come in at about 300 K. (The short sketch following this list works through these numbers.)
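To keep the arithmetic straight, here is a small back-of-the-envelope calculator. The 90 K/second and 15 K/second figures are the rules of thumb from the list above; real files will vary with content, channel count, and encoder settings.

```python
# Back-of-the-envelope audio numbers from the discussion above.
# Estimates only; actual file sizes depend on content and encoder settings.

def bits_per_sample(bitrate_bps, sample_rate_hz):
    """Average number of bits the encoder can spend on each sample."""
    return bitrate_bps / sample_rate_hz

def estimated_size_kb(seconds, kb_per_second):
    """Rough file size using the rules of thumb in the column."""
    return seconds * kb_per_second

print(bits_per_sample(64_000, 11_000))   # 64 kbps at 11 KHz -> about 5.8 bits/sample
print(bits_per_sample(96_000, 11_000))   # 96 kbps at 11 KHz -> about 8.7 bits/sample

# A 30-second narration: ~90 K/s as a wave file, ~15 K/s as an MP3
print(estimated_size_kb(30, 90), estimated_size_kb(30, 15))   # 2700 KB vs. 450 KB
```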
This column addresses media subjects for technical communicators. It discusses graphics, audio, animation, video, and interactive media as they relate to technical communication. Please send comments to marc@mlmultimedia.com.
Figure 2 shows the audio specification choices that Captivate offers for its recording tool. If you’re familiar with Captivate, you know that one option is to narrate your capture while recording; here you’re being asked to select the audio fidelity you want for that narration. A note of advice: there’s a big difference between audio files for voice and those for music or other types of sound. Speech remains intelligible at much lower fidelity (or audio resolution) than other kinds of audio; the telephone system samples at only about 8 KHz, lower than the lowest typical setting for computer audio recordings. If you’re just recording speech, it’s probably fine to use 11 KHz as your encoding rate.
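If you want to try that trade-off on an existing narration file, here is a minimal sketch that resamples it down to roughly 11 KHz, again using the third-party pydub library (ffmpeg required). The file names and rates are hypothetical examples.

```python
# Resample a speech-only track to ~11 KHz to shrink the file.
# Third-party pydub package and ffmpeg required; file names are examples.
from pydub import AudioSegment

narration = AudioSegment.from_wav("narration_44k.wav")
print(narration.frame_rate)                     # e.g., 44100 samples per second

speech_only = narration.set_frame_rate(11025)   # resample to about 11 KHz
speech_only.export("narration_11k.wav", format="wav")
```

For a voice track, most listeners will be hard pressed to hear the difference, but the file will be roughly a quarter of the original size.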
Audio in Captivate and Flash
Let’s talk a bit more practically. How are audio and audio files used in today’s media development tools? As examples, I’ll use Adobe Captivate and Flash, two tools I use regularly in my e-learning business.
Figure 2. Audio Settings—Captivate
Synchronizing Audio and Visual Effects
In Figures 3 and 4, we see the timelines of Captivate 4.0 and Flash CS3, a pane in the development or editing desktop of each product. Both have an audio layer; the tiny squiggly line you see in each timeline is a miniature version of the audio waveform, and the red bars show the current position of the playhead. How do these layers get there? In a Captivate project, the audio layer shows up in the timeline automatically if you use the “narrate” option during your capture. “Narrate” means you record your voice as you create the demo; after the recording is complete and you go into edit mode, the timeline for the frames looks like Figure 3. Flash, by contrast, has no recording mode: the audio file must already exist, and you import it into a particular layer and keyframe in the timeline. What’s nice about the timeline is its sync capability. Often in producing a multimedia or e-learning project, we want to synchronize the audio with some visual, animated effect. Looking again at Figure 3, notice that the mouse appears at the fourth second in the Captivate timeline. Let’s say the word “click” actually occurs at second 2.
Figure 3. Captivate Timeline with Audio
Figure 4. Flash Timeline with Audio
In Figure 5, we see that, with audio in the timeline, it’s easy to move the mouse image to synchronize with the audio layer, making the mouse move just when the narrator says “click.” While very different in design, the Flash timeline offers a similar synchronization capability: you move the contents of layers along the timeline until they sync up. You can experiment with the exact timing; some people prefer the audio cue to precede the visual event, while others prefer them to occur simultaneously. While these examples use Adobe products to demonstrate audio effects, a similar setup is available in other tools. PowerPoint has corresponding audio manipulation tools, if somewhat less sophisticated ones.
Figure 5. Synchronizing Sound with a Mouse Movement
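Whichever tool you use, the bookkeeping behind that drag is the same: the moment a word is spoken becomes a frame number in the visual layer. Here is a tiny illustrative calculation; the 2-second cue matches the example above, while the frame rates are sample values rather than settings taken from the figures.

```python
# Convert an audio cue time into the frame where the visual effect should start.
# The cue time and frame rates below are illustrative values only.

def cue_to_frame(cue_seconds, frames_per_second):
    return round(cue_seconds * frames_per_second)

print(cue_to_frame(2.0, 30))   # at 30 fps, "click" at 2 s falls on frame 60
print(cue_to_frame(2.0, 12))   # at 12 fps, the same cue falls on frame 24
```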
Audio Short-Takes
I recommend that you use audio. I am a big user of audio narration in e-learning projects; we never produce a CBT with text only. My personal experience is that adult learners are far more engaged when you speak to them about the topic as well as show them. Most of my clients agree, and I would feel I had shortchanged a client if I provided a silent product. But it’s also important to use audio wisely, to wit:
Keep it short. If you’re recording audio or importing it into Captivate, Flash, or PowerPoint, try to keep each slide’s audio file short. Long files are large and may slow the download of the entire project. Also, long stretches of audio can be very tedious.
The user must be in control. If possible, give the user a mute button and a rewind control so your project can be replayed, or played without the sound. Also, it’s essential to confirm that your end users have sound cards and speakers if your project incorporates audio.
Other short takes:
Sound effects. Thousands of very inexpensive sound effects (about $2.00 to $4.00 each) are available for use in your projects. Everything from gunshots, to animal calls, to industrial sounds, to electronic tweeps and blurps is available for download at low prices (see www.soundrangers.com). Sound effects are fun and effective, but be careful not to annoy your users with overuse. Sounds are the audio equivalent of stock photography for your multimedia projects. All good sites offer both MP3 and wave formats for their products.
Text-to-speech. One long-promised technology of the past 20 years is a really good speech synthesis tool: give it text, and it produces an audio file of someone speaking that text. Adobe has included such a tool since Captivate 4.0. I never use these tools because, in my opinion, they all sound like computer-speak and don’t fool anyone. The upside is ease of use: no mics, no retakes, no edits; you just place your text in the captions pane (in Captivate) and the audio track shows up on the timeline.
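If you want to hear computer-speak for yourself without opening Captivate, one option is a quick sketch with the third-party pyttsx3 library, which drives whatever speech engine the operating system already provides. The caption text and output file name here are examples only, and the exact output format depends on the platform’s speech engine.

```python
# Generate a synthesized narration file from a line of caption text.
# Uses the third-party pyttsx3 package; text and file name are examples,
# and the saved file's format depends on the operating system's speech engine.
import pyttsx3

engine = pyttsx3.init()
caption = "Click the toolbar button to open the report."
engine.save_to_file(caption, "caption_tts.wav")
engine.runAndWait()
```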
I hope this overview will encourage you to use audio in your projects where it makes sense. I think you will enjoy using audio, and your users will thank you for providing them with a more engaging, interesting project.
Marc Lee (marc@mlmultimedia.com) is owner of MLMultimedia, a multimedia and e-learning consultancy. Marc has been a member of the STC Rocky Mountain Chapter for about 20 years and was chapter president from 2004 to 2005. He has a PhD in English from the University of Wisconsin-Milwaukee. MLMultimedia’s website is www.mlmultimedia.com.