Crowdsource Your Way to Closed Captioning


By Linda Roberts | Associate Fellow and Lisa Cook | Fellow

This column shares information about accessibility requirements and techniques, and introduces standards and policies that might affect your products. If you have feedback, contact Linda Roberts at lerober1@yahoo.com or Lisa Cook at Lisa.Cook@sas.com.

As technical communicators, we understand that no matter how well we craft a message, we must still deliver it. With the increased use of video in corporate, technical, and marketing communications, captions are essential for delivering the message to the intended audience. When we think of captions, we typically think of people with hearing impairments. However, captions are useful in a broader range of situations:

When viewers cannot hear the sound (on a commuter train or in an airport) or do not want to use headphones
When the viewer is a non-native speaker, for whom listening alone may not convey the full meaning
When the viewer has a cognitive impairment or attention deficit, and a supporting text channel aids learning

Captions provided directly in the video address these cases while also meeting the accessibility needs of people with hearing impairments. Beyond captions, which summarize the spoken content, full video transcripts meet even broader needs: search engines can index transcript text, which increases overall consumption of the information.

One global company was surprised by how large these benefits turned out to be. After shifting essential internal messages from mostly email and intranet text to podcasts and video, the R&D communications team heard concerns from employees who had difficulty hearing the videos and who knew of others reluctant to voice similar concerns. The team agreed to a one-month trial in which it provided transcripts for the podcasts and videos and then compared the response rate (hits, shares, and comments). After one month, the team found that total consumption of the information had increased significantly. Non-native speakers in particular gave positive feedback, because they appreciated being able to read along with the transcript. Mobile technical personnel also expressed appreciation, because they could download a transcript, read it on their devices in transit, and search it for keywords.

Providing transcripts and captions does require additional work and additional skills. However, our skills as technical communicators position us well to contribute. Last year, in celebration of Global Accessibility Awareness Day (always the third Thursday of May), one large software firm decided to “crowdsource” transcription and captioning.

Promoting the effort through a cross-divisional volunteer advocacy and education group, the firm recruited many enthusiastic volunteers to transcribe several dozen short (five-minute) videos featured prominently on the firm’s external support site. The firm also recruited several volunteers familiar with TechSmith’s Camtasia software, which they used to create captions from the transcripts and synchronize them so that they appear as the video plays. To kick off the effort, the volunteers attended a brief training session that explained the process and addressed their concerns.

Over the following week or so, each transcription volunteer transcribed two to three videos. To make the work easier, the transcribers started from the automatic caption files that YouTube generated. Because these files were machine generated, their quality varied widely: videos with a lot of background noise or a soft-spoken speaker typically produced much poorer files. The files also had no punctuation, and when a video had multiple speakers, their lines were not separated by speaker. Some of the automatic transcripts were actually quite funny. Still, it is usually easier to start with something, even something of low quality, than to start from nothing. Rather than describe how to generate these files here, we refer you to Karen Mardahl’s excellent Intercom article, "Captioning Videos on YouTube" (January 2011), which details the process. Volunteers estimated that accurately transcribing a five-minute technical video took 30–45 minutes, depending on their familiarity with the material.
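As an illustration of that starting point, here is a minimal sketch that turns an automatic caption file into a plain draft transcript. It assumes the captions were saved in the SubRip (.srt) text format; the function name srt_to_draft and the sample text are hypothetical. The sketch only strips the cue numbers and timings, so punctuation and speaker breaks still have to be added by hand.

```python
import re

# Matches a SubRip timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(r"^\d{2,}:\d{2}:\d{2},\d{3} --> \d{2,}:\d{2}:\d{2},\d{3}")


def srt_to_draft(srt_text: str) -> str:
    """Hypothetical helper: keep only the spoken text from SubRip cues."""
    spoken = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # A cue is an index line, a timing line, then one or more text lines.
        if len(lines) >= 3 and TIMING.match(lines[1].strip()):
            spoken.extend(line.strip() for line in lines[2:] if line.strip())
    # Automatic captions carry no punctuation, so the result is one
    # unpunctuated run that a volunteer still has to clean up.
    return " ".join(spoken)


sample = """1
00:00:00,000 --> 00:00:03,200
welcome to this video on

2
00:00:03,200 --> 00:00:06,000
configuring the server
"""

print(srt_to_draft(sample))
```

Parsing whole cues, rather than discarding every line that looks like a number, keeps spoken numbers such as a year that happen to fill a caption line on their own.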

The transcription volunteers were also told about the free VLC media player (www.videolan.org/vlc/index.html), which can slow down video playback. Slower playback spared them from continually pausing and replaying the video during transcription.

The transcripts were then passed to the Camtasia volunteers, who created the captions in the software and merged them into the videos. These volunteers estimated that integrating the captions took them two to three hours per video. Note that these volunteers were general Camtasia users, not professionals who use it to produce training or video resources.
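For readers curious about what finished captions contain, the following is a minimal sketch that writes timed caption cues in the SubRip (.srt) text format, a widely supported caption format. It is not how Camtasia works internally, and the Cue class and the function names are hypothetical. It assumes each transcript segment has already been given start and end times, which is the synchronization work that a captioning tool supports.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical caption cue: one transcript segment with its timing."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a SubRip timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues: list[Cue]) -> str:
    """Number the cues from 1 and emit index, timing, and text per cue."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


cues = [
    Cue(0.0, 3.2, "Welcome to this video."),
    Cue(3.2, 6.0, "Let's configure the server."),
]

print(build_srt(cues))
```

Because the output is plain text, a volunteer can proofread the punctuated transcript wording in any editor before loading the file into a player or captioning tool.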

At the end of two weeks, the firm was able to re-post 40 videos, which now included captions. Feedback has been positive, and participants have expressed interest in repeating the project.
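Using the volunteers’ own per-video estimates, a rough calculation gives the total volunteer effort behind the 40 re-posted videos. It assumes that every video was about five minutes long and needed both a full transcription (30–45 minutes) and Camtasia captioning (two to three hours), and it ignores the training session.

```latex
\begin{aligned}
t_{\text{video}} &= \underbrace{[0.5,\ 0.75]\,\text{h}}_{\text{transcription}}
                  + \underbrace{[2,\ 3]\,\text{h}}_{\text{captioning}}
                  = [2.5,\ 3.75]\,\text{h} \\
t_{\text{total}} &= 40 \times [2.5,\ 3.75]\,\text{h} = [100,\ 150]\,\text{h}
\end{aligned}
```

Under these assumptions, captioning accounts for 80 to 120 of those hours, or about four-fifths of the total, so the Camtasia step rather than transcription is where most of the volunteer time went.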