59.3, August 2012

New Modes of Help: Best Practices for Instructional Video

Jason Swarts


Purpose: To examine instructional video as a vernacular form of technical communication serving readers unwilling to consult print documentation. Provide a set of best practices for creating and delivering video based on analysis of existing, highly-rated content on YouTube.

Method: Assessment of a criterion-based (that is, software type) sample of 46 instructional videos stratified by user ratings. Inductive coding of shared within-group (that is, “good,” “average,” “poor” rating) features, including genre conventions, rhetorical work, and communication design qualities.

Results: Good instructional videos share qualities that appear to account for their strong user ratings and distinguish them from average and poor videos. Good videos spend significant time introducing an instructional agenda and forecasting goals and steps. In this manner, they function like video equivalents of printed documentation. Good videos also focus on demonstrative content, in which steps are both performed and explained or elaborated. By contrast, videos with lower ratings focus as much or more on simply doing the steps without explaining or explaining without doing. Good videos were also designed so that their instructional messages could be easily identified and accessed, easily understood and applied, and so that the messages were engaging and encouraging.

Conclusions: Designers of instructional video can successfully apply lessons learned from the design of instructional content for print while taking into account the medium-specific affordances and constraints of video and sound. The potential for rapid, viral distribution via social media channels should also inform the selection and design of instructional content.

Keywords: instructional video, help documentation, communication design, assessment

Practitioner’s Takeaway

  • Like good printed instruction, good instructional video begins with an introduction that frames the lesson to be learned.
  • Good instructional video spends more time demonstrating steps (doing and explaining) than either doing or explaining alone.
  • Good instructional video delivers content whose message is easy to locate and access, easy to understand and utilize, and is engaging and reassuring.
  • To compete for readers’ attention, deliver lots of highly specialized content via social media channels.

In a Do-It-Yourself World, People Still Don’t Read the Manual

In a recent feature story, writers at Wired magazine described a growing population of Do-It-Yourself (DIY) hobbyists who are taking on technically complex tasks in the name of pure enjoyment. Projects range from building bike computers and water purifiers to specialized task-based uses of software like configuring a database or creating a mail merge file. These hobbyists have developed keen interest in instructional content, generating their own in the form of wiring schematics, code snippets, procedures, and instructional video. Without question, some user-generated instructional video is poorly made, but some reputable producers are starting to emerge as well. Companies like Adobe and Techsmith already create their own video-based instructional content. And judging by the view count, it gets watched. The question this article seeks to answer is how do people make good video content? The answer begins with an examination of what motivates the use of video as a form of instruction.

The simple answer is that people need instruction. Certainly, this is true not only for project-based uses of technology but for mundane technologies as well. There are many occasions during the day when we must accomplish something with an unfamiliar technology: we must configure VoIP to call a colleague, configure a PDF form to be editable, or configure online calendars to be shareable. The software itself has gotten easier to use, as more of its complexity is pushed below the interface, and the people using it are increasingly savvy users who are willing to tinker and bootstrap themselves to functional proficiency, but there remains a need for instructional content.

Despite the need for instruction, users still steadfastly refuse to engage with traditional printed manuals. Novick and Ward found that when participants in their study were given the option to “ask others,” “use online help,” “solve without assistance,” or “use printed manual” when solving a problem, an underwhelming 20% of problems were solved using the manual. Most participants gravitated toward other people (90%) and online help (75%). The reasons cited included better navigability, greater precision of help, and greater searchability (Novick & Ward, 2006a, p. 16). Participants also cited the “unstylish,” “boring,” and “antiquated” look of printed manuals as off-putting.

Following up on their study, Novick and Ward examined what users say they want instead. Most wanted a better way to locate solutions to their problems (Novick & Ward, 2006b, p. 86), more appropriately level-matched explanations (Novick & Ward, 2006b, p. 87), less extraneous information (Novick & Ward, 2006b, p. 87), solution-based indexing and access (Novick & Ward, 2006b, p. 87), and correct and comprehensive information. Instructional video, because of both its multimedia format and its mode of delivery, is an appealing possibility.

Unlike most manuals, videos seem more informal—the narrator appears to be speaking to us. Videos are also frequently entertaining. They are deliberately encouraging, sending assurances that the viewers can easily apply the lessons. And they are seductive in the sense that advertising and marketing writing is (see Khaslavsky & Shedroff, 1999).

Video also offers a richer channel of communication that allows simultaneous broadcast of textual, video, and auditory information. Circulation on clearinghouse sites like YouTube and Vimeo, fueled by the hordes of amateur and professional videographers, makes the volume and reach of this content unmatchable. The energy and enthusiasm that amateur technical communicators bring to the most idiosyncratic tasks simply would not be possible except in a context of mass collaboration (Jenkins, 2006; see Shirky, 2008).

More pointedly, video addresses an underlying information design problem that Novick and Ward (2006a, 2006b) uncover while still arguing the value of the structural advantages of in-print documentation. One approach to understanding the problem is through Carliner’s three-part framework of information design (2000). He proposes three overlapping areas:

  • Physical Design: Design that directs users to a message.
  • Cognitive Design: Design that helps users understand a message.
  • Affective Design: Design that helps users engage with and feel comfortable about a message.

When Novick and Ward’s participants cite navigation and search problems, they are really uncovering physical design problems. Headings, titles, and indices have limited ability to direct users to pertinent content. When the participants noted problems with level matching and amount of detail, they are pointing to cognitive design problems. The information is inflexibly pitched at a level above or below where they expect it to be. When the participants mentioned stylistic problems with the “boring” and “antiquated” look of the printed manual, they were expressing affective design problems. There is little about the style of the manual that keeps them engaged. Each of these problems persists, in part, because of the relative inflexibility of the print manual as a mode of delivery, which is one aspect of information design that, although implicit in Carliner’s framework, deserves more deliberate attention. That these predilections lead to increased consumption of alternative instructional content (including video) holds true despite the superior ergonomics of printed texts as an instructional medium (see Swarts, 2004).

Instead of meeting their needs for instruction with print documentation, however, users are shifting those needs elsewhere, toward forums and video. Addressing physical design issues, video provides procedural information in multiple simultaneous channels (text, moving image, sound), creating complementary repetition that can help users isolate instructional messages. Unlike a book, however, videos have fewer options for navigation. It is not easy, for example, to locate step 3 in a 5-step video process without first watching steps 1 and 2. Video addresses cognitive design issues by combining various modal displays of content to allow richer details of the procedure to rise to the surface. Users can attune to the spoken message, which will have different details than what is visible in the video or the accompanying text. In this way, videos help address issues like a lack of detail and level matching. Finally, videos can address affective design issues as well. For some, videos are engaging and easier to consume than a book. Further, narrators, perhaps better through spoken discourse than written, play an important role in encouraging and motivating potential users.

The delivery medium, too, is beneficial in that it encourages short, easy-to-produce videos, findable by users through filtering and sorting. These services encourage rapid creation and consumption of copious amounts of specialized content, shaped only by hundreds of thumbs up and down. Eventually, through sheer volume and filtering and sorting, level matching and detail problems will be addressed. We have something to learn from this shift and one step in that direction is to see what makes these videos useful and engaging so that we can become effective producers of them. Specifically, the questions driving this research are:

1. What genred forms and rhetorical work distinguish “good” instructional videos from “average” and “poor” ones? In other words, what did these videos attempt to communicate and what uses did they support?

2. What communication design features distinguish “good” instructional videos from “average” and “poor” ones? In other words, how do these videos differ in expression?


A research assistant and I drew a criterion-based sample of 46 instructional videos from YouTube (IRB exempt). We searched for videos across four different software types (video editing, text editing, image editing, sound editing) matching the search “tutorial” or “how to.” These software packages were chosen to neutralize a bias toward one kind of content that might encourage instructional video with particular features. We then sorted the results to get the most watched videos, and from this ordered list collected a stratified sample of the first four videos rated 3.5-5.0 stars, the first four rated 2.6-3.4 stars, and the first four rated 0-2.5 stars across the software types. The sampling occurred before YouTube changed its rating system from stars to a like/dislike format. For all categories except the lowest rated, we located videos with thousands of views and hundreds of user ratings. For the lowest rated videos (which YouTube search appears to bias against), we selected videos with at least 25 user ratings. As a reliable assessment of video quality, user ratings are problematic, but our assumption was that since users come to the video driven by actual need, in aggregate, the average positive or negative user ratings should reflect genuine satisfaction or dissatisfaction.
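The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the researchers actually ran: the record fields (`software`, `views`, `stars`), the function names, and the handling of ratings that fall between the published bands (for example, 2.55) are all hypothetical.

```python
from collections import defaultdict


def rating_band(stars):
    """Map a star rating to one of the three bands named in the text.

    Assumption: ratings that fall between the published bands
    (e.g., 2.55 or 3.45) are assigned to the lower band.
    """
    if stars >= 3.5:
        return "good"
    if stars >= 2.6:
        return "average"
    return "poor"


def stratified_sample(videos, per_cell=4):
    """Keep the most-viewed `per_cell` videos in each (software, band) cell."""
    cells = defaultdict(list)
    # Walk the results from most to least watched, as in the sorted search list.
    for video in sorted(videos, key=lambda v: v["views"], reverse=True):
        cell = (video["software"], rating_band(video["stars"]))
        if len(cells[cell]) < per_cell:
            cells[cell].append(video)
    return dict(cells)
```

Sorting once and filling each cell until it holds four videos mirrors the idea of taking "the first four" in each rating band from a list ordered by view count.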

Building from Carliner’s tripartite model of communication design, we developed a two-step coding scheme to first uncover similarities and contrasts in genred form and rhetorical work between “good,” “average,” and “poor” videos. The second step differentiated communication design features, among the same videos, within Carliner’s framework. Drawing on experience and advice from popular textbooks on procedure writing, the following codes emerged for the first pass:

  • Introduction: Any section of the video offering an overview, warnings, or list of necessary equipment.
  • Step: Any section of the video outlining or demonstrating the actions one carries out in order to complete a task.
  • Conclusion: Any section of the video offering closing remarks.

Simple reliabilities calculated with a second coder yielded agreement of 92.7%.
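For readers unfamiliar with the measure, simple reliability is the share of coded units on which two coders assign the same code. A minimal sketch follows; the function name and the sample codes are hypothetical and are not the study's data.

```python
def simple_agreement(coder_a, coder_b):
    """Return the fraction of units on which two coders chose the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same units")
    if not coder_a:
        raise ValueError("at least one coded unit is required")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical codes for five video segments.
coder_a = ["Introduction", "Step", "Step", "Step", "Conclusion"]
coder_b = ["Introduction", "Step", "Step", "Conclusion", "Conclusion"]
print(f"{simple_agreement(coder_a, coder_b):.1%}")  # prints 80.0%
```

Simple agreement does not correct for matches that would occur by chance, so it tends to read higher than chance-corrected statistics computed on the same codes.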

The second coding pass differentiated kinds of rhetorical work:

  • Explanation: any instructional talk that is not accompanied by actions taken to complete the step (that is, talk with no onscreen action).
  • Demonstration: any movement within the frame of instruction intended to illustrate a step—accompanied by explanation (that is, action plus talk).
  • Doing: any movement within the frame of instruction intended to illustrate a step—not accompanied by explanation (that is, action plus no talk).

Simple reliability with a second coder was lower, at 78%. But after modifying the code definitions, many coding disputes were resolved, resulting in an adjusted simple reliability of 90.8%.

The third coding pass examined communication design features, drawing on Carliner and on studies of multimedia and information usability (Albers, 2008; Grice & Ridgway, 1993; see Mehlenbacher, 2002):

  • Physical Design: concerned with access, viewability, and timing.
  • Cognitive Design: concerned with accuracy, completeness, and pertinence.
  • Affective Design: concerned with confidence, self-efficacy, and engagement.

The first part of Carliner’s framework, physical design, is about moving the reader/user’s eyes to relevant content. For this reason, issues such as access (for example, headings), viewability (for example, video resolution, audio quality), and timing (for example, speed of the video, pace of the narration) are influential. Problems at this level prevent viewers from navigating to the content, just as inadequate tables of contents, subject headings, and indices might inhibit navigation in a book. The cognitive design concerns include accuracy (that is, whether the video contained any errors of fact or execution), completeness (that is, whether the video appears to cover all expected topics), and pertinence (that is, whether the video is edited to include only relevant information). Here, we were concerned with issues related to level matching and providing sufficient details. Failures at this level may prevent viewers from understanding or applying what they have watched. Finally, affective design touches on issues of comfort, engagement, encouragement, and motivation. Qualities that matter are: confidence (for example, whether the narrator inspires confidence in the outcome of the lesson), self-efficacy (that is, whether the narrator or content encourages users to believe in their ability to succeed), and engagement (that is, whether an attempt is made to capture and hold attention). These affective qualities, while not directly overcoming problems of boredom or perceived lack of style, nevertheless work to engage the users, which is the underlying problem. A fuller discussion of the analytic process and assessment rubric is in Morain and Swarts (2012).

This study, like all studies, has limitations. First, it should be noted that a sample of 46 videos does not lend itself to robust significance testing and the relative proportions reported in the next section should only be understood to indicate the relative strength of the patterns uncovered. Further, an additional dimension of usability that is not explicitly addressed here (although implicitly referenced throughout the analysis) concerns the ergonomics of video compared to print. Quite simply, there are some uses for which text is better suited than video. For the purposes of this article, a sample video analysis will follow in order to summarize the qualities of good instructional videos.

The Rhetorical Structure of Instructional Video

By and large, good instructional videos tended to resemble procedures in print. They had a similar form and did similar rhetorical work. To understand what this means, Farkas (1999) offers a useful starting point. Although writing in 1999, he presciently observed that the structure of procedures he elaborated would constitute a “set of relationships, a consistent logic, that […] underlies all forms of procedural discourse” (1999, p. 42). His model describes a relationship between states (desired, prerequisite, interim, and unwanted) and the actions needed to navigate them (human actions, system actions, external events). All procedures describe how someone uses a technology to achieve some result (desired state) by first establishing where the task starts (prerequisite state) and how it proceeds (interim states) toward conclusion.

The first coding pass revealed a similar structure, in which introductions were places to talk about the goals of instruction as desired states and about prerequisite states or conditions that needed to be met prior to following the rest of the video. Interim and unwanted states plus the variety of human and system actions that required negotiation constituted the bulk of the procedures in the form of steps. Significantly, Van der Meij, Karreman, and Steehouder (2009) find Farkas’s model of procedures viable today, although they note that “[i]n only three decades, a predominantly paper-based approach to instructing novice computer users has evolved into a multimedia, multichannel support system for multiple audiences” (Van der Meij et al., 2009).

The empirical evidence bears out Van der Meij et al.’s observation. Across all videos, without regard to user rating, the proportion of introduction to steps to conclusions showed that approximately 73% of all video content (measured in seconds of footage) was devoted to steps. Of the remaining 27%, approximately two-thirds (~18% of all content) was devoted to introductory material. The other third (~9%) was given to conclusions, which consisted of re-iterations of what had been accomplished or, more often, exhortations for viewers to “rate and subscribe” to a particular video channel. Good and average videos devoted approximately the same amount of time to introductory framing, steps, and conclusions. Poor videos, however, devoted more time to steps and less time to introductory framing. Notably, they often did not start with an overview of the instructional goals: the desired state to which a set of procedures should be leading. Neither did they offer much context.
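Because these proportions are measured in seconds of footage, they can be derived directly from coded segments. The sketch below assumes each segment is recorded as a hypothetical `(code, start_second, end_second)` tuple; the sample values are invented so that they reproduce the overall split reported above and are not drawn from any video in the study.

```python
from collections import Counter


def share_by_code(segments):
    """Return each code's share of total coded footage, measured in seconds."""
    seconds = Counter()
    for code, start, end in segments:
        seconds[code] += end - start
    total = sum(seconds.values())
    return {code: amount / total for code, amount in seconds.items()}


# Hypothetical 100-second video coded into three sections.
segments = [
    ("Introduction", 0, 18),
    ("Step", 18, 91),
    ("Conclusion", 91, 100),
]
for code, share in share_by_code(segments).items():
    print(f"{code}: {share:.0%}")
```

The same tally would work for the rhetorical-work codes (explanation, demonstration, doing) if segments were labeled with those codes instead.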

There are also differences in the breakdown of rhetorical work performed, which is a clearer indicator of what these videos are doing. Van der Meij et al. (2009) have noted that, over the last 30 years, there has been a shift in procedural content away from declarative expressions of information that would have been appropriate for what Janice Redish called a “read to learn” audience toward procedural information that is more appropriate for what she has called a “read to do” audience. The coding distinguished these functions by differentiating “explaining,” “demonstrating,” and “doing.” The latter two (demonstrating and doing) acknowledge that, in video, the presentation options are not simply declarative or procedural. One can demonstrate procedures and explain the process simultaneously, using the audio to complement the visual (see Bishop & Cates, 2001) or let one’s actions speak for themselves by doing without explaining. In printed procedures, the distinction between demonstrating and doing is less vivid.

On average, the videos comprised less explanation (31% of coded content) and more demonstration (51% of coded content). Doing made up a smaller share as well (18%). Broken down by user rating, we see a different pattern. The poor videos had the most doing and the least explaining. Good videos had more explanation, more demonstrating, and less doing. When demonstrating, the narrators of good videos were explaining what they were doing and why. The explanations turn out to be fairly important, as well, for contextualizing the procedures in larger tasks in which users might be engaged.

A Typical “Good” Video—Communication design features

While it is difficult to find an instructional video that exemplifies all of the qualities typically shared by good videos, one titled “Movie Maker Video Editing Tutorial” (http://www.youtube.com/watch?v=JZXK68NS7gU) comes close on most counts. It is certainly good enough to point to key communication design features from which we can draw out best practices, using the modified version of Carliner’s three-part information design framework discussed earlier.

Physical Design Qualities

Assisting with navigation, this video uses title slides to demarcate the transitions between sections. While not actually functioning as bookmarks, the title frames are visually distinct and remain on screen long enough to allow someone moving the progress bar to use the title frames as entry points. Another accessibility aid is that the video’s creator cropped the instruction window so that all content captured is pertinent to the instructional message (Figure 1). By reducing the amount of extraneous visual information (also a pertinence issue), the viewers can more easily attune to the information that matters.

Figure 1. Still Screen Showing a Cropped Workspace

Stripping away extraneous information, however, is not necessarily the same as drawing a viewer’s attention to important content. Doing this requires visual and verbal “pointing,” the use of deictic language to direct attention. Annotations like arrows and callouts, words like “this” and “that,” and even zooms and pans all serve to direct attention. They point (sometimes literally) at the content that is important. In videos rated lower, the verbal pointing largely consisted of empty or ambiguous language such as “click here” or “get this thingy.” The more precise deictic language in good videos relied on interface terms such as “click on the timeline” or “drag in your clip from the media bin.”

Viewability also mattered—often, problems with video and audio quality were the only factors that explained a low user rating for otherwise decent content. This video bears evidence of someone’s experience and skill at shooting video and creating audio. Shaky or blurry video and garbled or otherwise flawed audio can get in the way of an instructional message. Sometimes poor video literally prevents users from seeing tool or menu selections (Figure 2).

Figure 2. Still Screen Showing Blurred Tool Bars from Poor Video Production Techniques

The timing is also good. Since instructional information is coming through different channels, those channels ought to complement each other. Here, the steps are announced before they are shown. So, when the narrator tells us to open the transitions pane and make a selection, we hear the step a split second before we see it, long enough to get mentally “set” for an action.

Although the video introduces its own navigation problems, the overlapping moving image, still image, and narration complement and reinforce one another to make navigation somewhat easier.

Cognitive Design Qualities

First, the content of this video is accurate. There are no errors of fact—users were not told something about Movie Maker that was untrue. Neither were there any errors of execution—all actions taken were met with the expected results. Usually, viewers knew this to be true because the intended results were announced prior to the action being taken.

More importantly, however, the video had a sense of completeness to it. Viewers knew the goal from the outset. At the start of the video, the title frame announces that the goal is to “create a digital movie with Movie Maker 2” (Figure 3).

Figure 3. Still Screen Showing Announcement of Instructional Goal

Right away, the viewers know what to expect and perhaps can create a mental map of the necessary steps. Each title frame announces a new objective, such as importing video, adding clips to the timeline, and trimming clips. While not present in this video to the same extent, other good videos featured narrators explicitly cueing the viewers by announcing “next we will X” before making good on the promise. The point is that good videos establish an organizational superstructure, just like an effective manual would do. Sections of the video are organized around tasks, also like effective task-based documentation.

Finally, the instructional message is clarified by retaining only pertinent content. This video shows clear evidence of planning and editing. It appears to have started from a script or storyboard. The shots are carefully selected. No extraneous visual information is included. What explanation is offered usually extends the lesson to show its broader applicability. For instance, when talking about trimming video clips, it is mentioned that the same technique can be used to trim audio.

In addition, the redundancy of the audio and video is worth noting as a way of addressing level matching problems. Together, the two channels offer slightly different details about the content. While the audio channel gives a play-by-play of the actions, it is also used to elaborate the content and build a different kind of understanding than would be possible through the video alone.

Affective Design Qualities

Finally, the video exhibits a number of notable affective design qualities. For one, the video is actually enjoyable to watch. While the task (video editing) is probably more inherently enjoyable than other kinds of tasks, there is a level of seduction to this video as well. The video is effortless to watch and the moving images are enough to hold attention for at least a short amount of time. Further, the narrator appears to be aware that some level of content redundancy is necessary. Since the content goes by quickly, some important details, such as where clips go, how to trim them, and how to add transitions, get repeated. This repetition reassures viewers that important points are not missed. The combination of text annotations, transitions, and sharp technical production makes this video look stylish, professionally assembled, and credible, which lends confidence that the content is good.

Another notable quality is the narrator’s actions and tone, particularly where those actions and attitude inspire confidence and encourage viewers to attempt what is shown. This narrator is either working from a script or has rehearsed the delivery. Actions shown are smooth, with no halting between them. There is no indecision when selecting tools or menu options. The narration itself is cool and even-toned. These are the voice and actions of someone who knows what he is doing. They exude confidence, which in turn has a positive effect on viewers, inspiring them to confidence as well.

Best Practices for Creating Instructional Video

Based on the preceding analysis, which does capture most features common to good videos, we can see possible best practices emerge in the form of technical proficiencies, benchmarks of performance, and rhetorical considerations that will be the focus of this section.

Make the Rhetorical Structure of the Video Visible and Persistent

This means making visible the sections of the video, whether as breaks between objectives, resting points in tasks, or places where viewers are asked to pause and do something. The video analyzed above used title frames (see Figure 3) to divide subtasks. Other strategies may include actual, persistent onscreen titles, or simple black frames to visually signal a shift in topic. Not every viewer will want to see each video in its entirety. If people only look at enough content to satisfice (Redish, 1993, p. 17), then they will use whatever means are available to skim content and pick out what they want. In most videos, viewers are limited in their skimming abilities to moving the progress bar forward and backward. Persistent titles or clear breaks between sections will be visible using this skimming technique.

Test the Timing of the Audio/Video to Ensure Ease of Following

The Web is littered with videos that either go too slowly or too quickly through tasks. Keep the pacing just slower than what would characterize a typical competent performance. More importantly, as the video analysis showed, audio should slightly precede the video. When an action is demonstrated, it should be verbally announced a moment before. To the casual viewer, the audio and the video will appear synchronized, but in effect, the preceding audio will help attune the viewer to the screen and to the appropriate area of the interface.

Figure 4. Still Screen Showing Cursor Hovering Near Area of Screen Announced in Audio
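One way to test this timing before publishing is to log, for each step, when the narration announces it and when the on-screen action begins, then check the gap. The sketch below is an illustration only: the cue format, the function name, and especially the `min_lead` and `max_lead` thresholds are assumptions made for the example, since the analysis above specifies only that the audio should lead by "a moment."

```python
def audio_lead_report(cues, min_lead=0.2, max_lead=1.5):
    """Flag steps whose narration does not lead the on-screen action.

    `cues` holds (step, narration_start, action_start) tuples in seconds.
    The default thresholds are illustrative assumptions, not tested values.
    """
    report = []
    for step, narration_start, action_start in cues:
        lead = action_start - narration_start
        if lead < min_lead:
            verdict = "narration too late"
        elif lead > max_lead:
            verdict = "narration too early"
        else:
            verdict = "ok"
        report.append((step, round(lead, 2), verdict))
    return report


# Hypothetical cue sheet for three steps.
cues = [
    ("open transitions pane", 10.0, 10.5),
    ("select a transition", 15.0, 15.0),
    ("drag clip to timeline", 20.0, 23.0),
]
for row in audio_lead_report(cues):
    print(row)
```

A report like this only locates candidate problems; whether a given lead feels right still has to be judged by watching the video with representative viewers.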

Use the Recording and Editing Tools Well

If one quality stood above others as provoking the most ire from users, it was a demonstrated lack of skill at using the recording tools. Obviously, egregious incompetence such as using handheld cameras for screencasting and distorting aspect ratios should never be tolerated. But even little mistakes, like an annoying, persistent hum from a microphone with a weak output signal or slight blurring of text from recording at an inappropriate size or a low sampling rate, are equally problematic. The point is that viewers notice these warts, which aggravate without regard for the actual magnitude of the error.

Record in HD or Near HD Quality

Learn to capture high quality audio and to set the sample rates and microphone levels to get crisp, high resolution audio. The higher the quality is, the better it plays with other sounds and video. Degraded audio is an outright obstacle to hearing instructional messages. Similarly, HD or near-HD quality video is a must. Viewers will frequently want to scale up or down the video to fit within a workspace, and going up or down in size has its own problems and limits. The pixels can only be jammed together so closely or stretched so loosely, but it is better for problems to appear at the extremes than at points between.

Consider How Modes of Communication Complement One Another

There is growing awareness that some forms of content are good for certain kinds of messages and for supporting certain kinds of thinking (Bishop & Cates, 2001; Horn, 2004; Kress, 2004), and it has become common in technical communication textbooks to discuss the incorporation of images by noting how they can be clarified with text. Explore the ways that different modes interact and appropriate one another (Hull & Nelson, 2005). Text clarifies the abstractness of video. Text also segments and organizes video. Still images emphasize detail by holding it in place. Audio allows eyes-free action while signaling shifts in topics.

Get It Right the First Time

There are plenty of bad videos on the Internet that don’t get things right, and their user ratings show it. There is little that is more harmful to one’s credibility than to demonstrate a step and fail, produce a system error, or get a different outcome than expected. Worse still is to either ignore or dismiss the error with a casual “nevermind, it just works” or “I don’t know what that was.” Again, these are egregiously poor choices, but simple errors are almost as damaging. Wrong menu selections, wrong tool selections, even momentary hesitations can disrupt learning by diminishing the narrator’s credibility or by making the task seem more difficult than it is.

Start with an Overall Structure, a Goal, and a Set of Objectives

Be sure to have each and to communicate them clearly and often. A clear goal and objectives that mark off progress will communicate to the viewer what the beginning and end state of the video are supposed to be. This way, viewers can judge progress through the task and begin to anticipate where actions are headed, so that any gaps in detail can be filled. The goals and objectives may also play a significant affective role in motivating viewers to continue through the tasks to their completion.

Think in Cinematographic Terms

Techniques like master shots, long shots, medium shots, scenes, cuts, and montages all matter in the creation of videos (Gillette, 2005). Shots vary by the amount of context shown, the characters shown, and the level of detail. A master shot showing the futuristic cityscape at the beginning of Blade Runner establishes a context for action, which shapes viewers’ understanding of the characters. In instructional video, a master shot of the workspace might likewise establish a context of action and provide a sense of constraints and affordances. Other kinds of shots show characters with little detail but set in a context of action (long shot) or with greater detail to say something about their character or motivation (close up). Similar shots work in screencasting as well.

Figure 5. Still Screen Showing a “Close Up” Shot of Important Tools on the Interface

Long shots that show toolbars and menus establish the “characters” and their roles in the tasks. Showing a tool at the beginning is a promise (to refer obliquely to Anton Chekhov’s gun) that it will play a role later on. Other analogous concepts like scenes and cuts help organize the video into tasks or actions, with objectives that are achieved in order to advance the video. Cuts delete action, compress time, and compress space to create connections between actions that would otherwise be too difficult to show in real time. Planning should then also include moviemaking techniques like storyboarding, scripting, and shot plans.

Use Strategic Redundancy

For viewers who listen to the content instead of watching, repetition helps key points stick, but even if someone is both listening and watching, a different kind of redundancy is merited. Some of the more effective videos relied on text annotations, drawings, spotlights, callouts, still screens, and spoken comments to clarify content. Selecting a tool on the screen provides one kind of information, but adding a callout draws extra attention. Showing an input value or a slider value provides enough information, but magnifying that content with a zoom, pausing it with a still screen, or speaking the values aloud is a special kind of redundancy that underscores the message. Of course, not all actions need callouts or stills or even narration. Those that are inconsequential, obvious, or repeatedly performed do not need reinforcement after a while (if at all), and reinforcing them would diminish the impact of the video by filling it with extraneous detail.

Rehearse the Script

One quality of the good videos is that they feature narrators with pleasant voices. These narrators are probably employees, though some of them may very well be actors hired for the videos. The particular combination of good looks and good voices suggests, at the very least, that some attention was given to the matter. Poor videos, not surprisingly, featured less sonorously pleasing narrators. When the narrator was not the source of the problem, a lack of preparation was. Stutter-starts, indecision, rambling, and lack of enunciation all diminish quality. A narrator who speaks without engagement or confidence does little to inspire confidence or engagement in the listener. Narrators who speak more confidently, flawlessly, with more inflection, with better enunciation, and with obvious practice inspire trust and motivation.

User Test a Sample of Audio for Engagement

Clean it and optimize it but don’t overdo it. There is such a thing as audio that can be too perfect. The warm, comforting voices of our own lives are hardly flawless. Treasured recordings are pocked with skips and fuzzy background static. We could go so far as to say that these are the styles that make our voices human-like, so it may come as no surprise that one of the qualities correlated with effectiveness in online instruction is the perceived humanity of the narrator (Clark & Mayer, 2008, p. 177).

Recognize that Your Credibility Is Being Scrutinized

While a video may bear the endorsement of a company, lending instant credibility, some assessments of credibility will be based on more intangible aspects of the video. The previous two best practices indicate how the level of apparent practice and enthusiasm and other sonorous qualities of the narrator’s voice appear to influence engagement, but they also affect credibility. A lack of seriousness, halting delivery, trailing off at the ends of sentences, and monotonous delivery can easily lead a viewer to question just how knowledgeable the narrator is. And if that skepticism leaks in, what sort of impact will there be on the perceived credibility of the video? Likewise, technical ability appears to matter. Slick production, effective editing, post-production touch-up, transitions, and good quality audio and video are reassurances that someone took the time to make a good video, and that kind of effort on the front end appears to suggest credibility and competence.

Seduce the Viewer

In a provocative piece about marketing writing and information design, Khaslavsky and Shedroff (1999) argue that the design of content can “seduce” an audience and keep them engaged. The same, it appears, applies to instructional video. The authors argue that seduction has three basic steps: make a big promise of something to be learned or accomplished, proceed incrementally by making and fulfilling small promises along the way, and then make good on the larger promise. Good videos followed this formula. These videos almost always started with a statement about what viewers would learn or accomplish. Doing this assures the viewers that they are about to watch something worthwhile. The videos then broke down the big promise into little ones that all clearly and inexorably led to the promised outcome. Finally, these videos (like most videos) made good on the overall promise.

Reassure the Viewer

While some viewers will seek out instruction, relatively sure of their ability to follow along, some will need reassurance. They will want to know that the task is not difficult, that the instructions presented will lead to a successful outcome, and that the task will be as easy as demonstrated. In other words, well-designed instructions give some attention to a viewer’s self-efficacy (see Bandura, 1977). Obviously, a correct and error-free demonstration will show viewers that the steps, if followed, will result in a successful performance, but there are other, less obvious aspects to control as well. Soothing reassurances from the narrator are helpful. Confidence in delivery and actions will assure viewers that the performance is going to help them. Elimination of extraneous detail will make the object of instruction appear simpler.

The Delivery Matters

In closing, it is worth pointing out one more quality of online instructional video that likely makes it appealing: the mode of delivery and the model of production. In this age of user forums and online user communities, it is clear that the consumers of documentation know what they are looking for and will ask for it. Of course, it is impractical for printed documentation to respond to the potentially limitless specificity of users’ needs. Producing and revising print documentation, even if delivered online, is costly and time consuming. Yet one thing learned from the popularity of forums is that customizable documentation is welcome and perhaps even easier to produce when the effort is distributed across a user base.

Users Know What They Want

If users are driven to documentation to solve problems or to accomplish highly specific tasks, there is some likelihood that these are unanticipated (or unanticipatable) requirements. Let the users drive the development of some content. Printed manuals, delivered online, have a place and they will find the right audience, but short videos that address specific issues may be a more effective way to reach users. Let users suggest topics and tasks for which quick videos can be created.

Organize from the Bottom Up when the Top Down is Infeasible

If there is one point that comes up continually with online content, it is that sound information architecture is an essential quality of usability. Where people differ is on the details. Public intellectuals like Clay Shirky, for instance, would argue that content should be organized from the ground up, by the people who use it (2005). Let users create the paths through the content and mark it with the details of their interaction: marks of quality and key terms, for example. Information architecture proponents like Peter Morville and Louis Rosenfeld (2002) argue for a similar approach but offer a more cautious outlook, suggesting that some top-down organization is essential to ensure that information can be found or contextualized properly. Social media outlets like YouTube will certainly provide some ways for users to search and filter content, but even without them, it would benefit the user base to host video content through a system that allows a robust range of searching, filtering, and tagging.

Make Lots of Content

After just a bit of searching through the databanks of YouTube, one quickly realizes that among the best rated and most frequently viewed pieces of content are fairly specific instructions that have little to do with either learning the features of a software package or learning a generic task. One is as likely to see videos on “how to change hair color with the masking tool” as videos on “how to use the masking tool.” This is true for videos on all kinds of subjects. If we are talking about producing print/online documentation to address this potentially limitless variety, we quickly run into problems of scale, organization, and navigation. With instructional content in a social medium, there is less of a problem. Generate lots of highly specific content and let the users sort it out. This is a lesson that some forum managers are learning as well. I am not suggesting that videos be made on a 1-to-1 ratio of questions asked to videos made. That’s what a forum is for. What a video might be for is addressing issues that multiple people ask about. A robust tagging vocabulary would then help users find what they are seeking.

These suggestions for creating videos coincide with advice that teachers of technical communication, public speaking, document design, web design, and editing give to their students every day. The instructional video simply presents a rhetorical performance to which all these topics are uniquely pertinent.


Albers, M. J. (2008). Human-information interaction. In Proceedings of the 26th annual ACM international conference on Design of communication (SIGDOC ’08), 117–124. Retrieved from http://portal.acm.org/citation.cfm?id=1456536.1456560

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191–215.

Bishop, M. J., & Cates, W. M. (2001). Theoretical foundations for sound’s use in multimedia instruction to enhance learning. Educational Technology Research and Development, 49(3), 5–22.

Clark, R. C., & Mayer, R. E. (2008). e-Learning and the science of instruction: Proven guidelines for consumers and designers of multimedia learning (2nd ed.). San Francisco, CA: Pfeiffer.

Farkas, D. K. (1999). The logical and rhetorical construction of procedural discourse. Technical Communication, 46, 42–54.

Gillette, D. (2005). Incorporating motion into on-screen presentations of technical information. Technical Communication, 52, 138–155.

Grice, R. A., & Ridgway, L. S. (1993). Usability and hypermedia: Toward a set of usability criteria and measures. Technical Communication, 40, 429–437.

Horn, R. (2004). Rhetorical devices and tight integration. In C. Handa (Ed.), Visual rhetoric in a digital world (pp. 372–373). Boston, MA: Bedford/St. Martin’s.

Hull, G. A., & Nelson, M. E. (2005). Locating the semiotic power of multimodality. Written Communication, 22, 224.

Khaslavsky, J., & Shedroff, N. (1999). Understanding the seductive experience. Communications of the ACM, 42(5), 45–49.

Kress, G. (2004). Multimodality, multimedia, and genre. In C. Handa (Ed.), Visual rhetoric in a digital world (pp. 38–54). Boston, MA: Bedford/St. Martin’s.

Mehlenbacher, B. (2002). Assessing the usability of on-line instructional materials. New Directions for Teaching and Learning, 2002(91), 91–98.

Morain, M., & Swarts, J. (2012). YouTutorial: A framework for assessing instructional online video. Technical Communication Quarterly, 21, 6–24.

Novick, D. G., & Ward, K. (2006a). Why don’t people read the manual? In Proceedings of the 24th annual conference on Design of communication (SIGDOC ’06), 11.

Novick, D. G., & Ward, K. (2006b). What users say they want in documentation. SIGDOC ’06. New York, NY: ACM Press. Retrieved from http://doi.acm.org/10.1145/1166324.1166346

Redish, J. C. (1993). Understanding readers. In C. M. Barnum & S. Carliner (Eds.), Techniques for technical communicators (pp. 15–41). New York, NY: Macmillan.

Rosenfeld, L., & Morville, P. (2002). Information architecture for the World Wide Web. Cambridge, MA: O’Reilly.

Shirky, C. (2005). Ontology is overrated: Categories, links, and tags. Retrieved from http://www.shirky.com/writings/ontology_overrated.html

Swarts, J. (2004). Textual grounding: How people turn texts into tools. Journal of Technical Writing and Communication, 34, 67–89.

Van der Meij, H., Karreman, J., & Steehouder, M. (2009). Three decades of research and professional practice on printed software tutorials for novices. Technical Communication, 56, 265–292.

About the Author

Jason Swarts is an associate professor of technical communication at North Carolina State University. He teaches courses on information design, networked communication, and discourse analysis. His research is on new media, mobile information technologies, and computer-supported cooperative work. Contact: jswarts@ncsu.edu.