60.3, August 2013

Eight Guidelines for the Design of Instructional Videos for Software Training

Hans van der Meij and Jan van der Meij


Purpose: Video has become a popular means for delivering “how to” information about a wide variety of software tasks. With video rapidly becoming a major instructional method, the question arises of how effective such videos are for software training. This paper provides a set of eight guidelines for the construction of instructional videos for software training.

Method: The guidelines present a concise view on how to design an instructional video for software training. They are based on a considerable body of research on how people process visual and verbal information and how to support these processes. Each guideline is described, illustrated, and supported with research findings from various disciplines.

Results: The guidelines were tested in three consecutive empirical studies. In these studies, a set of instructional videos for Word’s formatting options was designed. The effects of the video instructions were compared with those of a paper tutorial (Van der Meij & Van der Meij, in preparation). We found that the video instructions yielded more favorable appraisals for motivation, higher skills proficiency immediately after training, and better skills retention after a one-week delay.

Conclusions: The guidelines offer patterns that could further advance the theory and practice of the design of instructional videos for software training. A limitation of the study is that we concentrated on instructional videos that serve a tutorial function. For videos that function as a reference guide, not all the guidelines are equally important, and some new guidelines may be called for.

Keywords: video instruction, software tutorials, procedural support, streamlined step

Practitioner’s Takeaway

  • Our eight guidelines support practitioners in producing task-pertinent video tutorials for software training.
  • Our eight guidelines have been tested and proven effective under controlled, experimental conditions.
  • The descriptions that accompany the eight guidelines for the design of video instructions provide practitioners with background information that they can use to construct their own videos, or select the most apt ones from those that are available.
  • Our design examples can inspire practitioners to reconsider some of their design considerations for the construction of their own video tutorials.



Video has become a popular means for delivering “how to” information about a wide variety of software tasks. The rise of instructional videos has been stimulated by several factors. On the demand side, there is the exponential growth of new or updated software programs for which users request support. On the supply side, the rapid advances in easy-to-use technology have played an important role. Software programs such as Camtasia, Screencast-O-Matic, Captivate, Flash, and QuickTime have greatly facilitated the production and editing of instructional videos. The ability to publish and upload these videos rapidly and easily has given the final push, as the Internet offers a premier distribution channel for reaching millions of clients at the click of a mouse button.

The effects of these developments are visible on Web sites from companies such as Adobe, Apple, HP, Microsoft, IBM, and others that offer dozens upon dozens of instructional videos for their clients. Users have also been affected by these changes. From being solely consumers, they have now taken on the role of designer as well. Almost overnight, users have begun producing and editing thousands of “how to” videos that are published on Web sites such as Instructables, WikiHow, eHow, Howcast, Videojug, Vimeo, and YouTube. In short, with video rapidly becoming a major method for instructing the software user, the question arises of how effective these videos are for software training.

The growing popularity of video for software instructions is also visible in the rise of publications on this topic. The recent study of Swarts (2012) is illustrative of the current state of the art, as it set out to uncover a set of “best practices” for instructional video for software training. The relative lack of research on instructional video for software training is also apparent in experimental research. Experiments on instructional video for procedural knowledge development are rare (see Höffler & Leutner, 2007). Only two recent studies compared a paper-based tutorial with a video tutorial (Alexander, 2013; Lloyd & Robertson, 2012). Unfortunately, both studies provide little detail on the specifics of the videos that were designed and tested, and they yielded equivocal outcomes. That is, whereas Alexander (2013) could detect no advantage for a video over a paper-based tutorial, Lloyd and Robertson (2012) found that video instructions were more beneficial for software learning.

The present article contributes to the research on instructional video for software training by advancing a set of eight guidelines for their construction. By defining, grounding, and illustrating these guidelines, we present the reader with design patterns. Such patterns are middle-level theories that offer standard solution schemata for recurring problems. They “capture regularities of practices in ways that are potentially intelligible, verifiable, and perhaps useful to the practitioners themselves” (Carroll & Farooq, 2007, p. 41). Design patterns can be useful for both theory and practice. They advance understanding of how designs can be effective, and they frame and propose concrete design solutions that illustrate the underlying guidelines.


Eight Guidelines for Instructional Video for Software Training

There is a considerable body of research on multimedia learning that provides important insights into how people process visual and verbal information (e.g., Mayer, 2001, 2005a). This research forms a solid foundation for understanding how video can enhance learning. The multimedia literature provides further valuable guidelines for the construction of multimedia instruction. These guidelines are very general, however, leaving designers with (too) little concrete advice about the best solution for their specific design problem.

It matters considerably whether multimedia instruction aims to teach users how to solve a mathematical problem, or whether it aims to help users accomplish software tasks. For example, for mathematical problem-solving, the multimedia instruction should focus on enhancing the user’s conceptual knowledge. A good design solution could be a simulation that displays the solution steps visually, in combination with a voice-over that informs the user about the types of problem involved and the rationale behind each step. In contrast, software training should revolve around enhancing the user’s procedural knowledge. A good design solution could be a recorded demonstration that shows the user how to accomplish the software task, in combination with a voice-over that directs the user’s attention to the software elements (for example, locations, icons, menus) and important facets of the human-computer interaction (that is, user input and system reaction).

In other words, while the underlying cognitive processing is the same and is relevant for all types of multimedia learning designs, it is vital to have dedicated design guidelines for instructional videos for software training. This paper proposes such a set of guidelines. We present eight guidelines that we consider to be fundamental for the design of video instructions that teach people how to accomplish software tasks (see Figure 1). The guidelines focus on the design of video tutorials; they concentrate on (sets of) instructions that support learning and retention of software skills.

Although the guidelines are sequenced so that they more or less follow the flow of a scenario of use, each stands on its own as a design principle. This independence is evident from experimental research on several of the guidelines, in which only one specific guideline was manipulated.

The guidelines are based on numerous sources, most notable among them: Bethke, Dean, Kaiser, Ort, and Pessin on usability (1981); Mayer on multimedia learning and multimedia principles (2001, 2003b, 2005b, 2005c, 2005d); Van der Meij and Carroll on minimalism (1998); Van der Meij and Gellevij on the Four Components Model (2004); Tversky, Bauer-Morrison, and Betrancourt on animation (2002); and Plaisant and Shneiderman on guidelines for recorded demonstrations (2005).

In other words, much of the source material for the guidelines comes from two closely related fields, namely educational psychology and instructional design. Perhaps more than advancing new theory or practice for the design of instructional video, the guidelines summarize key notions of accepted thinking. They generally do not offer entirely novel insights about the design of instructional video, but rather present a unique and helpful way of structuring and summarizing the pertinent research. Framed differently, one could say that the guidelines highlight the general assumptions behind “best practices”.

Throughout the paper we will speak of a video tutorial, or tutorial, to refer to a set of videos that together form an instructional package. The term video is reserved for a rounded-off instruction on a software issue. Usually this means that the video presents a starting state or problem, a solution path and an end state. We speak of a segment to refer to a section, fragment or screenshot from a video. The discussion of each guideline is subdivided into three sections: description, support, and design examples.

The description section introduces each guideline. There is a brief characterization along with a discussion of specific design features.

The support section presents the theoretical and/or empirical support for the guideline. Information from our core perspectives (that is, usability, multimedia principles, minimalism, Four Components Model, animations, and recorded demonstrations) is repeatedly presented here. In addition, we briefly summarize pertinent studies on demonstrations for procedural skills development. This section aims to do more than just provide support for the guideline. It also offers background information and insights that should assist the reader in making an informed decision about whether or not the guideline should be applied in his or her situation.

The design examples section consists of cases that illustrate strong or weak design solutions. These examples stem from two sources. Restricting the examples to these two sources enables us to engage in more in-depth discussion, and to show relationships between guidelines. One set of examples revolves around Camtasia Studio (version 7), a screen recording program developed by TechSmith, whose company Web site offers a large set of video instructions. Here we concentrate on the “Getting Started” series, which is a tutorial for first-time users of Camtasia Studio (“Camtasia Studio 7 tutorials,” 2013). The other set of examples revolves around a tutorial on Word’s formatting options. We created this video tutorial ourselves, following the eight guidelines in its design. The effectiveness of this tutorial has been tested in three consecutive experiments against a paper tutorial that dealt with the same topics. We found that the video instructions yielded more favorable appraisals for motivation, higher skills proficiency immediately after training, and better skills retention after a one-week delay (Van der Meij & Van der Meij, in preparation, in review). We provide references to the guidelines (for example, G1.1) in pertinent places, to assist the reader seeking specific information.


Guideline 1: Provide Easy Access

Guideline 1.1: Craft the Title Carefully

Description.  Producing and publishing video instructions for a wider audience is one thing. Making them easy to find is quite another matter (G1). A user who is trying to locate video instructions for a specific software feature usually confronts two hurdles in finding that product. First, the user must find the most probable source or location for the video. This can be the software manufacturer, but it can also be a third party such as YouTube or eHow. Second, the user must select the proper candidate from among the available videos. The title of the video plays a critical role in this decision-making process. Just like a title in a paper tutorial or manual, it should be crafted with care (G1.1). It is preferable to have the title contain a verb and an object, telling the user what task the video demonstrates how to perform. The use of jargon should be avoided for introductory materials. Likewise, sites may offer the ability to show a brief abstract or summary of the video that could enhance its accessibility.

Support.  In their classic paper on usability, Bethke et al. (1981) indicate that the first criterion for user documentation to satisfy is that the information it provides should be easy to find (G1). To achieve such accessibility, they advise designers to carefully consider these factors: arrangement, pointers, and consistency. Arrangement refers to the structural organization of the information. This structure should be aligned with the user’s perspective. Common methods for ordering content are: chronological, alphabetical, and topical. Pointers are indicators of content and presentation that assist the user in identifying and locating information. Pointers such as a table of contents, an index, or a keyword search facility assist the user in getting past the first hurdle of gaining access to a set of potentially useful sources. A title or a heading plays an important role in finding the right product within that set. Consistency means always presenting the same information type at the same place and in the same manner. Consistency helps the user build a schema of how things are presented. Once the user has developed this schema, navigation and reading are greatly facilitated.

The guideline to craft the title carefully (G1.1) signals the title’s important role in the user’s search for the right product. All or some of the title words probably appear in the table of contents and in the index, and will yield a hit with a keyword search. In addition, just as in a paper tutorial or online help system, the title should give the user a succinct description of the goal that is demonstrated (see Farkas, 1999; Van der Meij & Gellevij, 2004).

Design Examples.  TechSmith’s Web site provides great accessibility to their instructional videos, as it makes their presence prominent (G1). The home page offers at least four ways to access their tutorials and videos (see Figure 2).

One, clicking the Support-button at the top reveals a new Web site with three types of information: Tutorials, Technical support and Help in retrieving a lost software key. Two, typing the word ‘video’ in the open Search field yields a list of in-company support products. In that list, action verbs such as capturing, recording, and editing, in combination with object names such as video, screencast, and screenshot, are handy pointers for the type of help the user can expect to find. Three, clicking on the Start icon or the Free training link brings the user to a Web site that lists all available video tutorials. Four, the Support option at the bottom of the page links to the same three options as at the top, but users can also link forward immediately to Tutorials.

Figure 3 displays the table of contents for the “Getting Started” series for Camtasia Studio 7. The organization of the eight videos included in this tutorial is chronological, with a sequence that by and large follows the basic scenario of recording, editing, and sharing a video. All of the verbs in the titles could perhaps be better presented as gerunds (that is, Recording, Adding, Cutting, and Sharing), as this is the top choice for titles as recommended by Farkas (1999). Consistency would also increase, which would facilitate scanning.

Not all of their titles adhere to the guideline to craft these carefully (G1.1, see Figure 3). The title for the video “Record Full Screen” is not entirely satisfactory because it does not fully cover its content. In addition to demonstrating how to record what happens on a full screen, the video also shows and explains how to record what the user says into the microphone. The title should therefore signal both of these goals. In some titles the use of jargon is also problematic. For the target audience for these videos, the terms ‘dimensions’ and ‘pan’ are probably unclear. Given that titles are sometimes short on coverage or ambiguous, designers might want to consider adding a glossary-like description that appears when the mouse lingers on the title for about three seconds.

Figure 4 illustrates how we facilitated access to the instructional videos in our tutorial on Word’s formatting options (G1). We opted to present both the table of contents and the video on the same Web site. In the training situation on which our research focuses, users are likely to benefit from easy access to the videos at all times. During initial training, but certainly also during practice, they should be able to locate the videos quickly and without undue effort. To facilitate such switches, the table of contents was permanently visible and videos could be called up at any time.

We also numbered the videos to indicate the structural relationship of the content to a main theme, so that similar content is grouped together. The tutorial contains previews and procedural demonstrations. Previews are demarcated with an icon.

Guideline 2: Use Animation with Narration

Guideline 2.1: Be Faithful to the Actual Interface in the Animation

Guideline 2.2: Use a Spoken Human Voice for the Narration

Guideline 2.3: Action and Voice Must Be in Synch

Description.  The prevalent format for instructional videos in software training is the recorded demonstration, which can be defined as a screen capture animation with narration (G2). The animation should reveal a scenario of use. It should display the sequence of events that take place as the user executes one action step after another during task completion. It is important to present in the animation the actual interface that the user is likely to see (G2.1). Showing the intact interface gives the user the same image that he or she is likely to be facing when trying to execute the task. In most cases this means a display of the whole screen. The demonstration then shows task execution in context, supporting the user in developing insights about the structural layout of the interface. Zooming is recommended when readability is at stake, such as where seeing a specific mouse click is important, or when text is entered as an example.

The narration should tell the users the story behind what happens on the screen, and perhaps add a bit of background. The story should be functional for what the user must see or do, rather than promoting the software. This goal is best served by a story that is in synch with the demonstration of the actions (G2.3). Furthermore, the story should be presented in spoken rather than written form, and the voice should be that of a real person rather than computer-generated (G2.2).

Support.  The guideline to use animation with narration (G2) agrees with a key tenet from dual coding theory and multimedia learning theory. The insights from these theories are reflected in the multimedia principle, which holds that people learn better from a carefully coordinated combination of words and pictures than from words alone. This important instructional design principle has been empirically validated in numerous studies (Mayer, 2005a). Further support for this guideline comes from the recent study by Swarts (2012), who found that users appreciate video instructions more highly when these couple a demonstration with an explanation or elaboration.

The guideline to use a representation from the actual interface in the animation (G2.1) is fully in accordance with the congruence principle advanced by Tversky, Bauer-Morrison, and Betrancourt (2002). This principle holds that the content and format of a graphic should correspond to the desired content and format of the users’ internal representation. Graphics are better understood and remembered when there is a natural cognitive correspondence between the real thing and the graphical representation. The recent meta-analysis of research on instructional animations from Höffler and Leutner (2007) also supports this guideline (G2.1) with their finding that the most realistic animation yielded the highest learning outcome.

The guideline of presenting the actual interface (G2.1) has also been investigated for paper tutorials. Van der Meij and Gellevij (1998) have advanced a taxonomy of screen captures for guiding the systematic inclusion of screen captures in manuals. Their taxonomy generally argues in favor of presenting a series of full rather than partial screen captures, because it animates the interface changes during task execution. The specific claim that such an animation helps users build a mental model was later validated in an empirical study which compared a manual with full screen pictures with one with partial screen shots (Van der Meij, 2000). The taxonomy argues for the presence of partial screens only in special circumstances or for achieving specific functions. Empirical support has been found for the claim that partial screen displays are called for when objects are hard to locate or identify on a full screen, and when users must verify screen states where legibility is a key issue (Gellevij & Van der Meij, 2004).

Another noteworthy instantiation of the guideline to be faithful to the actual interface (G2.1) is found in “training wheels” technology. This technology reduces task complexity for users by making software options unavailable. An important feature of training wheels technology is that users always see the Gestalt of the whole interface. The users still see all menu options, but with some options grayed out and blocked from use. Among its other benefits, training wheels technology prevents users from making serious errors that are hard to recover from. It has been effectively employed in several empirical studies on software training (Bannert, 2000; Carroll & Carrithers, 1984; Leutner, 2000).

The guideline to use a spoken voice for the narration (G2.2) connects with a well-established principle derived from multimedia learning theory, namely the modality principle (Mayer, 2001; 2003). This principle holds that learning is enhanced when words are presented as narration rather than as on-screen text. In paper tutorials the words and pictures must both be processed by the same visual channel. Such single-channel processing can be taxing for the non-disabled users on whom this discussion focuses. In multimedia presentations it is possible to call upon the resources of both the user’s auditory and visual working memory rather than just one. The capacity demands on the users’ visual channel are reduced by presenting verbal information through the audio channel (Moreno & Mayer, 1999). Based on the same argumentation, designers are also advised not to present verbal information through both channels at the same time. According to the redundancy principle (Mayer, 2001), one should avoid duplication, which would happen when a written text presents the same information as a narration.

The guideline to use a human voice (G2.2) agrees with Mayer’s (2005) voice principle, which holds that learning is enhanced with a standard-accented human voice rather than a machine-like or foreign-accented voice. Studies on animated pedagogical agents likewise indicate that users prefer a human voice over a computer-generated one, thanks to its greater naturalness and attractiveness (Baylor, 2011).

The guideline to synchronize the words and pictures (G2.3) aligns with the temporal contiguity principle (Mayer, 2001). This principle holds that when narration and animation must be integrated, a simultaneous presentation works better than a successive one. The reason is that in a successive presentation the user must hold one representation in memory and keep that active until the other representation appears. For many users this is taxing. Synchronization prevents this problem. Morain and Swarts (2012), who examined the design characteristics of high- and low-rated tutorial videos, also mention synchronization as a distinguishing characteristic. That is, they found that highly rated videos synchronized the audio and video tracks “so that steps were audibly announced just before being carried out,” rather than late or never (p. 10).

Design Examples.  Figure 5 shows the opening segments of the first video from the tutorial on Getting Started with Camtasia Studio (that is, Record Full Screen). In accordance with Guideline 2.2, a human (male) voice is used for the narration. The speaker is speaking in his native language. The story is told clearly and with enthusiasm.

The narration is repeated in writing (in contrast to G2.2). Having the text show up on the screen may have been done to attract attention and to enhance recall. However, this is not a good design choice according to multimedia theory. Although there is no (other) visual image that demands the user’s attention, the redundancy may still adversely affect the user.

Figure 6 illustrates an example of the guideline to be faithful to the actual interface (G2.1). Except for its placement, the tool is shown exactly as it appears on the screen. The video from which this segment is drawn discusses the three main objects from TechSmith’s recorder tool. Foregrounding is functional because it makes the recorder the central point of attention. In addition, it helps the viewer perceive meaningful details in the icons such as the dotted lines in the Full Screen option, and the green checkmark for audio on.

Figure 7 shows a sequence of three segments from the discussion of the Recorder. The segments illustrate the synchronization between the narration and what happens on the screen (G2.3). The first segment introduces the action. The narrator draws the user’s attention to selecting the Full Screen option. When the cursor is placed on the tool option, the arrow changes into a hand. The second segment revolves around the software reaction. The narrator explains how the software reacts to the choice of this option, directing the user’s attention to the system feedback (that is, green dashed lines). The demonstration draws the user’s eye to the relevant screen features by zooming out and by showing big blue arrows. The third segment introduces the user to an alternative option. First, the segment brings the Recorder back into full view. The narrator mentions the option of recording from a section of the screen. The hand points to the object but the narrator merely mentions the possibility for action, thus leaving the Recorder display intact.


Guideline 3: Enable Functional Interactivity

Guideline 3.1: Pace the Video Carefully

Guideline 3.2: Enable User Control

Description.  Enabling functional interactivity is a matter of built-in design features and user affordances. On the one hand it means optimizing the production of the video for its processing by the user. On the other hand it means facilitating user control (G3).

The scenario of the unfolding instructional events in the video should fit the user’s resources and capabilities (see Kennedy, 2004; Mestre, 2012; Wouters, Tabbers, & Paas, 2007). An extremely important facet in realizing such a fit is system-based pacing, which can be operationally defined as demonstrating and explaining task execution at just the right speed for the user (G3.1). In a recorded demonstration this pace often depends on the narrative. The advice is to employ a conversational tempo and not to speak instructions too quickly (Morain & Swarts, 2012). Designers occasionally also affect the pace by extending natural breaks with an additional pause of two to five seconds.
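As a rough illustration of the pacing advice, the tempo of a draft narration can be estimated from its script and running time, and natural breaks can be lengthened by a fixed amount. The Python sketch below is illustrative only and was not part of the studies reported here; the function names are hypothetical, and the sketch sets no target tempo because the sources cited give no precise figure. Only the two-to-five-second range for added pauses comes from the text.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return the average narration pace in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def extend_pauses(pause_lengths, extra_seconds=2.0):
    """Lengthen each natural break by extra_seconds.

    The allowed range (two to five seconds) mirrors the practice
    described in the text; everything else here is an assumption.
    """
    if not 2.0 <= extra_seconds <= 5.0:
        raise ValueError("extra_seconds should lie between 2 and 5")
    return [pause + extra_seconds for pause in pause_lengths]
```

A designer could compare the resulting tempo across several drafts of the same video, and then ask a native speaker to judge which draft sounds conversational.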

Another important means for achieving functional interactivity is the affordance of user control (G3.2). User control can be defined as the influence of the user on the playing of the video. The most common user-controlled actions for video are starting, pausing, stopping, and replaying. These standard media player controls enable the user to look back at segments, to pause the video, and to skip familiar segments, among other things. Recorded demonstrations generally do not have affordances for more advanced user-controlled actions such as close-ups, zooming, alternative perspectives, and control of speed that can give rise to highly differentiated and unique video usage (see Merkt, Weigand, Heier, & Schwan, 2011).

Support.  According to the Limited Capacity Model of mediated message processing, the ongoing stream of information in a video constantly challenges the user to decide which information to encode, process and store (Catrambone & Yuasa, 2006; Linek, Gerjets, & Scheiter, 2010; Palmiter, 1993). New video information must continuously be attended to, brought into working memory, and eventually stored into long-term memory. Simultaneously, the user needs to activate prior knowledge and connect this to the incoming information. Besides being dynamic and running parallel, these processes are also interactive. The incoming message influences the user’s processing, but the user’s motivation and cognition also affect how the message is perceived, encoded, stored, and eventually retrieved (G3).

An important facet for achieving functional interactivity that primarily resides within the video itself is system-based pacing (G3.1). Finding the proper pacing for the video is a difficult balancing act. A slow demonstration can be boring, which can make the user inattentive. A fast demonstration can overload the user, who may react with an automatic response, or stop viewing altogether (compare Bovair & Kieras, 1991; Linek et al., 2010).

In a general sense, the provision of any form of user control (G3.2) is an invitation for the user to become an active learner. According to constructivism, students’ initiatives and efforts in constructing meaning play an important role in their learning (Bransford, Brown, & Cocking, 2002). Students who are actively engaged in examining the subject matter learn more deeply than students who passively process information (Mayer, 2003a).

The guideline of providing user control (G3.2) speaks to the apprehension principle from Tversky, Bauer-Morrison, and Betrancourt (2002), which states that animations should be readily and accurately perceived and comprehended. The important obstacle of fleetingness of video, and the risk of lack of perception and comprehension that comes with it, can often, but not always, be overcome with user control of the playing of the video. The argument is that pausing, stopping, and replaying can reduce working memory demands. They allow for re-inspection and focusing on actions and specific screen objects or sections. They enable the user to exert voluntary or controlled allocation of processing resources. “Interactivity may be the key to overcoming the drawbacks of animation as well as enhancing its advantages” (Tversky et al., 2002, p. 258). The influence of user control on learning is also acknowledged in multimedia research. In his segmenting principle, Mayer (2005) contends that learning is advanced when the learner can break a video down into meaningful segments rather than receive it as a continuous information stream.

Research from Schwan and Riempp (2004) has found that special media player controls, such as the capability of varying the speed from slow motion to high speed and a change direction option (that is, backwards or forwards), facilitate learning. Participants in their study viewed four videos on tying nautical knots. In the control condition the videos ran continuously and participants could only replay the entire video, whereas in the experimental condition they could stop the video at arbitrary points and could use the indicated user controls. The latter condition yielded better results. Participants with user control needed less practice time to learn to tie the knots.

Ertelt (2007) examined the influence of a combination of system-based pacing and user control. That is, the study compared a situation in which a video ran continuously to one in which a rounded-off video segment was automatically stopped and the user had to press play to continue. The idea was that the built-in stop would signal an important boundary and prompt the student to reflect on the video segment just watched. In addition, asking the user to resume play was believed to guard against viewer passivity (see Salomon, 1984), which is also known as the “couch potato effect.” The prediction that the manipulation would enhance learning was confirmed: the segmented video with the stops led to a significantly higher increase in procedural knowledge than the uninterrupted version (see also Mayer & Chandler, 2001; Spanjers, Van Gog, & Van Merriënboer, 2010; Tabbers & De Koeijer, 2010).
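The auto-stop manipulation described above can be pictured as a small playback model. The Python sketch below is illustrative only. The class name `SegmentedPlayback`, the tick-based interface, and the boundary times are assumptions made for this example; they are not details of the player used in Ertelt's study.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentedPlayback:
    """Toy model of a video that stops itself at segment boundaries.

    Hypothetical helper for illustration; all times are in seconds.
    """
    duration: float
    boundaries: List[float] = field(default_factory=list)  # ends of rounded-off segments
    position: float = 0.0
    waiting_for_user: bool = False

    def tick(self, seconds: float) -> None:
        """Advance playback, halting at the next boundary if one is reached."""
        if self.waiting_for_user:
            return  # the video stays stopped until the viewer presses play
        target = min(self.position + seconds, self.duration)
        upcoming = [b for b in sorted(self.boundaries) if self.position < b <= target]
        if upcoming:
            self.position = upcoming[0]
            self.waiting_for_user = True  # the built-in stop marks the boundary
        else:
            self.position = target

    def press_play(self) -> None:
        """The viewer resumes playback; this is the required active step."""
        self.waiting_for_user = False
```

In this sketch the video never continues past a boundary by itself: a call to `press_play()` is needed after every stop, which mirrors the idea that the viewer must act to resume.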

Design Examples.  In the Getting Started series for Camtasia Studio, the pacing of the video is a bit fast. Apart from the suggestion that a conversational tempo should be kept, the literature offers no precise guidance on the right speed for the narration. In our experience, it should be neither very high nor very low, and should be evaluated by a native speaker.

The default option in TechSmith’s videos is that the interface does not display the common “user control tool” that affords interactivity (see Figure 6). However, this tool can be activated simply by resting the mouse on the progress bar (G3.2, see Figure 8). Unless there is user input, the progress bar and the tool automatically disappear after about three seconds. We prefer to let the learner choose to make the tool disappear.

Figure 9 illustrates the application of the guideline to enable functional interactivity (G3) in our procedural instructions in the video “Adjusting the right margin” in Word. The narrative supports the demonstration, telling the user what to do and what happens on the screen. Also, the user is informed about the meaning of the object that appears. Only the most essential information is conveyed to reduce the risk of overloading working memory. The narrative is spoken by a female narrator in her native language. Finding the right pace was essentially a matter of trial and error, a judgment call of what seemed neither too fast nor too slow.

By default user control is enabled (G3.2). That is, the user control tool appears when the cursor moves into the right column of the video. This tool remains visible all the time during video play, disappearing only when the user moves the cursor over to the left column with the table of contents.

Segment 2 contains a deliberate pause of about five seconds, which slightly slows down the pace of the presentation. The pause follows immediately after the narration, giving the user time to absorb the information. The user can let the situation sink in and be prepared for the following step. More generally, the pacing of the video requires special attention to moments such as these where no narrative and also no physical action(s) are taking place. The tendency is to let the recorded demonstration move forward. We decided not to do so, but to pause instead when we wanted the user to assess the situation briefly and study the interface.

Guideline 4: Preview the Task

Guideline 4.1: Promote the Goal

Guideline 4.2: Use a Conversational Style to Enhance Perceptions of Task Relevance

Guideline 4.3: Introduce New Concepts by Showing Their Use in Context

Description.  A preview of the task ahead brings across the big picture, orients the user, and should help in developing a general, condensed schema for task completion (G4). In addition, a preview can illustrate the meaning of the task or goal. Before-after displays are especially strong stimuli that can entice the user to view the video and find out about unanticipated possibilities for using the software and how to accomplish those (G4.1). They derive their strength from combining concreteness with provoking a mental conflict, which are motivational principles for increasing student attention (Keller, 2010). To further increase user interest in the tasks that are demonstrated, the narration should be personal rather than formal (G4.2). Previews should not give detailed step-by-step instructions. A preview can also be designed as a tour of the main screen components (Plaisant & Shneiderman, 2005). As it does so, it should introduce the critical vocabulary by explaining the concepts and objects when they appear during the demonstration (G4.3).

Support.  Research on experiential learning lends support to the guideline to preview the task (G4). This literature indicates that people can get so easily entangled in task engagement that they do not take the time to reflect on their experiences. As a result, the learning effects of the experience tend to be low (e.g., Fanning & Gaba, 2007; Lederman, 1992). A preview can increase learning by raising user awareness before actually beginning the task (Kriz, 2011). It can direct the user’s attention to the main goals of the experience, helping them sift the wheat from the chaff when they actually watch the demonstration of the procedure. A preview may also provide background information, and give the user some prompts and hints.

Educational research on advance organizers also supports the guideline to provide a preview. Advance organizers have been found to be effective for knowledge development (Ambard & Ambard, 2012; Gurlitt, Dummel, Schuster, & Nückles, 2012; Hartley & Davies, 1976). A preview can play the same roles as an advance organizer. It can provide “ideational scaffolding for the stable incorporation and retention of the more detailed and differentiated material that follows” (Ausubel, 1968, p. 148). In other words, a preview can serve as an overall framework for the learning that lies ahead, helping the users get acquainted with these tasks.

Support for the guideline to provide a preview (G4) can also be found in research on event cognition (Zacks & Tversky, 2003). This research indicates that procedural learning is best supported by a combination of top-down and bottom-up methods. While the preview provides users with a top-down view of the larger picture, the procedural instructions supply a bottom-up view that enables users to achieve task completion.

Guideline 4 also aligns with the pre-training principle advocated by Mayer (2005). This principle holds that users should be taught the names and behaviors of system components prior to being instructed on how these components interact (see also Swarts, 2012). The reason is a reduction of cognitive load. Taking in all the information about screen objects and their locations while also attending closely to the demonstration to learn how to do a task can simply be too much for users.

Farkas (1999) mentions the guideline to promote the goal (G4.1) as an important rhetorical aspect in the construction of procedural discourse. One way to promote the goal comes from source credibility. Software companies, like TechSmith, who instruct their own clientele, have a good head start in this respect. Another facet that contributes to engaging or persuading the user comes from targeting the instructions to the right audience. The visual presentation can also be important. Showing rather than telling what the software does may increase the user’s perceptions of task relevance. The demonstration may further contribute to promoting the goal by convincing the user that task execution does not require an inordinate effort. As Farkas indicates, promoting the goal may make the instructions more verbose than “bare statements about states and actions” (p. 44). This is one reason that this guideline is associated with the preview rather than the instruction itself.

Guideline 4.2 is reflected in Mayer’s (2005) personalization principle, which holds that instructional messages should be presented in conversational rather than formal style. This principle rests on the assumption that messages that use a first or second person voice are more appealing to the user, and thereby stimulate more active processing of the instructions. In addition, it is assumed that the familiar style of such a message requires less cognitive effort. Research indicates that this type of personalization significantly enhances learning and slightly raises interest as compared to a more formal style (Mayer, Fennell, Farmer, & Campbell, 2004; Moreno & Mayer, 2000, 2004).

The guideline to explain new concepts in context (G4.3) fits with the just-in-time principle that is advocated in educational research (e.g., Van Merriënboer, Kirschner, & Kester, 2003). According to this principle, learning is facilitated when prerequisite knowledge is presented or activated at the point when the user needs that information to perform the task. Providing just-in-time information reduces the load on the user’s working memory.

Design Examples.  Figure 10 shows the first twenty seconds from the “Add a Title Clip” instructional video for Camtasia Studio 7. The four segments illustrate a preview that conveys the concept of inserting a title clip (G4). The first segment concentrates on the software tool for inserting a clip. The prototypical blue arrows emphasize and illustrate the possibilities. The next three segments that follow in quick succession serve to promote the goal (G4.1). They illustrate a real-life example of inserting title clips.

Figure 11 shows three segments from the preview in our video on “Adjusting the margins for the whole text” in Word (G4). The first segment displays the start situation. The opening question in this segment draws the users’ attention to the design problem. The user is prompted to look around to see what is amiss. The screen shot makes the design task, the goal, concrete; the user can see that there is a formatting problem (G4.1). In line with the findings from worked examples research, the user is prompted for self-discovery of the problem, rather than being told directly up front (Atkinson, Renkl, & Merrill, 2003; Schworm & Renkl, 2007). The narrative also introduces the word margin, and immediately explains it in lay terms a sentence later (G4.3). Furthermore, there is a deliberate and frequent use of the personal pronoun “you” to emphasize the message that these goals should be important for the person watching the video (G4.2).

The second segment introduces the solution path. The narrative again mentions the word margin, and promises a solution if users manipulate the right object (that is, the double arrow). The video zooms in and highlights that object while the narrator introduces two new concepts, ruler and roof icon, that are shown on the screen (G4.3).

The third segment shows the outcome. The whole screen is displayed again while the narrator tells the user that the double arrow should be moved to produce the desired change. The narrator further invites the user to look carefully and discover that the goal of changing the right margin has been achieved (G4.1).


Guideline 5: Provide Procedural Rather Than Conceptual Information

Description.  Users consult a “how to” video because they wish to know what they need to do to complete a task. Such a video should therefore walk the user through the successful and immediate accomplishment of a task (G5). All of the information must be geared towards this goal. Conceptual information should be presented only when it contributes significantly to the user’s task understanding, does not distract too much, and does not require an inordinate amount of time.

Support.  Guideline 5 accords with a key design principle from minimalism which holds that users should be supported in their task completion, because that is their foremost reason for consulting instructions (Carroll, 1990; Van der Meij & Carroll, 1998). Plaisant and Shneiderman (2005) likewise indicate that recorded demonstrations should concentrate on conveying procedural information.

If procedural learning is the goal, and not merely successful task completion, it is not enough to demonstrate the step-by-step actions by the user and the changes on the screen. The user should also be stimulated to reflect (Van der Meij, Karreman, & Steehouder, 2009). Achieving both goals together is a challenge; the designer must find a way both to maintain the intricate user action-software reaction pattern of task execution and to interrupt that flow. The best moment for such an interruption is at points of subtask completion. That is precisely when users are likely to benefit from a short pause in which they can reflect on the just completed task. They can possibly even benefit from a preview of what follows. The research from Ertelt (2007) has shown that such built-in moments of reflection increase learning from instructional video.

Design Examples.  Figure 12 shows two consecutive segments from the instructions on the Recorder in Camtasia Studio 7. The demonstration is conceptual rather than procedural (G5). The settings that are discussed are not core tasks that the user needs to learn to perform in the tutorial. Rather, the discussion gives complete coverage of the tool options. Such a discussion would be suitable for a reference guide. For a tutorial, it is not. The information is not immediately useful, and may perhaps never be so for the user. The second segment further shows that the narrated text is also displayed on the screen. According to multimedia theory (Mayer, 2001, 2005c) this is an unwanted duplication that can cause overload (G2.2).

Figure 9 illustrates the application of guideline 5 in our procedural instructions on “Adjusting the right margin” in Word. The narrative presents only the most essential information needed for task completion. Actions and objects are described but not explained. For instance, there is no discussion about the nature of the ruler.

Actions presented as commands are the preferred choice for this type of information (Farkas, 1999; Van der Meij et al., 2009). In contrast to the preview, there is a dearth of personal pronouns (for example, “you”). This is done to make the instructions as short and crisp as possible.


Guideline 6: Make Tasks Clear and Simple

Guideline 6.1: Follow the User’s Mental Plan in Describing an Action Sequence 

Guideline 6.2: Draw Attention to the Interconnection of User Actions and System Reactions

Guideline 6.3: Use Highlighting to Signal Screen Objects or Locations

Description.  The main idea behind this guideline is that the user should be instructed with simple, prototypical explanations on how to achieve a task (G6). Clarity and simplicity partly derive from demonstrating a meaningful, realistic task, and leaving out all non-essential information. The sequencing of the actions and corresponding narrative should follow the sequence in which the user physically and mentally engages in task execution (G6.1).

The instructions are best presented as prototypical streamlined steps. That is, they should inform the user about a goal or purpose, and tell the user about the actions and states that lead to goal achievement. The imperative voice is best suited for describing the user’s actions (Farkas, 1999; Van der Meij, Blijleven, & Jansen, 2003; Van der Meij & Gellevij, 2004). The actions of the user obviously affect the reaction from the software. There is an intricate relationship between the two; user action and system reaction are therefore best seen in tandem (G6.2).

Occasionally, the situation requires special user attention to a screen element or location. Signaling of the mouse cursor, adding circles around screen objects and spotlighting features are among the many techniques that can be employed to grab the users’ attention (G6.3). These signals should be clearly perceived as imposed. The user should not confuse them with the real objects belonging to the interface.

Support.  Guideline 6 resonates with the apprehension principle from Tversky et al. (2002), which states that animations should be readily and accurately perceived and comprehended. It is essential for the video to be optimally designed for task demonstration if it is to succeed in this respect. The content of the video should come from a task example that is easy to understand, yet realistic enough to yield transfer. Generally, this means that it is stripped of any adornments. Moreover, the demonstration should present the most basic or insightful method (compare Van der Meij & Carroll, 1998).

Bethke et al. (1981) refer to the guideline to make tasks clear and simple (G6) in their second step of designing for usability. They suggest that designers attend to factors of simplicity, concreteness, and naturalness to make information easy to understand. Simplicity can be realized by using a vocabulary that suits the audience and by keeping the instructions for task accomplishment within the limits of the user’s cognitive capacities. The latter tends to be translated into the suggestion to break down sizeable tasks into manageable but still meaningful subtasks that require no more than three to five actions to complete (Van der Meij & Gellevij, 2004). Concreteness can be achieved by presenting appropriate examples, pictures, and descriptions and by making these specific rather than general or abstract. Naturalness means that the sequence of the information in the instructions should match the most suitable order of steps for task completion by the user. That trajectory should also include checkpoints for the user to monitor progress.
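The suggestion to keep subtasks within three to five actions lends itself to a simple check on a draft script. The following Python sketch is a hypothetical aid, not a tool from the cited research; the function name `oversized_subtasks` and the dictionary layout (subtask title mapped to its list of actions) are assumptions for illustration.

```python
from typing import Dict, List

MAX_ACTIONS = 5  # upper bound of the "three to five actions" suggestion


def oversized_subtasks(task: Dict[str, List[str]], limit: int = MAX_ACTIONS) -> List[str]:
    """Return the titles of subtasks that list more actions than the limit."""
    return [title for title, actions in task.items() if len(actions) > limit]


# Placeholder script: the titles and action labels are invented for the example.
draft = {
    "Subtask A": ["action 1", "action 2", "action 3", "action 4"],
    "Subtask B": ["action 1", "action 2", "action 3", "action 4",
                  "action 5", "action 6", "action 7"],
}
flagged = oversized_subtasks(draft)  # only "Subtask B" exceeds five actions
```

Such a count can flag candidates for splitting, but it cannot judge whether the resulting subtasks are still meaningful; that remains a design decision.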

Guideline 6 also accords with Mayer’s (2001) coherence principle, which holds that multimedia presentations can cause cognitive overload when they contain too much non-essential or extraneous information. To achieve coherence, the designer is advised to weed out all information that is not immediately meaningful for the user’s task. Slashing the verbiage is also a fundamental design principle in minimalism (Carroll, 1990). Likewise, Plaisant and Shneiderman (2005) suggest cutting all unnecessary words as a special design tip in the construction of recorded demonstrations.

The guideline to follow the user’s mental plan in describing an action sequence (G6.1) originates with Dixon’s foundational research (1982). According to Dixon, people who must carry out (written) instructions do so by constructing a mental plan that consists of a hierarchy of action schemas (see also Zacks & Tversky, 2003). One of the interesting implications of this research is that when the user’s actions vary under certain conditions, the instructions should begin by stating the conditions. This view is also evident in the advice from Farkas (1999) on extensions of the basic action step. According to that advice, facilitating modifiers and conditional steps should be given before the basic action step. Thus, it is better to say “On the File menu, click New” than the other way around.

According to the streamlined-step model (Farkas, 1999) the basic action step preferably begins with an imperative verb followed by an object (for example, Click Home). Farkas’ assertion that the fundamental unit for the user’s behavior is a coupling of an action with an object is also supported by theories on event cognition (Zacks & Tversky, 2003). But this is only a one-sided view. In procedural instructions for software use, the software reaction is also critically important. User action and system reaction depend upon each other. The Four Components Model (Van der Meij et al., 2003; Van der Meij & Gellevij, 2004) emphasizes this dependency by considering both together as a key component in designing instructions (G6.2). The reaction part in the component provides feedback. Immediate feedback has been found superior to delayed feedback in procedural skills learning (Shute, 2008).
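The ordering advice above (modifier or condition first, then an imperative verb with its object, then the system reaction) can be made concrete with a small data structure. The Python sketch below is an assumed representation for illustration only; the class `Step`, its field names, and the sample reaction text are hypothetical and do not come from Farkas (1999) or the Four Components Model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    verb: str                       # imperative verb, for example "click"
    obj: str                        # interface object that is acted upon
    context: Optional[str] = None   # facilitating modifier or condition, stated first
    reaction: Optional[str] = None  # what the software shows in response

    def render(self) -> str:
        """Write the step with the context before the action and the reaction last."""
        if self.context:
            action = f"{self.context}, {self.verb.lower()} {self.obj}."
        else:
            action = f"{self.verb.capitalize()} {self.obj}."
        return f"{action} {self.reaction}" if self.reaction else action


basic = Step(verb="click", obj="Home").render()
with_context = Step(verb="click", obj="New", context="On the File menu").render()
# The reaction sentence below is invented for the example.
paired = Step(verb="click", obj="Home", reaction="The Home tab opens.").render()
```

Keeping the reaction in the same record as the action reflects the view that the two are best designed together (G6.2).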

Even an animation that is optimally designed to convey what it depicts may fail due to the users’ limited processing capacities. Users may find it difficult to see properly what an animation shows, and they may also fail to understand its meaning. Highlighting can help. It is a widely used technique for drawing user attention (G6.3). By foregrounding vital areas or objects, the “noise” of the video is reduced.

In the multimedia literature the functionality of highlighting is known as the signaling principle (Mayer, 2001). According to this principle, learning is enhanced when there are cues about the organization of information. The placement of the signaling devices further exemplifies a special instantiation of Mayer’s (2001, 2005b) spatial contiguity principle, which holds that information that belongs together (for example, words and pictures) should also be presented in close proximity. Thus, the signals should be positioned in the vicinity of the object that they are meant to highlight. Empirical research on animations suggests that selection cues significantly affect user behavior and learning (e.g., Amadieu, Mariné, & Laimay, 2011; De Koning, Tabbers, Rikers, & Paas, 2010). Morain and Swarts (2012) likewise reported that good tutorial videos structurally employed highlighting techniques to draw the viewer’s attention to what was relevant, whereas average or poor videos did so only incidentally or not at all.

Design Examples.  Figure 13 shows a sequence of five segments from the video “Editing Dimensions and Save Project” which discuss the customization of the video size. There is too much information about alternatives, possibly because the Camtasia Studio 7 “Getting Started” series is designed as a reference guide rather than a tutorial. This makes the task more complex than it should be for first-time users (G6).

Preceding the displayed segments in Figure 13, the narrator has talked to his audience about setting the dimensions for producing and sharing one’s video on a blog, the Web, an iPhone, or an iPod Touch, among other options for sharing. In the end, however, the narrator indicates that he will demonstrate the default option. Segment 1 appears immediately thereafter. The user is, once again, instructed about an alternative, namely custom settings. The numbers for width and height are selected on the screen, but no actions are taken because they are already the correct numbers. Later, in segment 5 the narrator speaks of the “New editing dimensions.” This is odd, because the displayed numbers have stayed as 640 and 360 right from the start, when the Editing Dimension Box was displayed. This sequence of events is at variance with the user’s mental plan (G6.1).

There is a good alternation between user action and system reaction. The narrative informs the user about what can be done. The movements of the cursor reveal that the action is performed, after which the demonstration displays the effect on the interface (G6.2).

The pace is brisk here and in the other videos in the “Getting Started” series for Camtasia. We fear that for the novice it may be somewhat too fast, yielding an adverse effect on perception and understanding. One possibility to keep the novice user on board would be to insert a deliberate pause (for example, in segments 2 and 5). Even a pause as short as 2 seconds might suffice for the user to reflect on the demonstration that has just gone by (Spanjers, Van Gog, & Van Merriënboer, 2012; Spanjers, Wouters, Van Gog, & Van Merriënboer, 2010). After having digested that information, the user would be more ready to attend to the new video instructions.

Segment 5 of Figure 13 shows an application of guideline 6.3. Again, there is a fine sequence of images. First, the user gets to see the preview window. Thereafter, the signals draw the user’s attention to the effect of the earlier choice of setting. There is also consistency in how the signals are presented. They always reside clearly on top of the interface, and they are always of the same color (G6.3).

Figure 14 illustrates how we applied Guideline 6 to make tasks clear and simple in our instructions on Word’s formatting options. The figure displays three consecutive segments from the video “Improving a list”. The demonstration uses a simple, prototypical example. That is, first, each sentence from the list begins with the target words. Second, the target words are presented in bold to make them stand out, so that the user can easily recognize the set of items that form the list. Third, the target words vary slightly in length. The length of the longest target word is an important feature for aligning the descriptions. By varying the length of the target words only slightly, the user can easily see what such an alignment requires. Fourth, the descriptions are relatively short (that is, two to three sentences). For one thing, this makes it easier to keep the whole list in view.

The narrative gives a precise account of the actions that the user must execute (G6). Because the user is assumed to know the basics of Word, zooming is applied without explanation; there is no fear that it might disorient the user. Likewise, it is assumed that the user has basic general computer skills, and therefore there is no information on how to move the cursor, or which mouse button to press. In contrast, the user is informed about the time lapse of a few seconds before the pop-up explanation for the icon appears, because it is not self-evident that the user must temporarily do nothing. Wait time information is always important for users. In short, the sequence is designed to fit the user’s mental plan (G6.1).

In addition, the close connection between user input and system reaction is made in the narrative and through the coupling of narrative and screenshot (G6.2). As before (see Figure 9, segment 2), we have included a deliberate pause. Segment 2 consists of a five-second interval where the effect of selecting the list becomes visible. This segment gives the user a little bit of extra time to process that information, and possibly to think about what may come next. Segment 3 includes the joint application of zooming in and highlighting (G6.3). The red circle draws the user’s attention; the zoom-in facilitates perception of the (small) icon that the user needs to select.

The guideline to use highlighting to signal screen objects or locations (G6.3) can readily be illustrated with examples from both sets of tutorials. Three signaling techniques are employed in the “Getting Started” series from TechSmith. The use of big blue arrows was already shown in Figure 7. Figure 15 illustrates the other techniques. To signal which sets of tools belong together a thick blue line appears around their perimeter. For dragging, another technique is used. Here a dotted line suggests movement.

In our videos on Word’s formatting options, we used either a red circle or a red arrow for highlighting screen objects and their location (G6.3). The color red and the size and shape of these signals made them stand out sufficiently from the interface. Figure 16 shows both types, illustrating that the signals were just too big to be mistaken for part of the interface.

Guideline 7: Keep Videos Short

Description.  Plaisant and Shneiderman (2005) recommend keeping videos as short as possible. They suggest that a length of between 15 and 60 seconds is optimal for keeping the user engaged and minimizing what needs to be remembered together. Other researchers propose a slightly longer duration. Chan et al. (2010) mention a 3-minute average as “the usual length of a video clip on medical consultation in problem-based learning” (p. 764).

Perhaps the most difficult design issue is to create meaningful videos for tasks that are too long to display in one demonstration (see Spanjers, Van Gog, et al., 2010; Zacks, Speer, Swallow, Braver, & Reynolds, 2007). The designer can use an arbitrary time limit for breaking up a complex task, but this is hardly satisfactory. What matters more is that the user perceives a video as having a clear beginning and end. This generally means that the designer must look for structural changes such as goal or sub-goal completion.

Physical changes on the screen can be meaningful moments for creating segments within a video. However, we prefer to use the deliberate pause for marking these event boundaries (see Figure 9, segment 2, and Figure 14, segment 2).

Support.  The transitory nature of videos can make it hard for the user to perceive them accurately and comprehend their content. Researchers have investigated the possibility of manipulations of temporal characteristics of video. One such temporal factor is segmentation, which can be defined as dividing the stream of information into smaller units with identifiable beginning and end points (Spanjers, Van Gog, et al., 2010). Empirical research shows that segmentation increases learning (e.g., Khacharem, Spanjers, Zoudji, Kalyuga, & Ripoll, 2013; Spanjers, Wouters, et al., 2010; Zacks et al., 2007). The positive effect of segmentation on learning is ascribed to two distinct phenomena: pausing and temporal cueing.

Pausing is done to reduce cognitive overload that may arise from video’s transitory nature. Pausing involves stopping the video at key moments to give the viewer extra time to take in the information that has been presented. One variant of this stop option is user controlled. In that case, the video includes full-stop moments that depend on a user action for video replay or continuation. Another variant involves temporary-pause moments that resume play after a brief automated pause. Empirical research indicates that even short pauses of 2 seconds may suffice to benefit the user (Spanjers, Wouters, et al., 2010).
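The temporary-pause variant can be sketched as a timeline in which a short automated pause is placed between consecutive segments. The Python example below is a minimal illustration under stated assumptions: the function name `with_pauses` and the segment lengths are invented for the example, and the 2-second default simply follows the short pause length mentioned above.

```python
from typing import List, Tuple

Event = Tuple[str, float]  # ("play" or "pause", duration in seconds)


def with_pauses(segment_lengths: List[float], pause: float = 2.0) -> Tuple[List[Event], float]:
    """Interleave an automated pause between segments; return the timeline and total time."""
    timeline: List[Event] = []
    last = len(segment_lengths) - 1
    for index, length in enumerate(segment_lengths):
        timeline.append(("play", length))
        if index < last:
            timeline.append(("pause", pause))  # no pause after the final segment
    total = sum(duration for _, duration in timeline)
    return timeline, total


# Hypothetical video with three segments of 20, 15, and 25 seconds.
timeline, total = with_pauses([20.0, 15.0, 25.0])
```

The sketch makes the cost of pausing visible: each boundary adds the pause length to the regular playing time, so the designer can weigh the extra seconds against the intended benefit for the viewer.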

Temporal cueing is done to create meaningful boundaries for segments. According to event theory, people perceive and conceive dynamic representations as sets of discrete events (Zacks et al., 2007; Zacks & Tversky, 2003). They naturally break down the continuity of the stream of information of such representations into meaningful moments. The designer can aid the user in making these demarcations. That is, segmentation can decompose a continuous display of images into a limited set of main events that convey the underlying structure or schema. This is probably more effective than relying on the user’s own efforts at constructing such meaningful segments (Spanjers et al., 2012).

In multimedia learning theory, the phenomenon of presenting the user with meaningful and manageable units of information is known as the segmenting principle (Mayer, 2005b). According to this principle, designers should first divide a tutorial into separate videos, one per sub-task, and, if these are still relatively long, break them down further into bite-sized segments.

Design Examples.  TechSmith recently changed the presentation of the titles of all of its instructional videos. The new format for the “Getting Started” series for Camtasia Studio 7 was shown earlier in Figure 3. We prefer the original version because the titles are easier to scan, and users can see the duration of the videos. Figure 17 shows that the length of the videos ranges from 2 to 4 minutes, with an average of about 3 minutes.

The original table of contents for our videos on Word’s formatting options does not show the segment length (see Figure 4). Figure 18 gives this information for the regular playing time of each video. It shows that the duration of the average preview and procedural video was just over one minute. In addition, the durations fell within a limited range.

The titles signal that each video revolves around a rounded-off task. There is a clear formatting goal, and each video displays the whole process of achieving this goal from start to finish. However, when a formatting task threatened to become cumbersome, it was divided into meaningful sub-tasks. For example, we separated the task of formatting the margins of a whole document into two separate videos, one on adjusting the right margin and one on adjusting the left margin. Each sub-task made sense independent of the other; their sequencing was chosen so that the simpler task (right margin setting) preceded the more complex one (left margin setting). The task split led to a significant reduction in video length and task complexity. Constructing an automatic table of contents was likewise decomposed into a set of manageable sub-tasks.
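
Where a long task needs to be split can be roughed out with a simple duration check. The Python sketch below is illustrative only: the function name `segment_steps`, the step names, the step durations, and the 70-second cap are all assumptions chosen for the example. It groups consecutive steps into segments that stay under a maximum length. Such a check only flags that a split is needed; choosing boundaries that yield meaningful sub-tasks remains the designer's decision.

```python
from typing import List, Tuple


def segment_steps(steps: List[Tuple[str, int]], max_seconds: int) -> List[List[str]]:
    """Group consecutive (name, seconds) steps into segments of at most max_seconds.

    A single step longer than the cap is placed in a segment of its own.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for name, seconds in steps:
        # Close the running segment when adding this step would exceed the cap.
        if current and current_len + seconds > max_seconds:
            segments.append(current)
            current, current_len = [], 0
        current.append(name)
        current_len += seconds
    if current:
        segments.append(current)
    return segments


# Hypothetical steps and durations (in seconds) for a margin-formatting task.
steps = [
    ("select text", 20),
    ("open ruler", 15),
    ("drag right margin", 30),
    ("drag left margin", 35),
    ("check result", 10),
]
for segment in segment_steps(steps, max_seconds=70):
    print(segment)
# ['select text', 'open ruler', 'drag right margin']
# ['drag left margin', 'check result']
```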

Guideline 8: Strengthen Demonstration with Practice

Description.  A classic design approach in education that is recommended for software training as well is the coupling of instruction and practice. During instruction, the problem and the solution processes are explained. During practice, users actively solve problems on their own. Practice serves to consolidate and enhance learning. In addition, it is a self-test for users to see whether they can apply what has been taught. To support practice, users should be given exercises that clearly set the starting condition and end goal for the user. Empirical research indicates that exercises are more effective learning aids than on-your-own sections (Glasbeek, 2004; Wiedenbeck, Zavala, & Nawyn, 2000). Several repetitions of practice are called for when the goal is to compile and automate procedures.

Support.  The value of coupling a recorded demonstration with practice (G8) was demonstrated in an experiment by Ertelt (2007; see also Rieber, 1991). The study found that the opportunity for practice after video instructions significantly improved user performance after training compared to the non-practice control condition. In Ertelt’s study, access to the video instructions was blocked during practice for reasons of experimental control. For video instructions that are publicly available, this is an unrealistic restriction. The study by Shippey et al. (2011) on skills acquisition of medical students shows that users benefit considerably from being able to access the video during practice. The study showed that open access yielded a significant advantage for skills retention in comparison to a situation in which video access was blocked.

The important role of an after-training activity in experiential learning is well known. Research indicates that learning can be increased significantly and substantially when users reflect on their experiences (Fanning & Gaba, 2007). A prevalent and effective type of stimulus used after task engagement is debriefing, which can be defined as facilitated or guided reflection. A recent meta-analysis reported an average gain of 20% to 25% with debriefing (Tannenbaum & Cerasoli, 2013).

Design Examples.  The right column in Figure 19 shows the last slide from our video on “Styling the main headings.” It contains an invitation for practice (G8). During practice, users do not receive any new instructions, but they can look back to the video if necessary (in experiments this option is sometimes blocked). They are invited to try out the instructed skills, informed about the goal they should try to achieve, and told what practice file can be used in that effort. To enhance skill consolidation, the texts in these practice files were structurally identical to the showcased demonstration files.

Having users work with practice files in the exercises after instruction has several advantages. One, users do not need to create a document or other object from scratch. They can open a practice file and immediately start working on the problem it contains. Two, practice files can be optimized for task execution by making them short, simple, and exemplary. Three, practice files can be carefully prepared to address known problems.

Discussion



In assessing these guidelines for the construction of instructional video, the reader should keep in mind that we have focused on tutorials. The videos should teach the basics of using a software package. After processing the videos, users should be capable of completing fundamental software tasks without the need for (repeated) help.

Informing the user about the other possibilities of the software with instructional video (referential videos) should likewise concentrate on supporting user actions. But there are at least two important differences from videos serving a tutorial function. One is that a referential video functions merely as a job aid. The user needs only to be instructed about how to achieve a task. No training files or stimuli to learn are needed. Another difference is that a referential video may need to give conceptual information because it is impossible to provide detailed, step-by-step information about all possible scenarios of use. A design solution that is often chosen for this problem in paper reference guides is to provide annotated displays of all the tools and menus of a program. Such displays resemble glossaries; they are mainly conceptual in nature. Their aim is to equip users with knowledge about what the program has to offer them.

In this respect, it is interesting to see the analogue that we found in TechSmith’s video of the Recorder tool (see Figures 6 and 7). We criticized this video segment for informing the user about all the affordances of this tool. TechSmith intended this video segment to move the user out of his or her comfort zone. It was an attempt to find a “perfect balance between getting good instruction and enough information to create users that not only can do the task, but have context and understanding what else they can do and when to do them” (M. Pierce from TechSmith, personal communication, February 26, 2014). The good thing about this effort is that it attempts to counteract the prevalent problem of software underuse. Another good thing is that this video includes the basic actions for making a video record of what happens on the screen. Yet another noteworthy point is that the elaborate discussion of the Recorder is apt for a referential video on Camtasia Studio because it deals with a pivotal tool. However, we differ in opinion on the value of such a hybrid referential-tutorial video within the context of a Getting Started tutorial.

As mentioned in the introduction, we have empirically tested our set of instructional videos for Word’s formatting options, using these guidelines as their basis. Three consecutive experiments have yielded substantial support for the effectiveness of these instructions versus a paper-based tutorial (Van der Meij & Van der Meij, in preparation, in review). The experiments included different Word versions, audiences, and languages. That is, students from the Netherlands received instructions in Dutch, whereas students from Indonesia received instructions in Bahasa. The outcomes of these studies clearly favored the video tutorial over a paper-based version. On both measures of motivation and indices of procedural skills development, we found the video instructions to be more effective.

Among other things, the students reported having experienced a stronger flow while working with video. Flow is a pleasant state. It is a signal of the users’ concentration and task absorption (Vollmeyer & Rheinberg, 2006). When a user experiences flow, there is an optimal balance between his or her skills and the challenges posed by the task. In addition, self-efficacy beliefs increased more with video, which indicates that students developed more confidence in their capacity to solve similar tasks. Likewise, skills development during and after training was supported more strongly with video. The students more successfully completed tasks during training. In one study (Van der Meij & Van der Meij, in review) we found that students achieved a 90% success rate during training with the video instructions as opposed to a 63% success rate for the paper-based tutorial. Similar differences between video and paper-based tutorials were found on a post-test and a retention test, signaling that the video instructions led to more learning and retention than did paper-based instructions.

By connecting the eight guidelines to principles, theories and insights from various authors and fields of study, and by also providing design examples, we have tried to identify key issues related to the intriguing nature of instructional videos for software training. The resulting patterns are potentially beneficial for researchers and practitioners.

Acknowledgments



We would like to thank David Farkas for his comments on an early draft of this paper. We also would like to thank the reviewers for their insightful and constructive comments on an earlier version of this paper.

References



Alexander, K. P. (2013). The usability of print and online video instructions. Technical Communication Quarterly, 22, 237-259.

Amadieu, F., Mariné, C., & Laimay, C. (2011). The attention-guiding effect and cognitive load in the comprehension of animations. Computers in Human Behavior, 27, 36-40.

Ambard, P. D., & Ambard, L. K. (2012). Effects of narrative script advance organizer strategies used to introduce video in the foreign language classroom. Foreign Language Annals, 45(2), 203-228.

Atkinson, R. K., Renkl, A., & Merrill, M. M. (2003). Transitioning from studying examples to solving problems: Effects of self-explanation prompts and fading worked-out steps. Journal of Educational Psychology, 95, 774-783.

Ausubel, D. P. (1968). Educational psychology: A cognitive view. New York, NY: Holt, Rinehart & Winston.

Bannert, M. (2000). The effects of training-wheels and self-learning materials in software training. Journal of Computer Assisted Learning, 16, 336-346.

Baylor, A. L. (2011). The design of motivational agents and avatars. Educational Technology Research & Development, 59, 291-300.

Bethke, F. J., Dean, W. M., Kaiser, P. H., Ort, E., & Pessin, F. H. (1981). Improving the usability of programming publications. IBM Systems Journal, 20, 306-320.

Bovair, S., & Kieras, D. (1991). Toward a model of acquiring procedures from text. In R. Barr, M. L. Kamil, P. Mosenthal & P. D. Pearson (Eds.), Handbook of reading research (pp. 206-229). New York, NY: Longman.

Bransford, J. D., Brown, A. L., & Cocking, R. R. (2002). How people learn: Brain, mind, experience, and school. Washington, DC: National Academy Press.

Camtasia Studio 7 tutorials. (2013). Retrieved from http://www.techsmith.com/tutorial-camtasia-7.html

Carroll, J. M. (1990). The Nurnberg funnel. Designing minimalist instruction for practical computer skill. Cambridge, MA: MIT Press.

Carroll, J. M., & Carrithers, C. (1984). Training wheels in a user interface. Communications of the ACM, 27, 800-806.

Carroll, J. M., & Farooq, U. (2007). Patterns as a paradigm for theory in community-based learning. Computer-Supported Collaborative Learning, 2, 41-59.

Catrambone, R., & Yuasa, M. (2006). Acquisition of procedures: The effects of example elaborations and active learning exercises. Learning and Instruction, 16, 139-153.

Chan, L. P., Patil, N. G., Chen, J. Y., Lam, J. M., Lau, C. S., & Ip, M. S. M. (2010). Advantages of video trigger in problem-based learning. Medical Teacher, 32, 760-765.

de Koning, B. B., Tabbers, H. K., Rikers, R. M. J. P., & Paas, F. (2010). Learning by generating vs. receiving instructional explanations: Two approaches to enhance attention cueing in animations. Computers & Education, 55, 681-691.

Ertelt, A. (2007). On-screen videos as an effective learning tool. The effect of instructional design variants and practice on learning achievements, retention, transfer, and motivation. (Doctoral dissertation). Albert-Ludwigs-Universität Freiburg, Germany.

Fanning, R. M., & Gaba, D. M. (2007). The role of debriefing in simulation-based learning. Simulation in Healthcare, 2(2), 115-125.

Farkas, D. K. (1999). The logical and rhetorical construction of procedural discourse. Technical Communication, 46, 42-54.

Gellevij, M. R. M., & Van der Meij, H. (2004). Empirical proof for presenting screen captures in software documentation. Technical Communication, 42(2), 77-91.

Glasbeek, H. (2004). Solving problems on your own: How do exercises in tutorials interact with software learners’ level of goal-orientedness? IEEE Transactions on Professional Communication, 47(1), 44-53.

Gurlitt, J., Dummel, S., Schuster, S., & Nückles, M. (2012). Differently structured advance organizers lead to different initial schemata and learning outcomes. Instructional Science, 40, 351-369.

Hartley, J., & Davies, I. K. (1976). Preinstructional strategies: The role of pretests, behavioral objectives, overviews and advance organizers. Review of Educational Research, 46, 239-265.

Höffler, T. N., & Leutner, D. (2007). Instructional animation versus static pictures: A meta-analysis. Learning and Instruction, 17, 722-738.

Keller, J. M. (2010). Motivational design for learning and performance. The ARCS-Model approach. New York, NY: Springer.

Kennedy, G. E. (2004). Promoting cognition in multimedia interactivity research. Journal of Interactive Learning Research, 15, 43-61.

Khacharem, A., Spanjers, I. A. E., Zoudji, B., Kalyuga, S., & Ripoll, H. (2013). Using segmentation to support learning from animated soccer scenes: An effect of prior knowledge. Psychology of Sport and Exercise, 14, 154-160.

Kriz, W. C. (2011). A systemic-constructivist approach to the facilitation and debriefing of simulations and games. Simulation & Gaming, 41(5), 663-680.

Lederman, L. C. (1992). Debriefing: Toward a systematic assessment of theory and practice. Simulation & Gaming, 23(2), 145-160.

Leutner, D. (2000). Double-fading support: A training approach to complex software systems. Journal of Computer Assisted Learning, 16, 347-357.

Linek, S. B., Gerjets, P., & Scheiter, K. (2010). The speaker/gender effect: Does the speaker’s gender matter when presenting auditory text in multimedia messages? Instructional Science, 38, 503-521.

Lloyd, S. A., & Robertson, C. L. (2012). Screencast tutorials enhance student learning of statistics. Teaching of Psychology, 39(1), 67-71.

Mayer, R. E. (2001). Multimedia Learning. Cambridge, NY: Cambridge University Press.

Mayer, R. E. (2003a). Learning and instruction. Upper Saddle River, NJ: Prentice Hall.

Mayer, R. E. (2003b). The promise of multimedia learning: Using the same instructional design methods across different media. Learning and Instruction, 13, 125-139.

Mayer, R. E. (2005a). The Cambridge handbook of multimedia learning. Cambridge, NY: Cambridge University Press.

Mayer, R. E. (2005b). Principles for managing essential processing in multimedia learning: Segmenting, pretraining, and modality principles. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 169-182). Cambridge, NY: Cambridge University Press.

Mayer, R. E. (2005c). Principles for reducing extraneous processing in multimedia learning: Coherence, signaling, redundancy, spatial contiguity and temporal contiguity principles. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 183-200). Cambridge, NY: Cambridge University Press.

Mayer, R. E. (2005d). Principles of multimedia learning on social cues: Personalization, voice and image principles. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 201-212). Cambridge, NY: Cambridge University Press.

Mayer, R. E., & Chandler, P. (2001). When learning is just a click away: Does simple user interaction foster deeper understanding of multimedia messages? Journal of Educational Psychology, 93, 390-397.

Mayer, R. E., Fennell, S., Farmer, L., & Campbell, J. (2004). A personalization effect in multimedia learning: Students learn better when words are in conversational style rather than formal style. Journal of Educational Psychology, 96, 389-395.

Merkt, M., Weigand, S., Heier, A., & Schwan, S. (2011). Learning with videos vs. learning with print: The role of interactive features. Learning and Instruction, 21, 687-704.

Mestre, L. S. (2012). Student preference for tutorial design: A usability study. Reference Services Review, 40(2), 258-276.

Morain, M., & Swarts, J. (2012). YouTutorial: A framework for assessing instructional online video. Technical Communication Quarterly, 21, 6-24. doi: 10.1080/10572252.2012.626690

Moreno, R., & Mayer, R. E. (1999). Cognitive principles of multimedia learning: The role of modality and contiguity. Journal of Educational Psychology, 91(2), 358-368.

Moreno, R., & Mayer, R. E. (2000). Engaging students in active learning: The case for personalized multimedia messages. Journal of Educational Psychology, 92(4), 724-733.

Moreno, R., & Mayer, R. E. (2004). Personalized messages that promote science learning in virtual environments. Journal of Educational Psychology, 96(1), 165-173.

Palmiter, S. (1993). The effectiveness of animated demonstrations for computer-based tasks: A summary, model and future research. Journal of Visual Language and Computing, 4, 71-89.

Plaisant, C., & Shneiderman, B. (2005). Show me! Guidelines for recorded demonstration. Paper presented at the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC’05), Dallas, Texas. Retrieved from http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2005-02

Rieber, L. P. (1991). Animation, incidental learning, and continuing motivation. Journal of Educational Psychology, 83(3), 318-328.

Salomon, G. (1984). Television is “easy” and print is “tough”: The differential investment of mental effort in learning as a function of perceptions and attributions. Journal of Educational Psychology, 76(4), 647-658.

Schwan, S., & Riempp, R. (2004). The cognitive benefit of interactive videos: Learning to tie nautical knots. Learning and Instruction, 14, 293-305.

Schworm, S., & Renkl, A. (2007). Learning argumentation skills through the use of prompts for self-explaining examples. Journal of Educational Psychology, 99, 285-296.

Shippey, S. H., Chen, T. L., Chou, B., Knoepp, L. R., Bowen, C. W., & Handa, V. L. (2011). Teaching subcuticular suturing to medical students: Video versus expert instructor feedback. Journal of Surgical Education, 68(5), 397-402.

Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153-189.

Spanjers, I. A. E., Van Gog, T., & Van Merriënboer, J. J. G. (2010). A theoretical analysis of how segmentation of dynamic visualizations optimizes students’ learning. Educational Psychology Review, 22, 411-423.

Spanjers, I. A. E., Van Gog, T., & Van Merriënboer, J. J. G. (2012). Segmentation of worked examples: Effects on cognitive load and learning. Applied Cognitive Psychology, 26, 352-358.

Spanjers, I. A. E., Wouters, P., Van Gog, T., & Van Merriënboer, J. J. G. (2010). An expertise reversal effect of segmentation in learning from animated worked-out examples. Computers in Human Behavior, 27, 46-52.

Swarts, J. (2012). New modes of help: Best practices for instructional video. Technical Communication, 59(3), 195-206.

Tabbers, H. K., & De Koeijer, B. (2010). Learner control in animated multimedia instructions. Instructional Science, 38, 441-453.

Tannenbaum, S. I., & Cerasoli, C. P. (2013). Do team and individual-debriefs enhance performance? A meta-analysis. Human Factors, 55(1), 231-245.

Tversky, B., Bauer-Morrison, J., & Betrancourt, M. (2002). Animation: Can it facilitate? International Journal of Human-Computer Studies, 57, 247-262.

Van der Meij, H. (2000). The role and design of screen images in software documentation. Journal of Computer Assisted Learning, 16, 294-306.

Van der Meij, H., Blijleven, P., & Jansen, L. (2003). What makes up a procedure? In M. J. Albers & B. Mazur (Eds.), Content & Complexity. Information design in technical communication (pp. 129-186). Mahwah, NJ: Erlbaum.

Van der Meij, H., & Carroll, J. M. (1998). Principles and heuristics for designing minimalist instruction. In J. M. Carroll (Ed.), Minimalism beyond the Nurnberg funnel. Cambridge, MA: MIT Press.

Van der Meij, H., & Gellevij, M. R. M. (1998). Screen captures in software documentation. Technical Communication, 45(4), 529-543.

Van der Meij, H., & Gellevij, M. R. M. (2004). The four components of procedures. IEEE Transactions on Professional Communication, 47(1), 5-14.

Van der Meij, H., Karreman, J., & Steehouder, M. (2009). Three decades of research and professional practice on software tutorials for novices. Technical Communication, 56(3), 265-292.

Van der Meij, H., & Van der Meij, J. (in preparation). Paper-based and video tutorials for software learning compared for their effects on student motivation, learning and retention.

Van der Meij, H., & Van der Meij, J. (in review). A comparison of paper-based and video tutorials for software learning.

Van Merriënboer, J. J. G., Kirschner, P. A., & Kester, L. (2003). Taking the load off a learner’s mind: Instructional design for complex learning. Educational Psychologist, 38(1), 5-13.

Vollmeyer, R., & Rheinberg, F. (2006). Motivational effects on self-regulated learning with different tasks. Educational Psychology Review, 18, 239-253.

Wiedenbeck, S., Zavala, J. A., & Nawyn, J. (2000). An activity-based analysis of hands-on practice methods. Journal of Computer Assisted Learning, 16, 358-365.

Wouters, P., Tabbers, H. K., & Paas, F. (2007). Interactivity in video-based models. Educational Psychology Review, 19, 327-342.

Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., & Reynolds, J. R. (2007). Event perception: A mind-brain perspective. Psychological Bulletin, 133(2), 273-293.

Zacks, J. M., & Tversky, B. (2003). Structuring information interfaces for procedural learning. Journal of Experimental Psychology: Applied, 9(2), 88-100.


About the Authors

Hans van der Meij is senior researcher and lecturer in instructional technology at the University of Twente (The Netherlands). His research interests are questioning, technical documentation (for example, instructional design, minimalism, the development of self-study materials), and the functional integration of ICT in education. He received several awards for his articles, including a “Landmark Paper” award by IEEE for a publication on minimalism (with John Carroll). Hans van der Meij is the corresponding author. He can be contacted by readers who would like to have access to the research materials employed in the experimental studies on the eight guidelines. E-mail: H.vanderMeij@utwente.nl

Jan van der Meij is assistant professor at ELAN Institute for Teacher Education and Science Communication. He received his PhD in 2007. His research concentrates on functional use of ICT in education. This includes learning with (multiple) dynamic representations, (live) video instruction, and cognitive and motivational support by pedagogical agents. Web site: www.utwente.nl/elan/medewerkers/meij.doc. Contact: J.vanderMeij@utwente.nl

Manuscript received 14 February 2013; revised 31 July 2013; accepted 3 August 2013.