Automating the Production of Software Documentation Videos: An Interview with Mark Hellinger

By Scott Abel | STC Associate Fellow

An increasing number of technical communication organizations produce video documentation, instructional videos, explainer videos, and product simulations and demonstrations. Video production has increased dramatically since 2017, when only 11% of those surveyed by The Content Wrangler said they created video documentation. More recently, 50% of technical communication teams surveyed in 2020 said they produce video, but only for some of their products.

Producing video can be problematic—not all tech comm teams have the experience and the tools required to do it well—and the cost can be prohibitive. The frequency of content updates, the cost of localization and translation, and the need to produce documentation deliverables at scale exacerbate the challenges. Technology is beginning to provide us with solutions that can help us overcome these and other video-making challenges.

In this installment of Meet the Change Agents, I introduce you to Mark Hellinger, Chief Marketing Officer at software maker Videate. The company aims to make the automatic assembly of video content from software product documentation straightforward and ubiquitous.

SA: Thanks for agreeing to share your insights on the topic of automating the generation of technical documentation video assets at scale, Mark. To help provide some context for this interview, can you tell our audience who you are and what your connection to the technical communication industry is?

MH: I am the Chief Marketing Officer for Videate. We have built a platform to automate the production of software videos using AI and robotic process automation. I’ve been working in the technical communication field for over 15 years, and I’m excited to have the opportunity to share a little about Videate, a product that lets you automagically make technical documentation videos.

SA: According to Cisco, by 2022, video content will make up more than 82% of all consumer internet traffic—15 times more than it was in 2017! And while those statistics are impressive, consider that 59% of executives say they prefer watching a video over reading text.

There are many reasons the automated production of video from product documentation is attractive to organizations battling for their prospects’ attention and their customers’ loyalty. Setting aside the obvious benefit of providing technical documentation content to those who require it in multiple formats, what are some of the other drivers that spur the adoption of automatic video generation from software documentation?

MH: The number one reason for providing video documentation is customer choice. Some people prefer to read documentation, and others prefer to watch videos. Video preference has increased by over 50%, and there are clear use cases for customer education, technical support, marketing, and client success. Video is now a critical resource in a customer journey—from acquisition to retention.

SA: Videos have a much higher click-through rate than does text content. Can adding instructional video alongside text-based documentation impact how search engines rank our content?

MH: Buyers have become conditioned to watch videos as they research products, especially executive decision-makers for B2B software.

End-users want video. Studies consistently show that video support improves the user experience. Video influences click-through rates and impacts the length of time people spend on a webpage. Videos help lower bounce rate—the percentage of website visitors who navigate away from your content without much interaction—one of the primary challenges associated with providing only textual content.

Generating videos from your documentation allows you to align text and videos, increasing content consumption and improving your SEO ranking.

SA: In the spring of 2020, I interviewed Wouter Maagdenberg of TXTOMedia about his company’s approach to producing video documentation at scale using structured content. Wouter mentioned that his firm saw much potential in helping companies build at-scale instructional videos for physical products, pointing out that video documentation is most useful in assisting customers in repairing, assembling, cleaning, and maintaining tangible goods. That makes sense to me, given the nature of the products and the needs of the consumers.

But your company is working to help companies that produce intangible goods, like Software-as-a-Service (SaaS) products, to automate video production from documentation at scale. Why would a SaaS company want to generate videos from technical documentation?

MH: The challenge with SaaS products is different from physical products. Software development is agile, meaning continuous change. Keeping technical documentation up to date is already challenging with bi-weekly, monthly, and quarterly releases. If you make videos, they, too, need to stay up to date with each release.

Up-to-date product information is no longer limited to being distributed after someone purchases a product. It’s also essential in the decision making that occurs when consumers consider what products to purchase. With B2B software, out-of-date video content is a liability. Frustration and confusion occur when consumers encounter video content that fails to align with the product they are evaluating.

If your product information does not match your software, you have a marketing problem. Videos are quickly becoming more important than documentation. If you can consistently produce videos with each new release, you give your marketing organization something it values highly. When technical communication teams build the capability to deliver up-to-date videos of value to the marketing team consistently, they dramatically increase their value across the entire organization.

As the buyer’s journey extends from prospects to paying customers, videos impact customer retention and loyalty. Users need up-to-date educational and technical support videos. Both documentation and videos are vital for customer acquisition and retention.

Making videos is time-consuming, people-intensive, and potentially expensive. Technical documentation teams, especially those who have implemented structured content approaches, are in a unique place to create a scalable and sustainable video production process with newly available technology.

SA: Clearly, there are many business reasons for producing technical documentation videos. Now that you’ve helped us understand how keeping videos up to date can benefit organizations, can you talk a bit about how your solution works?

MH: Videate is an application that discovers what’s inside your existing product documentation. We use machine learning and automation to gain an understanding of your software. We learn where individual elements of a software product’s graphical user interface (like icons or menu items) are located. We use this knowledge to automatically generate video documentation that, for example, instructs a user to click on a particular icon or explains how to fill in a form.

Here’s how it works. We use your text to navigate your software. We synchronize the movement as if you were moving the mouse and speaking the words using a screen-recording tool. You can add video-only animation instructions to your documents to further enrich the experience.

Simultaneously, we use text-to-speech technology from leading providers like Amazon, Microsoft, and Google to generate the voice. It’s synchronized with the on-screen movement as we record. The process results in the automated production of software videos. Our customers often generate video documentation in mp4 format, but we support several other video formats, including playlists.

With Videate, there’s no post-production processing to edit out pauses, stammers, breathing, noise, or errors. We generate videos from the text-based files you are already creating. As you update your existing content, the video versions are updated as well. Using this approach, whenever you release software, you’ll be able to provide up-to-date video documentation, even when you are busy making last-minute user interface changes.

Documentation teams that produce semantically rich, intelligent content have a huge advantage over those that don’t. Feeding our system with intelligent content significantly reduces the amount of work required to “train the engine,” and it does so without content rework. We can automatically capture screenshots as we are making the videos, eliminating work for the documentation team. We’ve had customers justify the cost of the platform on this alone.
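To make the synchronization idea above more concrete, here is a minimal sketch of how documentation steps might be turned into a timeline in which each on-screen action starts when its narration starts. It is an illustration only, not Videate's implementation: the `Cue` class, the `build_timeline` function, the verb pattern, and the assumed narration speed are all hypothetical, and a real system would rely on far richer language understanding and on the actual length of the generated audio.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern covering a handful of instruction verbs.
ACTION_PATTERN = re.compile(
    r"^(?P<verb>click|select|type|open)\s+(?:on\s+)?(?:the\s+)?(?P<target>.+?)\.?$",
    re.IGNORECASE,
)

WORDS_PER_SECOND = 2.5  # assumed narration speed, not a measured value


@dataclass
class Cue:
    start: float      # seconds from the beginning of the video
    duration: float   # estimated narration length in seconds
    verb: str         # UI action to perform while the narration plays
    target: str       # UI element named in the step ("" if none)
    narration: str    # text that would be handed to a text-to-speech service


def build_timeline(steps):
    """Turn documentation steps into cues whose actions line up with narration."""
    timeline = []
    clock = 0.0
    for step in steps:
        text = step.strip()
        match = ACTION_PATTERN.match(text)
        if match is None:
            # No recognizable UI action: the step is narrated without movement.
            verb, target = "narrate", ""
        else:
            verb = match.group("verb").lower()
            target = match.group("target")
        # Estimate how long the narration takes from its word count.
        duration = len(text.split()) / WORDS_PER_SECOND
        timeline.append(Cue(clock, duration, verb, target, text))
        clock += duration  # the next cue starts when this narration ends
    return timeline


if __name__ == "__main__":
    for cue in build_timeline(["Click the Save icon.", "Your changes are stored."]):
        print(f"{cue.start:5.1f}s  {cue.verb:8} {cue.target!r}")
```

Because the timeline is derived entirely from the text, editing a step and rebuilding the timeline shifts every later cue automatically, which is the property that makes regenerating a video cheaper than re-editing one.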

SA: Automation is a requirement for organizations looking to increase efficiency and grow exponentially. Do you encounter people who don’t believe that automated video production can yield quality video products?

MH: Our north star is what we call the “Video Turing Test.” In 1950, Alan Turing proposed a test that evaluates natural language conversations between humans and computers using text-only responses. If a human evaluator could not tell the difference after five minutes of interaction, the computer passed the test. Seventy years later, the text-only Turing Test makes way for computer-generated speech.

We’re used to hearing Alexa, Siri, and Google Assistant in our daily lives. And yet, when it comes to using a computer-generated voice in software videos, there is skepticism that users will find it acceptable. We asked a wide range of B2B software end-users this question, “Given the choice of having up-to-date software videos with computer-generated voices or out-of-date software videos with human voices, which would you prefer?” The preference: Always up-to-date videos with computer voices.

We have seen text-to-speech technology improve by orders of magnitude every few months, and we believe Videate now passes the Video Turing Test. Personalized voice is currently cost-prohibitive (except for larger enterprises), but it will soon become much more affordable. The auto-generated voice has come of age.

SA: Besides structured content and the right technology, what is required to make top-quality automatically generated video documentation?

MH: As Alfred Hitchcock said, “To make a great film, you need three things—the script, the script, and the script.” If you have quality product documentation, you can produce quality videos automatically. Garbage in, garbage out; it all starts with the input.

If you record videos manually, you still need to start with a script. While a few people can make one-off videos, most software videos require collaboration with subject matter experts and increasingly with product management and brand voice teams. So to get quality videos, you always need quality scripts, and they need to be updated every time there is a new software release.

Many technical documentation teams understand how to write, follow terminology standards, and succinctly communicate information. Maybe a Ridley Scott quote would have been better. He said, “Once you crack the script, everything else follows.”

SA: For companies that frequently update their technical communication content, how does your solution help them stay on schedule and produce video content on time?

MH: Every time you update your documentation, you must update your videos. Add a new form field or change the name of your product (or a component within it), and you have to update your docs. Technical communication shops that follow a single-source content development model understand how to efficiently do these types of updates. They do it when they push updated renditions of their content to multiple channels, like PDF, HTML, and online help.

Most documentation includes images. Image management is an expensive and time-consuming problem, particularly if you produce video documentation to augment text-based user assistance. Every time you change a product or its graphical user interface, chances are good you’ll also need to create new (or update existing) screenshots. This task can be made more challenging for products that support multiple languages (you have to take screenshots optimized for each language).

Determining which screenshots to update, which ones to capture, and which ones to replace is labor-intensive and expensive. You can’t republish the content until you tackle all these image challenges.

Creating high-quality video documentation requires us to manage images efficiently and effectively. Videate eliminates the need to edit videos by hand. Our machine learning-powered system reduces the cost and manual labor associated with making new screenshots. This approach not only increases productivity and speeds time-to-market, but it also reduces other difficulties introduced during editing, like voice synchronization issues.
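The screenshot triage described in this answer can be illustrated with a short sketch. It assumes that screenshots have already been captured automatically for both the previous and the current release, and it compares them by content hash to decide which images are new, which have changed, and which are no longer needed. The function names and the category labels are hypothetical, and nothing here is taken from Videate's product. Note that byte-for-byte comparison flags even invisible rendering differences, so a production workflow would probably need a more tolerant image comparison.

```python
import hashlib


def fingerprint(image_bytes):
    """Return a stable digest of the raw image data."""
    return hashlib.sha256(image_bytes).hexdigest()


def classify_screenshots(previous, current):
    """Compare two releases; each argument maps screenshot name -> image bytes."""
    prev = {name: fingerprint(data) for name, data in previous.items()}
    curr = {name: fingerprint(data) for name, data in current.items()}
    shared = prev.keys() & curr.keys()
    return {
        # Present only in the new release: add to the documentation.
        "capture": sorted(curr.keys() - prev.keys()),
        # Present in both releases but with different pixels: replace.
        "update": sorted(name for name in shared if prev[name] != curr[name]),
        # Present only in the old release: candidates for removal.
        "retire": sorted(prev.keys() - curr.keys()),
        # Identical in both releases: no work needed.
        "keep": sorted(name for name in shared if prev[name] == curr[name]),
    }


if __name__ == "__main__":
    old = {"login.png": b"a", "home.png": b"b", "legacy.png": b"c"}
    new = {"login.png": b"a", "home.png": b"B", "export.png": b"d"}
    print(classify_screenshots(old, new))
```

For multilingual products, the same comparison could be run once per language, with the language code included in each screenshot name.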

SA: Many technical communication shops produce content in multiple languages. How does your approach impact the translation of videos?

MH: There are two common approaches to translating videos. The first is to dub the audio from the original language into another language. This often leads to synchronization challenges worse than watching a movie when the actors’ voices and lips are out of sync. Phrases in other languages are not the same length as in your source language, and technical jargon is difficult (and sometimes impossible) to translate.

The second is to re-record the video in each language you support, but this is not a scalable approach. The frequency of releases and the manual effort necessary make this approach cost-prohibitive.

Because Videate relies on a script (your existing software documentation, for example) to automate video production, you can apply the tools and methods you already use to translate your technical documentation. When you translate the script, we regenerate the software video in the cloud by automatically re-synchronizing with your source content. Each time you update the script, Videate automatically creates a new version of the video.
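One way to picture "regenerate when the script changes" across several languages is to track, for each topic and language, a digest of the script that produced the last rendered video. The sketch below is hypothetical throughout: the function names, the (topic, language) keys, and the idea of storing digests are assumptions made for illustration and say nothing about how Videate detects changes.

```python
import hashlib


def script_digest(text):
    """Digest of a script, ignoring purely cosmetic whitespace changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def videos_to_regenerate(scripts, rendered_digests):
    """List the (topic, language) pairs whose video is missing or out of date.

    scripts maps (topic, language) -> current script text.
    rendered_digests maps (topic, language) -> digest recorded at the last render.
    """
    stale = []
    for key, text in scripts.items():
        # A missing entry means this translation has never been rendered.
        if rendered_digests.get(key) != script_digest(text):
            stale.append(key)
    return sorted(stale)


if __name__ == "__main__":
    scripts = {
        ("export", "en"): "Click Export.",
        ("export", "de"): "Klicken Sie auf Exportieren.",
        ("export", "fr"): "Cliquez sur Exporter.",
    }
    rendered = {
        ("export", "en"): script_digest("Click  Export."),
        ("export", "de"): script_digest("Klicken Sie auf Export."),
    }
    print(videos_to_regenerate(scripts, rendered))
```

In this example the English script differs from the rendered version only in spacing, so only the German and French videos are listed for regeneration.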

SA: I’ve heard you say that you’re not just repackaging existing content into a new format, but that you’re using machine learning to help make our content better. What do you mean by this?

MH: Without giving away our patent-pending secrets, we have a couple of tools to drive machine learning. We use natural language understanding (NLU) and robotic process automation (RPA) to analyze, infer, and generate human behavior. We learn about your software and read and understand your documentation.

Our engine generates behavior—filling out forms, selecting pulldowns, ticking checkboxes—acting on these things in the way humans do. Over time, as your corpus of documents grows, our engine learns and gets smarter.

We are working with some multinational customers who document complex software, and the model we use is improving every day as a result. New clients get the benefit of the thousands of documents we have analyzed to produce videos.

SA: Let’s talk a bit about cost. Wouldn’t it be cheaper to produce videos overseas using a low-cost outsourcing agency?

MH: Outsourcing video production doesn’t save you time or money. Whatever money you think you can save by going offshore, you will pay in terms of quality or post-production time. With Videate, there are no mistakes. You don’t have to edit the videos to remove noise. You always record with the best equipment.

Our pricing considers the actual costs of manually producing video, both on-shore and off-shore. Our consumption-based model means you pay for the number of hours of video you make. You can render as often as you like but only pay for the videos you complete.

The entry point is US $2,000 a month with an annual contract. At worst, it’s break-even if you have one release a year. If you have two or more releases a year, the ROI is compelling. And as a bonus, you get automated screenshots to use in your other documentation deliverables. It scales as your video production scales.
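The break-even claim can be checked with simple arithmetic. The sketch below uses the quoted entry price of US $2,000 a month; the cost of producing a release's videos manually is not given in the interview, so `manual_cost_per_release` is a hypothetical input you would replace with your own figure. The calculation also ignores any consumption-based charges beyond the entry price.

```python
import math

MONTHLY_FEE_USD = 2_000   # entry price quoted in the interview
MONTHS_PER_YEAR = 12


def annual_platform_cost(monthly_fee=MONTHLY_FEE_USD):
    """Yearly subscription cost under an annual contract."""
    return monthly_fee * MONTHS_PER_YEAR


def break_even_releases(manual_cost_per_release, monthly_fee=MONTHLY_FEE_USD):
    """Fewest releases per year at which manual production costs at least as much."""
    if manual_cost_per_release <= 0:
        raise ValueError("manual_cost_per_release must be positive")
    return math.ceil(annual_platform_cost(monthly_fee) / manual_cost_per_release)


def annual_savings(releases_per_year, manual_cost_per_release,
                   monthly_fee=MONTHLY_FEE_USD):
    """Manual production cost avoided minus the subscription (negative means a loss)."""
    manual_total = releases_per_year * manual_cost_per_release
    return manual_total - annual_platform_cost(monthly_fee)


if __name__ == "__main__":
    print(annual_platform_cost())          # yearly cost at the entry price
    print(break_even_releases(24_000))     # hypothetical manual cost per release
    print(annual_savings(2, 24_000))
```

At the entry price the subscription comes to US $24,000 a year, so breaking even on a single release implies that manually producing that release's videos would cost about the same amount.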

SA: How can a technical documentation team discover whether they can create video-on-demand at scale using your solution?

MH: We’re not quite at the point of full self-service yet, but we offer a free proof-of-concept. The process is simple. We sign a nondisclosure agreement, you provide some sample documents and credentials to a test or trial environment, and we make videos for you. We review them with you and iterate to generate videos that look and sound great. Then, we show you how to do it yourself.

SA: I’m afraid we’ve run out of time. Thanks for taking the time to explain your approach to producing automated video documentation at scale. It’s a great use of machine learning-enabled technical communication. I’m certain you are onto something big.

MH: Thank you very much, Scott! We’ve solved a problem in need of a solution, and we are getting better at it every day. One final quote, “Any sufficiently advanced technology is indistinguishable from magic.” – Clarke’s Third Law.