Columns

Interview with Jaap Van Der Meer

By Scott Abel

In the digital age, change happens quickly. This column features interviews with the movers and shakers—the folks behind new ideas, standards, methods, products, and amazing technologies that are changing the way we live and interact in our modern world. Got questions, suggestions or feedback? Email them to scottabel@mac.com.

In this exclusive interview for Intercom, Scott Abel, The Content Wrangler, chats with machine translation expert, Jaap Van Der Meer, about why he believes machine translation will soon become a ubiquitous service—a utility embedded in every app, website, and product.

The Content Wrangler: What is TAUS?

Jaap Van Der Meer: TAUS (www.taus.net) is the Translation Automation User Society, an industry organization for requesters and providers of translation services and technologies. TAUS currently has around 150 members, ranging from large global IT and manufacturing companies to translation agencies around the world. The common interest of all these members is to advance innovation and automation in the translation industry.

TCW: What is machine translation?

JVDM: The translation industry is quite used to translation memory tools, like TRADOS [the household name in that sector]. Translation Memory (TM) supports human translators by storing translated sentences and offering these segments for reuse if the same or a similar segment pops up in the source document (i.e., the document to be translated). Machine Translation (MT) is different from TM because it translates complete documents fully automatically, including all those segments that are not stored in a TM database.
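To make the TM idea concrete, here is a minimal sketch of the store-and-reuse behavior described above. It is an illustration only, not how TRADOS or any commercial tool works: the class name `TranslationMemory`, the 0.75 similarity threshold, and the use of Python's standard `difflib` for similarity scoring are all assumptions made for this example.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory (hypothetical; not a real product's design)."""

    def __init__(self, threshold=0.75):
        # Assumed cutoff: matches scoring below this are not offered for reuse.
        self.threshold = threshold
        self.segments = {}  # source sentence -> stored translation

    def store(self, source, target):
        """Remember a translated segment."""
        self.segments[source] = target

    def lookup(self, source):
        """Return (translation, score) for the closest stored segment.

        The translation is None when nothing is similar enough.
        """
        best_score, best_target = 0.0, None
        for stored_source, target in self.segments.items():
            # Character-level similarity between 0.0 and 1.0.
            score = SequenceMatcher(
                None, source.lower(), stored_source.lower()
            ).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_score >= self.threshold:
            return best_target, best_score
        return None, best_score


tm = TranslationMemory()
tm.store("Click the Save button.", "Cliquez sur le bouton Enregistrer.")

# An identical segment and a similar segment are both offered for reuse.
print(tm.lookup("Click the Save button."))
print(tm.lookup("Click the Save button now."))
# An unrelated segment finds no match; this is the gap that MT fills.
print(tm.lookup("Restart the printer."))
```

A segment with no close match comes back empty, which is exactly the case in which a TM leaves the translator to start from scratch and an MT engine would still produce output.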

MT has something of a bad reputation because it has not always performed well. Very often the output from MT makes people laugh or get irritated. In the last couple of years, however, MT technology has made tremendous strides, especially into the consumer market, much to the surprise of the professional translation industry. End users do not seem to be bothered much by the sometimes stuttering and funny sentences.

Google Translate and Microsoft Bing get hundreds of millions of translation requests every day. After more than half a century of research, MT is now finally making it into the real world. To be sure, the technology has improved a lot over time, but perhaps a bigger factor in the rapid adoption of MT is that the volume of content and the number of languages we publish in keep rising, and the type of content is changing, too.

Not everything we write and read is important enough to require human translations, or at least one could say that a lot of the information we consume does not have a shelf life long enough to warrant the expense.

TCW: Are there different types or flavors of machine translation? If so, can you tell us a little about them (what they are, pros and cons, what they’re most useful for)?

JVDM: From a research perspective a distinction is made between two approaches: rule-based and statistical-based (or data-driven) machine translation.

The rule-based approach is the oldest and is based on the way everyone learns language at school: by combining knowledge of grammar and dictionaries. The statistical (or data-driven) approach emerged in the early nineties as the complete opposite: ignore rules and dictionaries and just feed the computer lots of sentences. The more sentences we feed it, the more and the better it learns. These include both bilingual sentence pairs (like translation memories) and monolingual sentences (in the target language).

From a user perspective, these different approaches do not matter that much. MT developers these days are experimenting with mixing the approaches and are more and more inclined to build “hybrid” systems.

A big advantage of the statistical approach is that engines can be developed and improved more quickly and easily for new language pairs and domains, as long as translation data are available. Linguists and translators tend to prefer rule-based systems because editing glossaries gives them a greater sense of control over improvements to the engines.

Statistical engines usually behave much less predictably. Improving statistical engines is much more a matter of experimenting with lots of data and tools. Another distinction worth mentioning is the one between commercial systems and open source systems. There are not too many commercial MT players active in the market. At the same time, many service and solution providers take an open source MT system and build a service around it. One of these open source MT systems is called Moses. Moses is gaining popularity among large government bodies, corporations, and language service providers.

TCW: What is the current state of translation automation? What is realistically possible? What are some companies doing today?

JVDM: There is a clear role for MT in the business world and in the world of governments. There is no doubt about that. What is important, though, is to start with a content profiling exercise: for each type of content being written and translated, decide whether a lesser quality level would negatively impact the brand or image of the company, or whether speed of delivery and the function of the content is perhaps more critical.

TAUS has set up a simple content profiling wizard on its website, based on what we call UTS scoring. U stands for utility, T for timeliness or speed, and S for sentiment. Content types with high U and T scores lend themselves much better to the use of MT.
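As a rough illustration of content profiling, the sketch below scores a few content types and flags those with high utility and timeliness as MT candidates. It is not the TAUS wizard: the 1-to-5 scale, the cutoff of 4, the sample scores, and the names `ContentProfile` and `suggest_mt` are all assumptions made for this example. The sentiment score is recorded but left out of the rule, because the interview does not say how it is weighed.

```python
from dataclasses import dataclass


@dataclass
class ContentProfile:
    """Hypothetical UTS profile; scores use an assumed 1-to-5 scale."""
    name: str
    utility: int      # U: how much the content is valued for its function
    timeliness: int   # T: how much speed of delivery matters
    sentiment: int    # S: recorded only; not used in the rule below


def suggest_mt(profile, high=4):
    """Assumed rule: flag content as an MT candidate when U and T are both high."""
    return profile.utility >= high and profile.timeliness >= high


# Sample scores are invented for illustration.
profiles = [
    ContentProfile("Customer support article", utility=5, timeliness=5, sentiment=2),
    ContentProfile("Brand campaign copy", utility=2, timeliness=2, sentiment=5),
]

for p in profiles:
    verdict = "MT candidate" if suggest_mt(p) else "review before using MT"
    print(f"{p.name}: {verdict}")
```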

In fact, MT technology is now quickly becoming indispensable for content that needs to be translated quickly and functionally. Following the incredible popularity of Google Translate and Microsoft Translate among end users, many corporations are now starting to offer a customized (or non-customized) MT service on their customer support websites.

This is real-time translation, and it works. Customers give positive feedback: they are happy to be able to find critical information in their own language—even if it is not always as perfect as we might like.

Today, more and more companies are looking at MT as a productivity tool in existing translation processes. This means that they need to integrate the MT technology with the already existing set-up with TM (translation memory) and translation workflow. This is challenging, not only from a tools integration perspective, but also because translators and translation vendors may be less enthusiastic about MT for obvious reasons. However, the benefits of MT have been proven sufficiently. Productivity rises and cost savings resulting from the introduction of MT range from 20% to 100% or even more.

TCW: Conversely, what is still not possible? What is unrealistic?

JVDM: Don’t try to machine-translate poetry or metaphorical marketing text, although some people try that as well. Also, we are struggling with MT for many new languages. MT technology is available for 50 to 60 languages, and there are many combinations (language pairs) that can be developed from this basic set.

For many companies becoming more global and seeking growth in new markets around the world, MT is the ideal, perhaps the only, practical solution. The people of Earth speak 6,000 languages. Many global companies offer content in roughly 30 languages, and with that language coverage they potentially reach 1 billion users in the world. To reach the next few billion users, around one thousand new languages need to be added. Companies, no matter how large, do not have the budget or the resources to translate into 1,000 languages using traditional human-based translation processes.

So, everything is technically possible, but we are all challenged now to develop MT engines for many more language pairs. In order to do that, we need translation data in many new languages. That is the constraint we are all facing.

TCW: You talk a lot about the eventual ubiquity of translation automation. What does this future look like? Can you tell us why you think that way and what we might expect if you’re right?

JVDM: TAUS sees translation as a utility, like electricity, water, and the Internet. Translation is a basic human right. Every citizen in the world has the right to access information in his or her own language. It is a natural next phase in the evolution of hyperglobalization the world has gone through in the past two decades. It is a natural next step in the evolution of the technology as well.

We see MT improving rapidly, especially since it is being used more and more, and since we have been able to train the engines with in-domain translation data. We see the translation industry growing and flourishing as never before.

There is also a growing awareness that we all (as industry partners) need to collaborate on a grand scale to make the Internet accessible for everyone on the planet, as evidenced by the recent Internet.org initiative announced by Facebook and some other global corporations. My only concern, though, when it comes to sharing translation data, is that we are held back by outdated copyright law and, as a result, will not make as much progress as we could.

TCW: Can writers impact the success of translation automation systems by making decisions about how they will craft their content? If so, what types of things can we do to better prepare our content for its eventual processing by a machine translation system? Suggestions?

JVDM: Well, it is obvious that spelling errors can confuse MT engines, just as ungrammatical or unusually long sentences can. But that is no different for the human reader for whom the text is written. So writers should do what they are always asked to do: make no spelling mistakes, write clear, complete sentences, and don’t make them too long.

TCW: What does TAUS do? What is your mission and who do you serve?

JVDM: TAUS supports entrepreneurs and globalization and content managers with a comprehensive range of services, like research, tutorials, training, translation-quality evaluation and benchmarking, technology directories, knowledge bases, an industry-standard API for connecting content with translation, and a repository of shared translation memory data. The TAUS mission is to increase the size and significance of the translation sector to help the world communicate better.

TCW: How can our readers learn more about machine translation?

JVDM: Visit www.taus.net and explore our free resources, including articles and free reports. One example is the recent Translation Technology Landscape report, a seventy-page report with a complete overview of the translation technology landscape.

TCW: What parting advice do you have for technical communication pros who know that preparing their content for translation is important work?

JVDM: Be open to MT technology. It helps you and your company to vastly expand the reach of your content and to grow your business.