By Jean-Luc Saillard
Globalization: Creative Destruction in the Translations Industry
Continued development in global communications requires faster information exchange, which is not possible without the corresponding engineering support. This applies not only to mass communications, such as cell phones or the Internet, but also to technologies for translating content into other languages, where the translation industry has seen some of its most advanced developments.
With almost all countries open to foreign business, a company may well seek to set up a branch abroad. Here is what TAUS International Organization, a translation industry think tank, has to say: “We think translation is becoming a utility. What this means is that translation is becoming something similar to electricity, Internet or water.” Machines have learned to translate “on the fly,” and they do so using machine translation (MT) technology.
Today, translation is needed in all areas: documents, websites, applications, and spoken or written communications. If partners cannot understand each other, their collaboration is at risk.

It is not only the number of communications that is growing; the scope of content that has to be translated is growing as well. Companies therefore need to translate more without losing speed. The management processes used in the translation industry are far from perfect and can benefit from improvements. Once we saw this, we decided to develop a solution that would help both translators and everyone else dealing with inter-language communications. When we started marketing the product, we understood that it had to be free and available to anybody.
Automated Translation Technologies: Past, Present, and Future
The first automated solutions started to appear in the early 1990s. These systems were built around a core component, translation memory, which accumulates earlier translations and enables the reuse of any repetitions. When the system identifies a repetition, the user can simply copy the translation from the database. Besides translation memory, CAT (computer-aided translation) tools feature a number of other functions, such as machine translation (MT), glossaries, automated spelling checks, and more.
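To make the idea of translation memory concrete, here is a minimal sketch in Python. It is only an illustration, not the design of any particular CAT tool: the class name, the normalization rule and the sample sentences are all assumptions.

```python
class TranslationMemory:
    """Hypothetical in-memory store of source sentences and their translations."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _normalize(sentence):
        # Assumption: differences in case and whitespace still count as a repetition.
        return " ".join(sentence.split()).lower()

    def add(self, source, target):
        """Remember a finished translation for later reuse."""
        self._entries[self._normalize(source)] = target

    def lookup(self, source):
        """Return the stored translation, or None if the sentence is new."""
        return self._entries.get(self._normalize(source))


tm = TranslationMemory()
tm.add("Press the power button.", "Appuyez sur le bouton d'alimentation.")

print(tm.lookup("Press  the power button."))  # a repetition: the stored translation is reused
print(tm.lookup("Hold the power button."))    # a new sentence: nothing to reuse
```

A production system would persist the entries in a database and would also return near matches rather than exact repetitions only.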
Cloud-based services started to gain ground across many industries a long time ago, but the first non-desktop professional translation solutions appeared only in 2012-2013.
SmartCAT was one of the first fully cloud-based CAT systems worldwide. Earlier systems were based on a client-server architecture: translators would work in desktop-installed programs, while a company-controlled server was used to share translation resources (translation memories and glossaries) and manage projects. This architecture is hard to roll out and support, and it scales poorly. At the same time, the price tag for such systems is quite high and they are hard to learn. For our part, our developers are continuously working to make all SmartCAT interfaces as convenient as possible, drawing on usability tracking techniques and user tests.
SmartCAT is the only CAT tool that uses morphology search both for source and target texts. This kind of search is considerably more complex than a conventional one: the program includes special word-form dictionaries built from word-change (inflection) rules. This enables the system to find a word irrespective of its form (whenever you search for the word “search,” the system will also show results for other word forms, such as “searches,” “searching,” etc.). We also work hard to deliver the most effective machine translation functionality. Let’s take a more detailed look at machine translation.
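Before turning to machine translation, the morphology search just described can be sketched as follows. This is a simplified illustration under stated assumptions: the tiny word-form dictionary is written by hand here, whereas a real one would be generated from inflection rules, and none of the names below come from SmartCAT.

```python
import re

# Hypothetical word-form dictionary: base form -> known inflected forms.
WORD_FORMS = {
    "search": {"search", "searches", "searching", "searched"},
    "translate": {"translate", "translates", "translating", "translated"},
}

# Reverse index so that any form can be mapped back to its base form.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in WORD_FORMS.items() for form in forms
}


def lemma_of(word):
    """Return the base form of a word, or the lowercased word if it is unknown."""
    word = word.lower()
    return FORM_TO_LEMMA.get(word, word)


def morphology_search(query, segments):
    """Return the segments containing any form of the query word."""
    target = lemma_of(query)
    hits = []
    for segment in segments:
        words = re.findall(r"\w+", segment)
        if any(lemma_of(word) == target for word in words):
            hits.append(segment)
    return hits


segments = [
    "The system searches the database.",
    "Searching is fast.",
    "Nothing relevant here.",
]
results = morphology_search("search", segments)
for segment in results:
    print(segment)
```

Because both the query and the text are reduced to base forms, a search for “searched” would return the same two segments.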
The efforts to develop a machine translation system originally started as early as the 1950s, with the ambitious goal of completely replacing professional translators. Now, more than 50 years later, the industry players believe that machine translation, when used professionally, should help translators cut the time they spend on translations rather than replace translators altogether. In this way, the focus of professionally used machine translation has shifted to effective integration within the translator environment, adapting application scenarios and post-MT editing. Editing itself is changing too: rather than merely correcting machine output, the translator receives a number of automated suggestions and can change a longer phrase in just a couple of clicks.
Some of the engines we use can automatically apply terms stored in customer glossaries and translation memory. In addition, machine translation engines can preserve formatting. For instance, whenever a word is in bold or has a hyperlink in the source, the system identifies the respective fragments in the translation and retains the formatting, so the translator does not have to spend time on these actions.
Below is a step-by-step description of the process to execute a standard translation project.
Step 1
The manager or freelance translator (depending on whether it’s a corporate or freelance user) creates a project and uploads files to be translated. At this stage, one of the more complex modules comes into play: the file disassembly module. It “reads” a file and extracts the textual information, while any data concerning tagging, formatting, and element locations in the file will be handled at the next stages.
At the same time, the extracted text is divided into individual sentences, as translation memory works at the level of a single sentence. The translation editor presents the text in the same way: as a table in which the sentences appear one after another.
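The division into sentences can be illustrated with a deliberately naive sketch. The splitting rule below (a period, exclamation mark or question mark followed by whitespace and a capital letter) is an assumption made for the example; production segmenters rely on much richer, language-specific rules and handle abbreviations, which this one does not.

```python
import re

# Naive boundary: sentence-ending punctuation, whitespace, then a capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_into_segments(text):
    """Split plain text into sentence-level segments."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


text = "Open the file. Is it empty? Upload it again!"
segments = split_into_segments(text)

# Each segment becomes one row of the editor table.
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```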
After the translation has been completed, the file disassembly module assembles the final document version, using the translation and file tagging information log, created at the disassembly stage. The processed files may feature lots of formatting and a complex structure, which makes disassembly processes extremely complex both in terms of algorithms used and system workload.
Step 2
After the file has been disassembled, but before translation assignments have been created and work on the translation commences, the system gathers statistics for the uploaded documents. At this stage, each sentence is compared against the translation memory bases: the system searches for any repetitions between earlier translations and the text being processed, identifying what are known as matches. The system not only searches for matches but also determines the degree, as a percentage, to which sentences match. The manager can use these data to estimate the time and costs for the project: the more matches there are, the less time and money the project will require. These data also enable the manager to build an optimal team of translators and editors to carry out the project.
In order to identify any matches, the system uses a special module, known as the full-text search module, which is one of the most important and complex components of the system. This module is also used for glossary term searches in the text and certain automated verifications.
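The statistics gathered at this step can be sketched as follows. This is an assumption-laden illustration: Python's standard `difflib.SequenceMatcher` stands in for the full-text search module, and the match bands (100%, 75-99%, no match) are example thresholds, not the ones used by any specific tool.

```python
from difflib import SequenceMatcher


def match_percent(segment, tm_source):
    """Character-level similarity of two sentences as a whole percentage."""
    ratio = SequenceMatcher(None, segment.lower(), tm_source.lower()).ratio()
    # Truncate instead of rounding so that only identical text reaches 100.
    return int(ratio * 100)


def best_match(segment, tm_sources):
    """Highest match percentage of a segment against the translation memory."""
    return max((match_percent(segment, source) for source in tm_sources), default=0)


def band(percent):
    """Assign a match percentage to an illustrative pricing band."""
    if percent == 100:
        return "100%"
    if percent >= 75:
        return "75-99%"
    return "no match"


def project_statistics(segments, tm_sources):
    """Count the words of the document that fall into each match band."""
    stats = {"100%": 0, "75-99%": 0, "no match": 0}
    for segment in segments:
        stats[band(best_match(segment, tm_sources))] += len(segment.split())
    return stats


tm_sources = ["Click the Save button.", "Close the window."]
segments = [
    "Click the Save button.",
    "Click the Open button.",
    "Restart your computer now.",
]
stats = project_statistics(segments, tm_sources)
print(stats)
```

A manager could then price each band differently, for example charging the full rate only for words in the "no match" band.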
Step 3
At the next stage, the assignments are distributed. The manager decides on the team members (translators, editors and proofreaders) and then assigns particular documents to them within the project. If a document is too large, the manager can assign several members to a single document. If the manager has already set up a basic team for the project, this won’t pose any problem. However, this is not always the case.
Step 4
After the manager has decided on all team members and agreed all details with them, the translation can commence. As we have said before, the editor interface is a table with each sentence presented in an individual segment. The translator needs to translate these segments, using the automation tools that we have described.
During the translation, each sentence undergoes automatic verification. If the system does not identify any problems, the translator can save the segment. The translation is simultaneously saved to the final document version and to the translation memory base, which enables its subsequent reuse. Then the translator moves on to the next sentence. When the translator moves to a new segment, the system searches for any matches between the active segment and translation memory, loads the machine translation and searches for glossary terms.
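The article does not list the individual checks, so the following sketch of automatic verification uses three checks that are common in translation tools and are assumed here purely for illustration: an empty translation, numbers that differ from the source, and accidental double spaces.

```python
import re


def verify_segment(source, target):
    """Return a list of problems; an empty list means the segment may be saved."""
    if not target.strip():
        # Nothing else is worth checking when there is no translation at all.
        return ["empty translation"]

    problems = []
    # Numbers should normally survive translation unchanged.
    if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
        problems.append("numbers differ")
    if "  " in target:
        problems.append("double space")
    return problems


print(verify_segment("Wait 30 seconds.", "Attendez 30 secondes."))
print(verify_segment("Wait 30 seconds.", "Attendez 3 secondes."))
```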
So, the translation of each sentence results in a number of queries to the database and full-text search engine, increasing system workload. This is particularly important when a file is handled by a team and not just one translator. The stakeholders engaged in a project—translators, editors, proofreaders, managers—all need to interact continuously with each other.
Whenever the system is used by a team, you can easily see who has been assigned what documents and what changes have been made. In addition, SmartCAT has a so-called anti-conflict system: while one translator is working on a segment, it is blocked and nobody else can make changes, meaning there is no risk that others will interfere with a translation process.
The system automatically calculates the number of words translated and the progress of individual translators. In this way, the team is always aware of the time needed to complete the project and can make timely decisions to finish it on schedule.
Teamwork: Challenges and Solutions
One of the challenges faced by teams that need to handle large data volumes is how to share translation resources (translation memory and glossaries). All CAT programs seek to address this need in one way or another. In desktop-server systems, the server is used to share resources: the resources are stored on the server, and translators need to connect to it from their desktop applications in order to access the updated resources. Arranging this process in such a way as to cover all translators can be hard and costly for a company. With a cloud-based CAT tool, all team members work within the same system, which solves these issues and makes their work easier and more effective.
The second challenge is about how to effectively arrange communications within a team. Team members must always be able to consult on translation, sort out issues and take joint decisions. Cloud-based solutions are a totally different ball game in this regard, and SmartCAT developers work hard to enable these functionalities in the most effective and convenient way. One such function is simultaneous translation and editing, enabled in SmartCAT. Generally, the editor takes on a file only after a translator (or a team of translators) has completely finished work on the translation. In SmartCAT, the editor can start working after the translator has completed just a few sentences. This “combo” enables the editor, who is more experienced in the respective field of expertise, to pass on comments to the translator as regards certain mistakes, so that the translator can avoid them down the line. In this way, working time can be significantly decreased, as the editor does not have to correct those mistakes.
The third challenge is how to handle so-called conflict situations, when several translators make changes to the same sentence. This is how a standard procedure looks in desktop-server systems: a translator downloads a document from the server to their desktop application, completes the translation and uploads the file back to the server. In some cases, while uploading their translation to the server, a translator sees that another translation has already been uploaded for the same segment. This is a conflict situation, which requires choosing between alternative translations. Additionally, in this scenario two translators have spent time translating the same segment. In contrast, cloud-based systems save changes in real time, as soon as a sentence has been saved. However, not all cloud-based CAT systems are designed for several translators doing this at the same time, and in those systems conflicts can still occur. In SmartCAT, all changes are saved immediately and the segment a translator is working on is blocked for everyone else, which prevents such conflicts.
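The blocking of segments can be pictured with a small sketch. It is a single-process illustration under stated assumptions: the class, method names and user names are hypothetical, and a real cloud service would keep this state on the server and share it among all connected editors.

```python
import threading


class SegmentLocks:
    """Hypothetical registry of which translator currently holds each segment."""

    def __init__(self):
        self._owners = {}
        self._guard = threading.Lock()  # makes check-and-take a single atomic step

    def acquire(self, segment_id, user):
        """Try to block a segment for a user; True if the user now holds it."""
        with self._guard:
            owner = self._owners.setdefault(segment_id, user)
            return owner == user

    def release(self, segment_id, user):
        """Unblock a segment; only the current holder is allowed to do so."""
        with self._guard:
            if self._owners.get(segment_id) == user:
                del self._owners[segment_id]
                return True
            return False


locks = SegmentLocks()
print(locks.acquire(7, "anna"))   # Anna starts editing segment 7
print(locks.acquire(7, "boris"))  # Boris is refused while Anna holds it
print(locks.release(7, "anna"))   # Anna saves and moves on
print(locks.acquire(7, "boris"))  # now Boris can edit the segment
```

Since only one translator can hold a segment at a time, two competing translations of the same sentence never reach the server.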
Clouds for Translators
When developing the system, the hardest and also the most exciting challenge for us was how to fully use the potential of the cloud and not just reproduce the desktop system scenario in a browser. This includes, for instance, the ability to use translation resources (translation memory, machine translation, digital dictionaries and glossaries) simultaneously and in real time, and to apply machine learning to better analyze source data.
Of course, you need good hardware to ensure adequate performance of such a complex system, but, more importantly, the program code should be efficient and economical, making sure that each operation uses as few resources as possible. If the optimization is not done right, even the simplest operations and queries can overwhelm the system, resulting in a drop in performance.
In addition, enabling effective capabilities for teamwork has always been a core challenge for us. SmartCAT, as we point out above, gives all team members a chance to collaborate and interact as closely as they need to, while also ensuring any changes you make are saved instantaneously.
Developers always look to deliver the best possible environment for such complex teamwork. For instance, a translator might “find and replace” a word, which is an operation that requires processing every segment of the text: the system needs to find all occurrences of this word and replace them accordingly. When there’s only one translator working on the document, this is not difficult. However, for teamwork, the system needs to ensure that each individual segment has not been blocked by another translator, then block that segment, make the changes and unblock it. At the same time, the system needs to inform all translators who are working on the document of the changes made, so that they can see these changes instantaneously.
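The sequence just described (check, block, change, unblock, notify) can be sketched as follows. This is a single-threaded illustration with hypothetical names; in a real multi-user system the check and the blocking would have to happen as one atomic operation on the server.

```python
def find_and_replace(segments, locks, user, old, new, notify):
    """Replace `old` with `new` in every segment that `user` may change.

    segments: dict of segment id -> translated text (updated in place)
    locks:    dict of segment id -> user currently editing that segment
    notify:   callback called with (segment_id, new_text) after each change
    Returns the ids skipped because another translator holds them.
    """
    skipped = []
    for segment_id, text in list(segments.items()):
        if old not in text:
            continue
        owner = locks.get(segment_id)
        if owner not in (None, user):
            skipped.append(segment_id)     # blocked by someone else
            continue
        locks[segment_id] = user           # block the segment
        try:
            segments[segment_id] = text.replace(old, new)
            notify(segment_id, segments[segment_id])
        finally:
            if owner is None:
                del locks[segment_id]      # unblock only what was blocked here
    return skipped


segments = {1: "Click the knob.", 2: "Turn the knob.", 3: "Done."}
locks = {2: "boris"}                       # Boris is editing segment 2
events = []

skipped = find_and_replace(
    segments, locks, "anna", "knob", "dial",
    notify=lambda segment_id, text: events.append((segment_id, text)),
)
print(segments)
print("skipped:", skipped)
```

Returning the skipped segments lets the interface tell the translator which occurrences still need attention once their colleagues are done.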
API Integration
The system should have flexible API capabilities that support various integration scenarios. One of the most popular is integration with content management systems in order to send translation assignments automatically. The customer’s information system, for instance, a website or Web store, can easily connect to SmartCAT, send product descriptions or other texts and then get the translations back.
This approach fully automates the exchange of assignments, with no need to involve managers. Most services now update their content frequently, but in extremely small portions. If such small pieces of information are exchanged via email rather than via API, the exchange might take more time than the translation itself. This is what makes the integration capabilities so beneficial.
Another capability is the so-called HotFolder, which links SmartCAT to a folder located on a user’s desktop computer. When a user places a file in the folder, it is automatically uploaded to SmartCAT, while the completed translations are placed back in the same folder next to the source files. Besides uploading new files, this function may be conveniently used to handle revised documents: whenever you upload a new version of an existing, previously translated file, SmartCAT will identify only those portions that have been changed and present only them for translation. This capability is particularly useful for software and game developers. When a company releases a new program version, this approach enables the quick translation of any amended portions of interfaces or documents.
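How a watched folder might detect new and revised files can be sketched with content hashes. This is an assumption made for illustration, not a description of how HotFolder is implemented; the function name and the demo file are hypothetical, and the upload itself is left out.

```python
import hashlib
import tempfile
from pathlib import Path


def scan_hot_folder(folder, seen):
    """Return the names of files that are new or changed since the last scan.

    `seen` maps file name -> content hash from earlier scans and is updated
    in place, so unchanged files are not reported again.
    """
    to_upload = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(path.name) != digest:
            seen[path.name] = digest
            to_upload.append(path.name)
    return to_upload


with tempfile.TemporaryDirectory() as folder:
    seen = {}
    Path(folder, "manual.txt").write_text("Version 1", encoding="utf-8")
    first = scan_hot_folder(folder, seen)    # the file is new
    second = scan_hot_folder(folder, seen)   # nothing has changed
    Path(folder, "manual.txt").write_text("Version 2", encoding="utf-8")
    third = scan_hot_folder(folder, seen)    # the file was revised

print(first, second, third)
```

A real integration would also need to tell source files apart from the translations delivered into the same folder, which this sketch ignores.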
Future: Marketplace and Disintermediation
At SmartCAT, we are working hard to develop an ecosystem for independent service and solution providers on the platform. We will continue our work to improve the search functionality: in addition to standard filters by parameter and rating, we plan to roll out a smart search function that analyzes a text and selects team members who have the most appropriate experience.
This doesn’t mean that some customers will be able to access the data of other customers, as the comparison will be done automatically and users won’t be able to access the information being analyzed. The customer won’t have to manually assign a document subject or select team members; the system will automatically select specialists based on information about successfully completed projects in the respective field and for the same language pair. Searching for documents on a similar subject requires a number of text analysis algorithms. No other CAT tool has so far offered a real-time system with data harmonization. This technology is patented by our company.
The second area of development is automated interactions with translators. We are currently testing algorithms that would enable automated team creation, so that a customer or manager spends less time managing the project and time and costs are optimized as much as possible. The system will analyze the scope and complexity of an order, deadlines and budgets. Based on the obtained data, the system will determine the number and qualifications of team members, send out notifications to them and invite them to take part in the project. Within the project, the system will also be able to monitor progress and deadlines, “push” team members or notify the customer or manager of any issues that arise. Fully implementing all these functions will take hard work, but we are convinced that we will succeed.
Applying innovative technology to select team members and automatically build the optimal translation process for each project drastically lowers the barriers to arranging translations independently and shortens translation projects. As a result, customers can obtain a guaranteed level of translation quality on their own, easily and at a drastically lower cost than before.
In addition, there are a number of other technology solutions that we are currently testing internally and planning to roll out subsequently as part of our system. Among them, for instance, is voice input and control technology, which makes it possible to control the system and dictate the translation by voice. It will be mainly used in the mobile version of SmartCAT that is also currently under development, since typing text on a smartphone or tablet is not that easy. This program version will also feature various types of suggestions and cues; the system will prompt the user with the most appropriate phrase, based on the subject of the text and the available translation resources.
Some of these solutions will be available as optional modules for SmartCAT, and we are even planning to establish an ecosystem for the development of such modules in order to enable third-party developers to make improvements to the product.
The marketplace is a module that connects customers and providers. It is an enormous database of freelance translators (currently including over 50 thousand translators). In order to navigate such an enormous volume of data and select the translators that are the best fit, the system enables a search of translators by various parameters.
The search across the translators database makes it possible to process a wide range of queries. One of the most popular queries is to find a native English speaker for editing: this parameter (whether or not a language is the translator’s mother tongue), like many others, can be used to search across the database. The translators registered on the SmartCAT platform currently cover 1,000 language pairs.
When you decide to recruit a translator with whom you have not worked before, you always need to verify their qualifications. The translator might have listed international certificates and a portfolio on their own initiative. They might also have passed our test, in which case you can review and assess the results yourself or base your decision on the findings of our specialists (who verify the test results in an extremely rigorous manner). The last option might be useful for those who, for example, need a translation into Brazilian Portuguese but do not speak the language, which makes it hard to assess the translation.
We have already started to test the system for automated selection of translators for texts from our customers. Based on the subject of the text, the average translator’s speed, the budget and the scheduled deadline (if a deadline is specified), the system will search for the translators (editors, proofreaders) who have already successfully completed projects in similar fields and for the same language pair. This technology is patented by our company.
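The selection criteria named above (subject, speed, budget, deadline, language pair and completed projects) can be combined in a simple filter-and-rank sketch. It is only an illustration of the idea and not the patented technology: the data fields, the sample translators and the ranking by number of completed projects are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Translator:
    name: str
    language_pairs: set      # e.g. {"en-fr"}
    subjects: set            # fields with successfully completed projects
    words_per_day: int       # average speed
    rate_per_word: float
    completed_projects: int


def select_translators(pool, pair, subject, word_count, days, budget):
    """Keep translators who fit the job, most experienced first."""
    candidates = []
    for translator in pool:
        if pair not in translator.language_pairs:
            continue
        if subject not in translator.subjects:
            continue
        if translator.words_per_day * days < word_count:
            continue  # cannot finish before the deadline
        if translator.rate_per_word * word_count > budget:
            continue  # too expensive for this project
        candidates.append(translator)
    return sorted(candidates, key=lambda t: t.completed_projects, reverse=True)


pool = [
    Translator("Anna", {"en-fr"}, {"legal"}, 2500, 0.08, 40),
    Translator("Boris", {"en-fr"}, {"legal"}, 3000, 0.12, 90),   # over budget
    Translator("Chloe", {"en-fr"}, {"legal"}, 800, 0.05, 60),    # too slow
    Translator("Dmitri", {"en-fr", "en-de"}, {"legal", "medical"}, 2000, 0.09, 75),
]

shortlist = select_translators(
    pool, pair="en-fr", subject="legal", word_count=5000, days=3, budget=500
)
print([translator.name for translator in shortlist])
```

For large projects the same filter could be relaxed to split the word count among several translators instead of requiring one person to meet the deadline alone.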
For the managers who are responsible for translation processes arranged within a company (including language service providers), the completely automated team recruitment process will help save about 20% of their work time.
JEAN-LUC SAILLARD is the COO of SmartCAT. He has over 20 years of management experience in the translation industry. His passion for technology has led him to be an early adopter of translation tools from the DOS-based TM tools to the latest workflow and MT applications.