25/01/2022

MTPE: Tailor-made or off the peg? How specifications ensure that machine translation fits

Our translation management expert, Nicole Sixdorf, recently asked herself what the decisive factor is for the professional and appropriate use of machine translation in companies, and gave a talk on the subject for professional users at the tekom industry association. She has now summarised her findings in an insightful article, in which she explains why the right specifications are needed, what they are, and what all this has to do with the comfort of a winter coat.

It has to fit… like a winter coat

A coat? Exactly, but not just any coat. There is the winter coat that has faithfully been at your side for years. You know where you stand with it: it fits and keeps you warm, even if it takes five minutes for the zip to catch. This coat is our human translation.
And then there is the new coat you want to buy so that the zip closes smoothly and you can get out of the door quickly, even when it is minus 15 degrees. But there are hundreds of versions of this coat on the market, and you first have to choose the right one. This is our machine translation.

The coat analogy illustrates well what the decision is about: the new coat adds to your wardrobe rather than replacing it. Choosing it does not mean you will never wear the old one again. Likewise, using machine translation does not mean you can no longer use human translation. Rather, one or the other is used depending on the weather, or in other words, according to need and specifications.

The pros and cons therefore concern only the choice of the new coat, and to make that choice, you first check the weather forecast. Figuratively speaking: before deciding whether or not to use machine translation, it is important to consider exactly what it will be used for. This is where the specifications come into play.

When you buy a new coat, you also consider what size you need, how much it can cost, how many pockets it should have and what else it needs, such as a hood. It is no different with MT. The process begins with a review of the specifications, which establishes the company's initial situation and translation needs as well as the starting point for time- and cost-effective machine translation.

Helpful questions to clarify the need for translation:

Which texts are to be translated? Are the texts from a specific field, e.g. technical texts?
What is the quality of the source texts? Are spelling, grammar etc. correct?
Are they continuous texts?
Are the texts well formatted?
How large is the volume?
Which language combinations are needed?
Are there budget targets? For example, should costs only be saved for a certain type of text because this accounts for the largest volume?
Are there time limitations due to tight follow-up deadlines?
How will the machine-translated texts be used? Externally, or only internally?
What are the requirements with regard to data protection and data security?
Are there particular risks to consider, for example damage to the company's image, personal injury or property damage?
Is there any specific terminology?
Are style guides available? Or formatting guidelines?

These criteria largely determine the feasibility of machine translation. They determine the choice of MT system and serve as guidelines for post-editing, i.e. the human revision of machine translations. But one thing at a time!

Know what you are talking about: Terminology

Another key question on this topic: when it comes to machine translation, does it matter whether a company has a terminology database? To cut a long story short: yes.

If you want to train your own translation engine, a well-maintained terminology database is essential for the quality of its output. With off-the-shelf machine translation, terminology also plays an important role in ensuring consistency during post-editing. If the preliminary assessment turns up a “huge terminology database”, it is advisable to take a closer look at it rather than discard the idea of MT. After all, outdated terms, duplicates, and redundantly or inconsistently defined terms affect not only quality but also the potential for savings, incidentally for human translation as well.
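What “taking a closer look” at a terminology database might involve can be sketched in a few lines. The following is a minimal illustration only, assuming the termbase has been exported as simple (source term, target term) pairs; the function name audit_termbase and the sample entries are hypothetical. It flags exact duplicates (ignoring case and surrounding spaces) and source terms with more than one target term. Outdated terms cannot be detected this way without additional metadata such as a status field.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def audit_termbase(entries: List[Tuple[str, str]]) -> Dict[str, List]:
    """Flag duplicate entries and source terms mapped to several target terms.

    Hypothetical helper: assumes a termbase exported as (source, target) pairs.
    """
    seen: Set[Tuple[str, str]] = set()
    duplicates: List[Tuple[str, str]] = []
    targets_by_source: Dict[str, Set[str]] = defaultdict(set)

    for source, target in entries:
        # Normalise case and whitespace so "Screw"/"screw" count as the same term.
        key = (source.strip().lower(), target.strip().lower())
        if key in seen:
            duplicates.append((source, target))
        seen.add(key)
        targets_by_source[key[0]].add(key[1])

    # Source terms with more than one distinct target are candidates for review.
    inconsistent = sorted(s for s, t in targets_by_source.items() if len(t) > 1)
    return {"duplicates": duplicates, "inconsistent": inconsistent}


# Illustrative sample data, not taken from a real termbase.
termbase = [
    ("gear", "Getriebe"),
    ("gear", "Zahnrad"),
    ("screw", "Schraube"),
    ("Screw", "schraube"),
]
report = audit_termbase(termbase)
print(report)
```

Note that a flagged term is not automatically an error: “gear” can legitimately have two translations depending on context, so the output is a list for a terminologist to review, not something to delete blindly.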

If we return to the coat analogy, we are now prepared for the purchase and can approach the question purposefully. Is an off-the-shelf coat sufficient, or will you have one custom-made?

Know what works: Training

The analogy of “customised or off-the-shelf” relates to the rough distinction between generic “untrained” machines and customised “trained” machines. To understand what this really means, let’s take a look at the finer details.

The term “untrained” is used here for machines that have been trained on data for specific language combinations, but with less specialised text corpora. Well-known providers include DeepL and Google Translate. Domain-specific engines are another form of these generic machines. They have also been trained for specific language combinations, but are more selective in their training data and specialise in subject areas such as mechanical engineering, software or law.

“Trained” machines can be understood as company-specific or customised engines. They are trained directly using bilingual data from the company. In this case, specifications contribute significantly to the success of the project.

One advantage of trained machines is that they adhere to company specifications. If, for example, a certain style or rules on how texts are to be translated exist, the machine implements them because it has been trained on texts previously translated in the same way for the company. How well it does so depends on the training material and on whether the specifications were actually followed in those previous translations.

The most important factor here is terminology, because technical and company-specific vocabulary can only be integrated correctly by trained machines. Training is the only way to ensure that the machine recognises these terms and transfers them correctly and consistently.

With a trained machine, the style corresponds to the style of the training materials or the previous translations.

Depending on the weather and the occasion: Context

Whether off-the-shelf or customised, both approaches have one thing in common: no MT system has yet managed to translate technical texts completely error-free and consistently. Every machine is only as good as its training material. Even a trained system therefore does not guarantee consistently high-quality output. Such systems reach their limits, for example, when texts from other fields are to be translated.

However, since many companies start out with generic systems, let us look at why clearly documented specifications such as style guides and terminology are crucial for success here as well.

DeepL now offers a glossary function that lets you specify how certain words should be translated. This works quite well, as the machine adapts the terms linguistically and can, for example, correctly reproduce singular and plural forms. Context, however, is a different matter. In the following example (screenshot) with the term “mother”, the glossary function is more of a disadvantage because it ignores the context.

Another important specification here is the quality of the source text given to the machine, right down to the last dot and accent. How decisive these details can be is shown by the French word “arrêté”, which denotes a decree or order. If the accent on the final e is missing in the source text, it becomes “arrête”, a form of the verb meaning “to stop”. As a result (screenshot), the machine suddenly goes off track because it can no longer establish the crucial reference to the rest of the text.

Common classic machine faults include:

Errors in content, misinterpretations
Context errors
Omission of existing text
Additions or supplementary text not included in the source text
Terminology errors, e.g. proper names that are not recognised or inconsistent use of terms
Incorrect references between parts of sentences or consecutive sentences*
Tag errors, e.g. formatting errors such as bold print
Poor punctuation

*In short: the machine only ever looks at the context up to the full stop. I’m sorry to say that it doesn’t care what came in the sentences before or after.

So the translation never comes out of the machine perfect and error-free. The next key question is therefore: how can you secure the cost and time advantages and still obtain a machine translation that matches the quality of a human translation and meets your own specifications and requirements?

This is where the essential factor for appropriate, professional corporate machine translation comes into play, and MT becomes MTPE: Machine Translation + Post-Editing. This is the post-editing mentioned earlier: the revision of the machine (pre-)translation by qualified post-editors.

In our winter coat analogy, this would be the point at which the coat is taken to a tailor. But at this stage, you don’t yet know if it fits you.

Trying it on to see if it fits: Feasibility analysis

Not every document, language combination or content is suitable for machine pre-translation. The more specific and individual the translation requirements are, the more critical it is to consider whether MTPE can still achieve any cost and time benefits at all.

The machine accepts any text, of course, but can only translate it as well as it is able to “interpret” it. The result can be almost one hundred percent correct or completely wrong, as the typical machine errors show. In such cases, it can be faster and cheaper to have texts translated directly by human translators.

The decisive factor, always and for every project, is feasibility, i.e. the individual analysis of all aspects that influence suitability for MTPE: from languages, subject areas and research intensity to content and sentence structures. It is important to decide in context (!) and in detail which texts are suitable and which adjustments can optimise the process. These can include well-maintained terminology databases or hybrid solutions, for example the combined use of machine translation and translation memory, so that texts that have already been translated do not have to be translated and adapted again by the machine.
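The hybrid approach of combining translation memory and machine translation can be illustrated with a simplified sketch. It assumes the text has already been split into segments and that the translation memory is a plain dictionary of approved source/target pairs; the names pretranslate and fake_mt are hypothetical, and fake_mt merely stands in for a call to a real MT engine. Real translation memory tools also use fuzzy matches, which this sketch leaves out: only exact matches are reused, and everything else goes to the machine.

```python
from typing import Callable, Dict, List, Tuple


def pretranslate(
    segments: List[str],
    translation_memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (source, target, origin) for each segment.

    origin is "TM" for an exact translation memory match and "MT" otherwise,
    so post-editors can see which segments still need review.
    """
    results = []
    for source in segments:
        if source in translation_memory:
            # Previously translated and approved: reuse as-is, no MT needed.
            results.append((source, translation_memory[source], "TM"))
        else:
            # New text: only these segments are sent to the machine.
            results.append((source, machine_translate(source), "MT"))
    return results


def fake_mt(text: str) -> str:
    # Hypothetical placeholder for a real MT engine call.
    return f"[MT] {text}"


tm = {"Press the start button.": "Drücken Sie die Starttaste."}
segments = ["Press the start button.", "Check the oil level."]
results = pretranslate(segments, tm, fake_mt)
for src, tgt, origin in results:
    print(origin, "|", tgt)
```

Tagging each segment with its origin reflects the idea behind the hybrid solution: post-editors can concentrate their effort on the machine-translated segments instead of re-checking text that was already approved.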

Then it’s off to the tailor: Post-editing

To stick with the coat analogy, since the image of the tailor fits so well: our post-editors get to work as required and either alter the off-the-peg machine translation or bring the individual cut of the company’s own machine into shape. As trained translators and linguists, they have an extensive understanding of the functions and objectives of machine translation as well as comprehensive linguistic and technical expertise. They know the individual specifications and specialised terminology, and they always keep the economic efficiency of the project in mind because, like tailors, their motto is: as few changes as possible, with the greatest possible effect!

Conclusion: If it fits and is correct, it’s the right choice

As when choosing a new winter coat, it is important when deciding on machine translation to know your own quality requirements and to record them as specifications. These specifications determine both the choice of MT system and the general feasibility of MTPE projects. They are also important guidelines for post-editing machine-translated texts, because there is no need to simply accept whatever the machine churns out.

Specifications should not be defined in purely linguistic and technical terms, but should also cover related areas such as data protection and security. The cheapest or free solution is not always the most sensible one, especially when texts contain sensitive data, or when less suitable texts are machine-translated and would require extensive changes during post-editing. To achieve the best possible result in terms of quality, time and cost, the decision between MTPE and human translation must ultimately be made case by case.

To achieve feasible translation results that “fit like a glove”, a potential collaboration should always start with a consultation with experts in the field. This includes an initial analysis and taking stock of the situation, but above all it focuses on your specific expectations and individual goals: the specifications. This is something we are happy to help you with at any time of year.

8 good reasons to choose oneword.

Learn more about what we do and what sets us apart from traditional translation agencies.

We explain 8 good reasons and more to choose oneword for a successful partnership.
