02/12/2025

Glossary creation at the touch of a button?

With the help of glossaries – i.e. bilingual lists of approved terminology – the output of generic MT and AI systems can be adapted to a company's specialised terminology. Our internal analyses show that terminology corrections account for up to 45% of post-editing changes. Clearly and appropriately specified terminology can therefore reduce this post-editing effort significantly.

Clean data as a basis

A terminology database is the ideal basis for glossary creation, as it should contain the company's technical terms and their equivalents. To make this information machine-readable, the source and target language terms must be mapped 1:1. This in turn requires clean data and metadata, so that the preferred term in one language is assigned to the preferred term in the other. In addition, the database may contain ambiguities, for example the same word used for different concepts, or simply too few entries to cover the company's actual terminology.
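The 1:1 mapping of preferred terms can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `Term` and `Concept` classes and the status values "preferred", "admitted" and "deprecated" are a hypothetical, simplified termbase model, not the export format of any particular tool. A concept only yields a glossary pair if it has exactly one preferred term in each language; everything else is set aside for review instead of being guessed.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model of a concept-oriented termbase entry.
@dataclass
class Term:
    text: str
    lang: str
    status: str  # assumed values: "preferred", "admitted", "deprecated"


@dataclass
class Concept:
    concept_id: str
    terms: list[Term] = field(default_factory=list)


def preferred(concept: Concept, lang: str) -> list[str]:
    """Return all preferred terms of a concept in the given language."""
    return [t.text for t in concept.terms if t.lang == lang and t.status == "preferred"]


def build_glossary(concepts, src="de", tgt="en"):
    """Pair exactly one preferred source term with exactly one preferred target term.

    Concepts without a unique preferred term on either side are not guessed;
    their IDs are returned separately for manual review.
    """
    pairs, review = [], []
    for c in concepts:
        s, t = preferred(c, src), preferred(c, tgt)
        if len(s) == 1 and len(t) == 1:
            pairs.append((s[0], t[0]))
        else:
            review.append(c.concept_id)
    return pairs, review


# Usage with hypothetical entries
concepts = [
    Concept("C1", [Term("Schraube", "de", "preferred"),
                   Term("screw", "en", "preferred"),
                   Term("bolt", "en", "deprecated")]),
    Concept("C2", [Term("Mutter", "de", "preferred")]),  # no English term yet
]
pairs, review = build_glossary(concepts)
print(pairs)   # [('Schraube', 'screw')]
print(review)  # ['C2']
```

The deprecated variant "bolt" is simply ignored, and the incomplete entry C2 surfaces as a coverage gap rather than silently disappearing from the glossary.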

Some MT providers now offer automated glossary creation from bilingual files or even completely AI-generated glossaries. Fully AI-generated glossaries provide at most a slight improvement over generic output, as they are not based on any company-specific input. We therefore took a closer look at automated glossary creation from bilingual files.

Human vs. machine in comparison

As in our tests of AI-supported term extraction, we again compared human and machine. The MT provider's machine-generated glossary delivered 54 German–English term pairs, whereas manual creation produced 138 pairs, about 2.5 times as many. In purely linguistic terms, the MT result scored well: all German terms are listed in their base form and with the correct English equivalent.

However, a comparison of the two results showed that only about half of the machine-generated pairs (28 terms) matched the human result. The remaining 26 terms were not relevant to the glossary, as they were general terms or placeholders. The 28 relevant terms thus correspond to only 20% of the 138 human-extracted terms: a clear loss in quantity, as only one in five terms was captured.
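The percentages above follow from simple set arithmetic. The sketch below is only an illustration, assuming each glossary is held as a set of (German, English) pairs and that a "match" means an identical pair after case-folding; the function names and the toy pairs are hypothetical, and this post does not describe the exact matching criteria used in our comparison.

```python
def normalise(pairs):
    """Case-fold both sides so that 'Screw' and 'screw' count as the same entry."""
    return {(src.casefold(), tgt.casefold()) for src, tgt in pairs}


def compare_glossaries(machine, human):
    """Compare a machine-generated glossary against a human reference glossary."""
    m, h = normalise(machine), normalise(human)
    matched = m & h
    return {
        "matched": len(matched),
        # Share of the machine output that also appears in the human glossary
        "share_of_machine": len(matched) / len(m),
        # Share of the human glossary that the machine output covers
        "coverage_of_human": len(matched) / len(h),
    }


# Toy example with hypothetical pairs
machine = {("Schraube", "screw"), ("Datei", "file"), ("Seite", "page")}
human = {("Schraube", "screw"), ("Datei", "file"), ("Mutter", "nut"), ("Drehmoment", "torque")}
print(compare_glossaries(machine, human))
# {'matched': 2, 'share_of_machine': 0.6666666666666666, 'coverage_of_human': 0.5}

# The counts reported above: 28 matches, 54 machine pairs, 138 human pairs
print(f"{28 / 54:.0%} of the machine pairs were relevant")  # 52% of the machine pairs were relevant
print(f"{28 / 138:.0%} of the human glossary was covered")  # 20% of the human glossary was covered
```

The two ratios answer different questions: the first measures how much of the machine output is usable, the second how much of the required terminology it actually delivers.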

Best practice: professional support in the form of database analysis

As glossaries are usually created once and afterwards only need to be extended or trimmed, professional support for the initial creation is particularly valuable. It takes all terminology sources, such as databases and reference texts, into account. A database analysis identifies in advance which database content is relevant to a glossary, so that only this content is passed to the MT and AI systems. Potential sources of terminology problems, such as ambiguities, are also checked and resolved in a glossary-compatible way.
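One part of such a database analysis, the ambiguity check, can be illustrated as follows. This is a minimal sketch, assuming a hypothetical flat export of (concept ID, German term, English term) rows: a source term that occurs in several concepts with different English equivalents cannot be mapped 1:1 and is flagged for a human decision. German "Mutter" ("nut" in engineering, "mother" in general language) is a typical case.

```python
from collections import defaultdict


def find_ambiguities(rows):
    """Return source terms that map to more than one target term.

    rows: iterable of (concept_id, source_term, target_term) tuples,
    a hypothetical flat export of a termbase.
    """
    targets = defaultdict(set)
    for _concept_id, src, tgt in rows:
        targets[src.casefold()].add(tgt)
    # Only source terms with several distinct target terms are ambiguous
    return {src: sorted(tgts) for src, tgts in targets.items() if len(tgts) > 1}


rows = [
    ("C1", "Mutter", "nut"),
    ("C2", "Mutter", "mother"),
    ("C3", "Schraube", "screw"),
]
print(find_ambiguities(rows))  # {'mutter': ['mother', 'nut']}
```

Flagged terms are not resolved automatically: whether "Mutter" belongs in the glossary as "nut", or should be left out because the texts also use it in its general sense, is exactly the kind of decision that needs terminology expertise.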

With our semi-automated glossary creation, we support companies on the way to effective glossaries that are not created at the touch of a button, but are still created quickly and efficiently thanks to a high degree of automation. Just as with AI translation, it is human expertise that makes glossary creation effective.

Would you like to create glossaries for MT and AI? Our experts at oneword will be happy to help you analyse and create effective glossaries. Contact us for an initial consultation.
