25/06/2025
Terminology database: from uncontrolled growth to valuable resource in five steps
A well-maintained terminology database is the foundation for consistent and cost-efficient corporate communication. But in practice, it’s often a different story: databases that have grown over the years, merged lists from different departments and inconsistent or missing standards lead to a proliferation of data that is difficult to utilise. Therefore, cleaning up these databases systematically pays off directly in terms of time and cost savings in text creation, translation and communication.
The challenge: Confusion in terminology databases
Terminology management plays a key role in the company, as a well-maintained terminology database ensures that technical terms are used consistently across all departments, documents and languages. It serves as a central reference for text creation, is integrated into authoring systems and translation tools and forms the basis for high-quality machine translations.
However, as soon as terminology management begins, more and more entries tend to be added. Different databases are merged, and new terms are extracted from documents and defined during product development. This results in uncontrolled growth, especially if standards are only established late in the process.
The four main causes of unclean terminology data:
- Unchecked accumulation: More and more language data is added to the database over time.
- Lack of clean-up routines: Data is actively collected, but it is not regularly checked and cleaned up.
- Inconsistent procedures: There are no uniform standards for filling and maintaining the database.
- Several editors: Different people, departments or service providers work on the database, which can mean that no one has a clear picture of the whole thing.
The five-step approach to systematically cleaning up your data
To counteract these causes, terminology databases should be cleaned up in a structured way. A funnel approach has proved successful here: the quantity is first reduced by eliminating superfluous entries, and then the details are worked on.
A top-down funnel approach to cleaning up terminology (source: oneword GmbH)
Level 1: Reducing the volume of data by analysing the database
The first and most important step looks at the number of entries. Analyses from our own projects show that, in some databases, only 20 to 30 per cent of the terminology is active. The proportion of inactive terminology is often alarmingly high, especially in large databases with over 5,000 entries.
Therefore, it makes sense to perform a database analysis to determine what proportion of the database is active. The aim of this analysis is to separate the terminology actually used from unused information. To do this, we compare your terminology database with current corporate texts – for example, with document collections from various divisions and departments or with translation memories. For each term from the database, we check whether and how often it occurs in the reference texts. The analysis not only shows which terms are actively used, but by determining the frequency, also makes it possible to prioritise the database for subsequent steps, such as definition or glossary creation.
For small databases, this analysis can be carried out manually or using the search function. However, extensive databases should be analysed using scripts. The result also identifies inactive terminology, i.e. terms that do not appear in the text corpus and are therefore not used in the company. As there is usually a great deal of scepticism around deleting data once it has been created, it is advisable to mark the terms or entries as “inactive” as a first step. This enables active and inactive data to be displayed separately and specifically filtered. If a term is then used after all, it can be reactivated by removing the label. After a defined period of time, inactive entries can and should be permanently deleted.
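The script-based analysis described above can be sketched roughly as follows. This is a minimal illustration, not the script used in our projects: the function names, the sample terms and the sample texts are all assumptions. It only counts exact, case-insensitive matches, so inflected forms such as plurals would need additional handling (for example lemmatisation) in a real analysis.

```python
import re

def term_frequencies(terms, texts):
    """Count case-insensitive whole-term occurrences of each term in the texts."""
    counts = {}
    for term in terms:
        # Look-arounds act as word boundaries that also work for multi-word terms.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        counts[term] = sum(len(pattern.findall(text)) for text in texts)
    return counts

def split_active_inactive(counts):
    """Return active terms (most frequent first) and inactive terms."""
    active = sorted((t for t, n in counts.items() if n > 0), key=lambda t: -counts[t])
    inactive = sorted(t for t, n in counts.items() if n == 0)
    return active, inactive

# Hypothetical sample data standing in for the database export and the text corpus.
terms = ["control unit", "safety valve", "steering module"]
texts = [
    "Check the control unit before opening the safety valve.",
    "The Control Unit must be switched off.",
]

counts = term_frequencies(terms, texts)
active, inactive = split_active_inactive(counts)

# Label terms instead of deleting them, so they can be reactivated later.
status = {t: ("inactive" if t in inactive else "active") for t in terms}
```

The frequency-sorted `active` list can serve as a starting point for prioritising later steps, while `status` mirrors the idea of labelling entries as inactive rather than deleting them straight away.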
Level 2: Cleaning up the structure to assign information clearly
The structure of a terminology database must provide a unique location for each piece of information. Fields and information are added at three separate levels: the entry level, the language level and the term level.
The following questions can be asked to check the structure:
- Completeness: Is there a suitable field for every piece of information required?
- Clarity: Is it clear which information belongs in which field?
- Level assignment: Have fields been created at the correct level?
- Data categories: Have the correct field types been selected (free text, drop-down list, multimedia)?
- Naming: Are the field names clear and understandable?
A real-life example: A “Usage” field with the values “preferred” and “approved” mixes two different types of information, as “approved” refers to the approval status of a term and not to its usage. To solve this, an additional “Approval status” field with relevant selection values (approved, to be checked) is required.
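To illustrate the split, here is a small sketch of how a mixed legacy “Usage” value could be separated into two fields. The class and function names are hypothetical, the `FORBIDDEN` usage value is an assumed extra value that is not taken from the example above, and the rule that a legacy “preferred” term still has to be checked is an assumption as well.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Usage(Enum):
    PREFERRED = "preferred"
    FORBIDDEN = "forbidden"  # assumed extra value, not from the example above

class ApprovalStatus(Enum):
    APPROVED = "approved"
    TO_BE_CHECKED = "to be checked"

@dataclass
class TermRecord:
    term: str
    usage: Optional[Usage]  # None means: usage was never recorded
    approval_status: ApprovalStatus

def split_legacy_usage(term: str, legacy_value: str) -> TermRecord:
    """Map a value from the mixed legacy field onto the two separate fields."""
    value = legacy_value.strip().lower()
    if value == "approved":
        # Approval is known, but the legacy field says nothing about usage.
        return TermRecord(term, None, ApprovalStatus.APPROVED)
    if value == "preferred":
        # Usage is known; approval still has to be confirmed (assumption).
        return TermRecord(term, Usage.PREFERRED, ApprovalStatus.TO_BE_CHECKED)
    raise ValueError(f"Unexpected legacy usage value: {legacy_value!r}")

record = split_legacy_usage("safety valve", "Approved")
```

Raising an error for unknown values is a deliberate choice in this sketch: it surfaces entries that need a manual decision instead of silently guessing.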
Level 3: Cleaning up metadata to make the content traceable
After the structural clean-up, the metadata – i.e. the specific content of the fields – is checked. According to DIN ISO 26162-1:2020-05, choosing the correct data category is crucial for using a database appropriately.
You should only use free text fields to add variable content, such as definitions, comments or example sentences. Drop-down lists are the better choice for all fields with a limited number of possible values, such as usage or domain, as they prevent a field from filling up with lots of different variants. Multimedia fields allow images, videos and audio files to be integrated, while yes/no fields are suitable for binary values.
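When a former free text field is converted into a drop-down list, the existing values first have to be mapped to the permitted values. The following sketch shows one possible way to do this. The permitted values, the variant mapping and the function name are assumptions for illustration only and are not taken from any particular terminology system.

```python
# Hypothetical picklist and known variants found in a former free text field.
ALLOWED_USAGE = {"preferred", "admitted", "forbidden"}
USAGE_VARIANTS = {
    "pref.": "preferred",
    "preferred term": "preferred",
    "do not use": "forbidden",
}

def normalise_picklist(values, allowed, variants):
    """Map raw field values to permitted values; collect the rest for review."""
    cleaned, unresolved = [], []
    for raw in values:
        key = " ".join(raw.split()).lower()  # trim, collapse spaces, lower-case
        key = variants.get(key, key)
        if key in allowed:
            cleaned.append(key)
        else:
            cleaned.append(None)  # leave empty until someone has decided
            unresolved.append(raw)
    return cleaned, unresolved

raw_values = ["Preferred", " pref. ", "do not use", "maybe"]
cleaned, unresolved = normalise_picklist(raw_values, ALLOWED_USAGE, USAGE_VARIANTS)
```

Values that cannot be mapped are not forced into the list but returned separately, so that a terminologist can decide on them.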
Cleaning up metadata not only improves findability and traceability: it also has a positive effect on subsequent processes. Glossary creation for machine translation benefits in particular from clear usage information.
Level 4: Cleaning up the form of the data for technical consistency
The form of the data may seem like a minor detail, but when used in authoring support or translation tools, it can quickly add up to a considerable amount of review work. Typical adjustments to the form include, for example:
- Upper and lower case
- Use of plural forms
- Special forms (e.g. small caps)
- Adding brackets within terms
- Use of hyphens
- Spaces and special characters
Since cleaning up the form of terms involves a limited number of sources of error, it is easy to systematise the checks. Filters within the database can help to quickly identify affected entries. The correction itself can usually be done in batches – i.e. for many entries at the same time instead of individually. Some clean-up steps can even be carried out across several languages in one go. Excel formulas, for example, are suitable for correcting upper and lower case letters, avoiding manual corrections. For company-specific features and a quick overview of clean-up requirements, scripts are available that can be expanded at any time and deliver quick results, even for large databases.
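A script for such form checks can be as simple as a set of named patterns that is extended over time. The rules below are illustrative assumptions only. Which forms count as errors depends on the language and on the company's own writing rules.

```python
import re

# Illustrative rules; real rule sets are language- and company-specific.
CHECKS = {
    "leading or trailing space": re.compile(r"^\s|\s$"),
    "multiple spaces": re.compile(r"\s{2,}"),
    "brackets in term": re.compile(r"[()\[\]]"),
    "spaced hyphen": re.compile(r"\s-|-\s"),
}

def check_form(term):
    """Return the names of all form rules that the term violates."""
    return [name for name, pattern in CHECKS.items() if pattern.search(term)]

def report(terms):
    """Map each problematic term to its list of violated rules."""
    return {term: issues for term in terms if (issues := check_form(term))}

# Hypothetical sample terms; the second one has a double and a trailing space.
double_space_term = "safety" + " " * 2 + "valve" + " "
terms = ["safety valve", double_space_term, "valve (safety)", "shut - off valve"]
findings = report(terms)
```

Because each rule is just a name and a pattern, company-specific checks can be added to `CHECKS` without changing the rest of the script.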
Level 5: Cleaning up content for clarity of meaning
Cleaning up content is often on the borderline between form and semantics. For example, while standardised hyphenation rules for product names relate to the form, deciding on hyphenation for loanwords requires content-related considerations.
Checking for consistency is a key part of cleaning up the content. All terms with the same word components should be written and used consistently. This check often results in a list of different spellings, which then have to be standardised. Identifying synonyms and defining a preferred term are also part of the content clean-up.
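One simple way to produce such a list of different spellings is to group terms by a key that ignores case, spaces and hyphens. This is a sketch under that assumption; the function names and sample terms are hypothetical, and real databases may need further rules (for example for slashes or diacritics).

```python
import re
from collections import defaultdict

def spelling_key(term):
    """Build a comparison key that ignores case, spaces and hyphens."""
    return re.sub(r"[\s\-]+", "", term).casefold()

def find_spelling_variants(terms):
    """Group terms that differ only in case, spacing or hyphenation."""
    groups = defaultdict(set)
    for term in terms:
        groups[spelling_key(term)].add(term)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

terms = ["shut-off valve", "shutoff valve", "Shut off valve", "control unit"]
variants = find_spelling_variants(terms)
```

The script only finds candidates. Deciding which spelling becomes the standard remains a content-related decision.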
An important tool here is the duplicate check. It shows multiple occurrences within a language and helps to decide whether they are actually different concepts or whether entries need to be merged. For multilingual databases, a duplicate check in the foreign languages reveals where there are synonyms in the source language or where necessary differentiations are missing in the foreign languages.
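A duplicate check of this kind can be sketched as follows. The data layout (entry IDs mapped to terms per language code), the function name and the sample entries are assumptions made for this illustration, with English as the source language and German as the foreign language.

```python
from collections import defaultdict

def find_duplicates(entries, language):
    """Map each term that occurs in more than one entry to the sorted entry IDs."""
    seen = defaultdict(set)
    for entry_id, terms_by_language in entries.items():
        for term in terms_by_language.get(language, []):
            seen[term.casefold()].add(entry_id)
    return {term: sorted(ids) for term, ids in seen.items() if len(ids) > 1}

# Hypothetical concept entries: entry ID -> terms per language.
entries = {
    1: {"en": ["cover"], "de": ["Abdeckung"]},
    2: {"en": ["lid"], "de": ["Abdeckung"]},
    3: {"en": ["cover"], "de": ["Haube"]},
}

duplicates_en = find_duplicates(entries, "en")
duplicates_de = find_duplicates(entries, "de")
```

In this sample, the German duplicate points to entries 1 and 2: either “cover” and “lid” are synonyms that belong in one entry, or German lacks a necessary differentiation. The script cannot make that decision; it only shows where to look.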
Measurable success: The benefits of clean terminology
Systematically cleaning up terminology data quickly provides concrete improvements. Efficiency, quality and costs can all be optimised in the process.
In terms of increased efficiency, you benefit from significantly reduced search times for translators and authors. New employees can familiarise themselves more quickly with the organisation’s terminology and there is a noticeable reduction in the need to query and agree on particular terms.
The improved quality shows in consistent terminology across all documents. The error rate in translations is reduced, and machine translation (MT) output in particular is significantly improved.
These optimisations lead to measurable cost savings, as fewer correction loops mean less effort, optimised MTPE processes (machine translation + post-editing) reduce the time taken in post-editing and the ongoing effort to maintain the database is significantly reduced.
Investing in clean terminology pays off, especially in the context of modern translation workflows. Integration into systems such as our oneSuite makes it possible to utilise the full potential of cleaned-up terminology for consistent, efficient translation processes.
Conclusion: An investment with great benefits
Cleaning up terminology data is an investment that measurably pays off. A structured approach across the five steps described – quantity, structure, metadata, form and content – turns the complex task into a manageable process.
This delivers noticeable efficiency gains: faster working, fewer coordination loops and better translation quality, whether the translation is produced by a human or a machine. Clean terminology means tangible cost savings and, at the same time, more consistent quality in corporate communication. Well-maintained terminology data can then fulfil its key role in the company.
Would you like to fully exploit the potential of your terminology database?
Our experts at oneword will be happy to carry out a professional database analysis or analyse the potential for cleaning up your database with our oneCleanup service. Contact us today.