Using Regular Expressions (REGEX) to speed up QA in Studio
By Duncan, M.A. (Hons.), German to English translator
17th October 2016
This article is about speeding up QA in Studio and reducing effort: something I bet every translator wants to do! Studio's QA functions are very versatile, but I sometimes find it useful to filter source or target segments so I can check a particular term, the number of tabs in a set of segments, the presence of spaces before a specific character or string such as "C" or "mm", etc. If you have ever entered part of a term in the text box in the Review tab in Studio and pressed Enter, you have already been filtering segments. Using a "regex" (the abbreviation for "regular expression") is just another way to do that, and this article contains a few examples of what you can do with it. The possibilities are endless! The only difference from simply entering some or all of an actual term is that a regex expression can create a more sophisticated filter that selects just the right segments for you to look at.
A "regular expression" is a set of characters (which can be letters, numbers, or operators) used to define a search pattern. There are many places you can find more info, but this is a good start: https://en.wikipedia.org/wiki/Regular_expression. If you research regex expressions, you will find they can identify highly complex strings in the texts you are looking at, such as "at least two spaces occurring after a period (.) and before an upper case letter".
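To make the example above concrete, here is a minimal sketch in Python of a pattern for "at least two spaces occurring after a period and before an upper case letter". The sample segments and the variable names are invented for illustration, and Studio's own regex dialect may differ from Python's in some details, although a simple pattern like this one is written the same way in most dialects.

```python
import re

# A literal period, then two or more spaces, then one upper-case letter.
# The backslash is needed because an unescaped "." means "any character".
DOUBLE_SPACE_AFTER_PERIOD = re.compile(r"\. {2,}[A-Z]")

# Hypothetical segments, made up for this example.
segments = [
    "The valve is closed.  Open it slowly.",   # two spaces, then upper case
    "The valve is closed. Open it slowly.",    # only one space
    "See section 3.  the text continues.",     # two spaces, but lower case
]

# Keep only the segments in which the pattern occurs somewhere.
matching = [s for s in segments if DOUBLE_SPACE_AFTER_PERIOD.search(s)]

for segment in matching:
    print(segment)
```

Only the first segment is selected: the second has a single space after the period, and in the third the spaces are followed by a lower-case letter.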
I use this kind of search pattern in Studio to select segments for checking in particular ways. (I call it a "regex expression" below, even though the "ex" already means "expression", because "a regex" sounds strange to me!)
Where to find out more about regular expressions
Regex "cheat sheets" are useful, but regex now comes in several dialects (each specific to a programming language or tool), so not every regex expression will necessarily work in Studio: you'll just have to try it out!
Look for "regular expression cheat sheet" on the web!
How do I use a regex expression in Studio?
1. With your current sdlxliff(s) open in Studio, open the Review tab. In the middle of the main toolbar, you will see a text box with an "In Source" icon (by default) to the left of it.
2. Select the "In Source" icon to filter segments in the source text, or click the down arrow by that icon and select the "In Target" icon to filter segments in the target text.
3. Type a regex expression in the text box (called the "filter box" in the table below), or paste it in, if you have a list of them, and press Enter.
Studio will filter the segments in your text according to your selection criteria, and display the relevant segments for you to check.
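If it helps to picture what the filter is doing, the steps above can be imitated outside Studio with a few lines of Python. This is only a rough sketch of the idea, not how Studio itself is implemented: the function name filter_segments, the field parameter and the sample segments are all invented for illustration.

```python
import re

def filter_segments(segments, pattern, field="source"):
    """Return the (source, target) pairs whose chosen side contains a match.

    This is a hypothetical helper that only imitates the "In Source" /
    "In Target" choice; it is not part of Studio.
    """
    if field not in ("source", "target"):
        raise ValueError("field must be 'source' or 'target'")
    regex = re.compile(pattern)
    index = 0 if field == "source" else 1
    return [pair for pair in segments if regex.search(pair[index])]

# Invented German-to-English segment pairs.
segments = [
    ("Länge: 25 mm", "Length: 25mm"),
    ("Breite: 10 mm", "Width: 10 mm"),
    ("Temperatur: 20 °C", "Temperature: 20 °C"),
]

# "In Target": a digit followed directly by "mm", i.e. the space is missing.
missing_space = filter_segments(segments, r"\dmm", field="target")

# "In Source": a digit, one space, then "mm".
source_with_mm = filter_segments(segments, r"\d mm", field="source")

print(missing_space)
print(source_with_mm)
```

The first filter leaves only the pair whose target reads "25mm", which is exactly the kind of segment you would then want to look at and correct.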
Tip: make sure you don't add any extra spaces or anything else to a regex expression, otherwise it is likely either not to work at all or to work in a very unexpected way! Also, case (capitalisation) matters!
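A short Python sketch shows why the tip matters. The segment is invented, and Python's matching is only assumed here to behave like Studio's for these simple patterns.

```python
import re

# Invented target segment; note that it ends right after "mm".
segment = "Length: 25 mm"

# The intended expression: a digit, one space, then "mm".
intended = re.search(r"\d mm", segment) is not None

# The same expression with a stray space pasted in at the end: the space is
# part of the pattern, so the segment would need a space after "mm" too.
trailing_space = re.search(r"\d mm ", segment) is not None

# The same expression typed in capitals: "MM" does not match "mm".
wrong_case = re.search(r"\d MM", segment) is not None

print(intended, trailing_space, wrong_case)
```

Only the first expression finds the segment. The other two silently find nothing, which is easy to mistake for "no problems in this file".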
I recommend you keep a list in a plain text file, if you like using these expressions!
Some useful regex expressions
The table below contains some regex expressions that I use. In some cases I found them on the web, in others someone told me about them, and some I extrapolated from the ones I already had.
They barely scratch the surface of what is possible, but try them out: I hope you find them useful! (By the way, you can also search for relevant segments one at a time with a regex expression in the Search dialog (Ctrl+F): just select the "Use:" checkbox and select "Regular expressions" in the drop-down list below it.)
Any new regex expressions?
It might be fun to see what other regex expressions are being used: perhaps we could share them?