There are two known problems with the current pattern matching:
- I implemented part of the Crimean Tatar (crh) Language Converter transliteration incorrectly: the large list of exceptions is treated as whole words to be matched, not as patterns to match against partial words. Partial matching is necessary to deal with the large number of inflections in the language. (See T186727#3998090.)
- Some very complex regexes were written on the assumption that they would run on the full text, and I converted them to run on individual tokens/words. The converted versions don't work quite right. (See T186727#3998090 again.)
Another option is to find a more complex but more efficient implementation that still operates on tokens/words.
Choosing between those options for the first problem will determine how best to proceed for the second, and the transliteration may need a significant re-architecture as a result.
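To make the first problem concrete, here is a minimal sketch of the difference between whole-word and partial-word matching of exceptions. It is not the Language Converter code: the exception table, the function names, and the assumption that a pattern anchors at the start of a word are all hypothetical, and the entries are placeholders rather than real crh data.

```python
import re

# Hypothetical exception table: placeholder strings, NOT real crh exceptions.
EXCEPTIONS = {"foo": "FOO", "quux": "QUUX"}


def convert_whole_word(token):
    """Whole-word matching: the exception applies only if the token equals it."""
    return EXCEPTIONS.get(token, token)


# Assumption for illustration: a pattern matches the beginning of a word.
# Longer entries come first so they win over shorter entries sharing a prefix.
_PREFIX_RE = re.compile(
    "^(" + "|".join(
        re.escape(key) for key in sorted(EXCEPTIONS, key=len, reverse=True)
    ) + ")"
)


def convert_partial(token):
    """Partial matching: convert the matched part and keep the rest of the word."""
    return _PREFIX_RE.sub(lambda m: EXCEPTIONS[m.group(1)], token, count=1)


if __name__ == "__main__":
    for word in ["foo", "foolar", "bar"]:
        print(word, convert_whole_word(word), convert_partial(word))
```

With whole-word matching, an inflected form such as the placeholder `foolar` is missed entirely, while partial matching converts the stem and leaves the suffix for the regular rules to handle.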