There are two known problems with the current pattern matching:
1. I implemented part of the Crimean Tatar (crh) Language Converter transliteration incorrectly: the large list of exceptions is treated as whole words to be matched, not as patterns to match against partial words. Partial matching is necessary to handle the large number of inflected forms in the language. (See T186727#3998090.)
2. Some very complex regexes were written on the assumption that they run over the full text, and I converted them to run on individual tokens/words. They don't work quite right in that form. (See T186727#3998090 again.)
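To illustrate the first problem, here is a minimal Python sketch of the difference between whole-word lookup and partial (pattern) matching. The exception table, the function names, and the strings `foo`/`bar` are hypothetical placeholders, not real crh rules, and anchoring at the start of the word is an assumption made only to keep the example small.

```python
import re

# Hypothetical exception table: placeholder strings, NOT real crh rules.
EXCEPTIONS = {"foo": "FOO", "bar": "BAR"}


def convert_whole_word(token):
    """Whole-word lookup: only an exact match on the full token is converted."""
    return EXCEPTIONS.get(token, token)


# Treat each exception as a pattern anchored at the start of the word
# (an assumption for this sketch), so inflected forms still match.
PATTERNS = [
    (re.compile("^" + re.escape(src)), dst) for src, dst in EXCEPTIONS.items()
]


def convert_partial(token):
    """Partial matching: the first pattern that matches part of the token wins."""
    for pattern, dst in PATTERNS:
        converted, count = pattern.subn(dst, token, count=1)
        if count:
            return converted
    return token
```

With these placeholders, `convert_whole_word("foolar")` leaves the inflected form untouched, while `convert_partial("foolar")` converts the matching part and keeps the suffix.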
One option for (1) is to change the implementation to be more like the JavaScript implementation, which runs all the regexes over the full text to be transliterated. However, there are a lot of regexes (3500+), and each of them would have to run on multiple strings to render a single page, so performance is a concern.
Another option is to find a more complex but more efficient implementation that still operates on tokens/words.
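One possible shape for such a token-based implementation, sketched in Python under strong assumptions: if the exceptions can be expressed as literal prefixes, they can be merged into a single compiled alternation, so each token is scanned once instead of once per rule. The rule table and function names are hypothetical, and this does not cover the more complex regexes from problem (2).

```python
import re

# Hypothetical literal rules: placeholder strings, NOT real crh rules.
RULES = {"foo": "FOO", "foob": "FOOB!", "bar": "BAR"}

# Put longer alternatives first so that they win over their own prefixes.
_alternation = "|".join(
    re.escape(src) for src in sorted(RULES, key=len, reverse=True)
)
# One combined regex, anchored at the start of the token.
_COMBINED = re.compile("^(?:" + _alternation + ")")


def convert_token(token):
    """Convert one token with a single pass of the combined regex."""
    return _COMBINED.sub(lambda m: RULES[m.group(0)], token, count=1)


def convert_text(text):
    """Convert each run of word characters; separators are left untouched."""
    return re.sub(r"\w+", lambda m: convert_token(m.group(0)), text)
```

For example, `convert_text("foolar, xbar bar")` converts `foolar` and `bar` but leaves `xbar` alone, because the combined pattern only matches at the start of a token. Whether the real exception list can be reduced to this form is an open question.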
Choosing between those options for (1) will determine how best to proceed for (2), and there may need to be a significant re-architecture of the transliteration as a result.