A few items came up in the review of T188321 that should be addressed, but they are separate from (and smaller than) the big fixes made there:
- refactor \b in regexes into a $wordBoundary variable so that it is easy to do something smarter and more location aware in the future (once we figure out what that is)
- add some new exceptions that came up from last-minute review of examples in Tatar transliteration, plus some more proper names
- possibly figure out what to do about roman numerals. The last patch ignores roman numerals as long as they are not one letter long and followed by a period (that is, as long as they don't look like an initial). Possibilities include:
  - stop trying to be clever and ignore roman numerals entirely, letting editors explicitly -{mark them}- as not to be transliterated
  - only automatically block roman numerals that are two letters or longer, which really cuts down on false positives
  - stick with the current system
(I'm happy with any of the roman numeral options; we just have to decide which one we want.)
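
To make the roman numeral options easier to compare, here is a rough Python sketch of how they differ, built on a single shared boundary constant in the spirit of the $wordBoundary refactor. Everything in it is hypothetical and for illustration only: `WORD_BOUNDARY`, `ROMAN`, `blocked_tokens`, the policy names, and the simplified uppercase-only pattern are assumptions, not the code from the patch.

```python
import re

# Hypothetical stand-in for the $wordBoundary variable: defined once, so a
# smarter, more location-aware boundary could be swapped in later.
WORD_BOUNDARY = r"\b"

# Loose roman-numeral shape (uppercase Latin letters only). This is an
# assumption for the sketch; it does not check that the numeral is valid.
ROMAN = re.compile(WORD_BOUNDARY + r"[IVXLCDM]+" + WORD_BOUNDARY)

POLICIES = ("ignore", "two_or_more", "current")


def blocked_tokens(text, policy="current"):
    """Return the roman-numeral-like tokens that would be left untransliterated."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "ignore":
        # No automatic handling; editors mark numerals explicitly instead.
        return []
    blocked = []
    for match in ROMAN.finditer(text):
        token = match.group()
        followed_by_period = text[match.end():match.end() + 1] == "."
        if policy == "two_or_more" and len(token) < 2:
            # Single letters are never blocked automatically.
            continue
        if policy == "current" and len(token) == 1 and followed_by_period:
            # One letter plus a period looks like an initial, so skip it.
            continue
        blocked.append(token)
    return blocked


sample = "I. Newton, Louis XIV, chapter V"
for name in POLICIES:
    print(name, blocked_tokens(sample, name))
```

On this sample, the current rule skips the initial `I.` but still blocks the lone `V`, while the two-letters-or-longer rule blocks only `XIV`. Changing `WORD_BOUNDARY` in one place changes the boundary behavior of every pattern built from it, which is the point of the refactor.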