Validate and normalize file contents in FFS
- Only accept valid UTF-8. Other encodings could be converted on the
fly, but it is unclear whether that extra complexity is needed right
now or worth the effort. Follow-up work is probably needed to use
better exceptions (MWException is being deprecated) and to handle
those exceptions appropriately.
- Normalize the input to NFC, the standard MediaWiki Unicode
normalization form. There is probably a small (unmeasured) performance
penalty here, but it should be negligible because:
  - parsing should only happen when group definitions are updated
    (known issues exist)
  - the whole file is normalized once before parsing, rather than each
    message individually
This should prevent unexpected issues with search, translation memory,
insertables, no-change diffs and many other things.
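
The validate-then-normalize order described above can be sketched as
follows. This is an illustration in Python, not the PHP code in this
change; the names read_normalized and InvalidEncodingError are
hypothetical and exist only for this example.

```python
import unicodedata


class InvalidEncodingError(ValueError):
    """Hypothetical error type for this sketch; the change itself uses MWException."""


def read_normalized(raw: bytes) -> str:
    """Return file contents as NFC-normalized text, rejecting invalid UTF-8."""
    # Reject anything that is not valid UTF-8 instead of guessing an encoding.
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise InvalidEncodingError(f"contents are not valid UTF-8: {exc}") from exc
    # Normalize the whole file once, before any per-message parsing happens.
    return unicodedata.normalize("NFC", text)
```

For example, an "e" followed by a combining acute accent (U+0301) comes
back as the single precomposed character U+00E9, so two visually
identical strings compare equal after reading.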