We need to ensure that UTF-8 input is:
- Valid UTF-8 (strip malformed byte sequences)
- Valid for XML output (strip illegal control characters)
- In a consistent normalization form (form C, i.e. NFC)
In some cases we may need to normalize on output as well, because old
data already stored may be corrupt. Alternatively, we can do a one-time
pass over the database to clean it up.
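
The three input checks listed above could be sketched roughly as follows.
This is only an illustration in Python, not the actual implementation: the
function name clean_utf8 is hypothetical, and the set of "illegal control
characters" is assumed to be whatever falls outside the XML 1.0 Char
production (C0 controls other than tab, LF and CR, plus U+FFFE and U+FFFF).

```python
import re
import unicodedata

# Assumption: "illegal" means outside the XML 1.0 Char production.
# Tab (0x09), LF (0x0A) and CR (0x0D) are allowed; other C0 controls,
# U+FFFE and U+FFFF are not. Surrogates cannot survive UTF-8 decoding.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def clean_utf8(raw: bytes) -> str:
    """Hypothetical helper: sanitize raw input bytes in three steps."""
    # 1. Valid UTF-8: silently drop byte sequences that do not decode.
    text = raw.decode("utf-8", errors="ignore")
    # 2. Valid for XML output: strip characters XML 1.0 forbids.
    text = _XML_ILLEGAL.sub("", text)
    # 3. Normalization form C: compose combining sequences where possible.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    sample = b"caf\x65\xcc\x81\x00 \xff ok"
    print(repr(clean_utf8(sample)))
```

Dropping malformed bytes follows the "strip" wording above; using
errors="replace" instead would substitute U+FFFD, which keeps a visible
marker where data was lost. The same function could be reused on output or
in a one-time cleanup pass, since running it twice gives the same result
as running it once.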