Invariant failed: Bad UTF-8 (full string verification)
The bad UTF-8 can also be seen directly in the source for:
This task is forked from T237467, which ended up being an issue with Language::commafy generating bad UTF-8. In contrast, in this task the bad UTF-8 is coming directly from the DB. As described in T237467#6566785, we need the following mitigations:
- Bad UTF-8 is not supposed to make it past PST (the pre-save transform) and get stored in the DB in the first place. So we need to track down how it got in there and clean it up, and perhaps also clean up other articles that managed to get saved with bad UTF-8.
- Fix core to plug this hole so that bad UTF-8 is not stored in the DB.
- Validate the wikitext source we get from the DB and fix up any bad UTF-8 in it, downgrading this from a crasher to a warning. (The assertion is still appropriate if we encounter bad UTF-8 later, since that would be generated by Parsoid from valid inputs; but Parsoid operates under the assumption that all of its inputs are valid.)
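
As a rough illustration of the third mitigation, here is a minimal Python sketch of "validate on load, repair, and warn instead of crashing". It is not the Parsoid or core implementation, which would be written in PHP. The function name `sanitize_wikitext`, the logger name, and the `page_title` parameter are all hypothetical, and replacing invalid sequences with U+FFFD is just one possible repair strategy.

```python
import logging
from typing import Tuple

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("wikitext_utf8")


def sanitize_wikitext(raw: bytes, page_title: str = "(unknown)") -> Tuple[str, bool]:
    """Return (text, was_valid) for wikitext bytes loaded from storage.

    Valid UTF-8 is returned unchanged. Invalid input is repaired by
    substituting U+FFFD for each undecodable sequence, and a warning is
    logged rather than raising an error.
    """
    try:
        # Strict decoding rejects stray bytes, truncated sequences,
        # overlong encodings, and encoded surrogates.
        return raw.decode("utf-8"), True
    except UnicodeDecodeError as err:
        logger.warning(
            "Bad UTF-8 in wikitext for %s at byte offset %d; "
            "replacing invalid sequences",
            page_title,
            err.start,
        )
        # Assumption: lossy replacement is acceptable for rendering.
        return raw.decode("utf-8", errors="replace"), False


if __name__ == "__main__":
    text, ok = sanitize_wikitext(b"abc\xffdef", page_title="Example")
    print(repr(text), ok)
```

With a check like this at the input boundary, the later full-string assertion can stay in place: anything that trips it afterwards would have been produced from inputs that were already validated.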