Steps to reproduce:
- Make any change to any page that includes a non-ASCII symbol, and the saved version will have distorted characters
- Perform a rollback, and the resulting revision will contain distorted text
- Immediately after saving, the text showed correctly (containing "åäö"), but purging the cache (with ?action=purge) shows distorted text ("Ã¥Ã¤Ã¶")
Background: The old Revision::getRevisionText() and the new BlobStore::expandBlob() methods apply the legacy encoding if no flags are provided - the "utf-8" flag is required to bypass this conversion.
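The flag check described above can be modelled roughly as follows. This is a simplified Python sketch, not MediaWiki's actual PHP code: the function name expand_blob, its parameters, and the choice of ISO-8859-1 as the legacy encoding are assumptions made for illustration.

```python
from typing import Optional, Sequence


def expand_blob(data: bytes, flags: Sequence[str],
                legacy_encoding: Optional[str] = "iso-8859-1") -> str:
    """Hypothetical model of the flag-gated legacy conversion.

    If a legacy encoding is configured and the "utf-8" flag is absent,
    the stored bytes are assumed to be in the legacy encoding and are
    converted. Otherwise they are taken to be UTF-8 already.
    """
    if legacy_encoding is not None and "utf-8" not in flags:
        # No "utf-8" flag: interpret the bytes as legacy-encoded text.
        return data.decode(legacy_encoding)
    # "utf-8" flag present (or no legacy encoding configured): no conversion.
    return data.decode("utf-8")


stored = "åäö".encode("utf-8")
with_flag = expand_blob(stored, ["utf-8"])  # conversion bypassed
without_flag = expand_blob(stored, [])      # UTF-8 bytes misread as legacy
```

In this model, the same stored bytes come back intact with the flag and distorted without it, which is why a missing or empty flags value is only safe for data that really is in the legacy encoding.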
As part of the refactoring for MCR, the code for constructing a Revision from an array was consolidated with the code for constructing from a row object. Row objects are required to have the old_flags field set; this field being null or empty would trigger legacy encoding conversion. The same logic was now applied for the 'flags' field when constructing from an array - which was a mistake. No conversion (or indeed decompression or other kinds of decoding) should be applied when constructing from arrays.
This mistake led to the legacy encoding conversion being applied whenever a Revision object was constructed from an array - which is the case whenever a new Revision is prepared for insertion into the database while saving an edit. This caused data corruption by double-encoding.
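The double-encoding itself can be reproduced in a few lines of Python. This is only an illustration of the effect, assuming the legacy encoding is ISO-8859-1 (the report does not say which encoding was configured); double_encode and repair are hypothetical helper names, not MediaWiki functions.

```python
def double_encode(text: str, legacy: str = "iso-8859-1") -> bytes:
    """Simulate the bug: UTF-8 text is wrongly converted from 'legacy' to UTF-8."""
    utf8_bytes = text.encode("utf-8")
    # The mistaken conversion treats each UTF-8 byte as one legacy character
    # and encodes the result as UTF-8 a second time.
    return utf8_bytes.decode(legacy).encode("utf-8")


def repair(corrupted: bytes, legacy: str = "iso-8859-1") -> str:
    """Undo one round of double-encoding (works for ISO-8859-1, where every byte maps to a character)."""
    return corrupted.decode("utf-8").encode(legacy).decode("utf-8")


corrupted = double_encode("åäö")
displayed = corrupted.decode("utf-8")  # what a reader sees after the purge
restored = repair(corrupted)
```

Under this assumption the three original characters are stored as twelve bytes instead of six, and the damage is reversible only as long as the text has not been double-encoded again by a later edit.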
Do not apply any processing to the content blob when constructing a Revision from an array (at least not for the normal case of the 'flags' field not being set).