
Work out what to do with ConcatenatedGzipHistoryBlob in our future
Open, Needs Triage, Public

Description

Following on from T323236 ("PHP Warning: Class RawMessage has no unserializer"), where a Message object was serialized and stored in the database, note that ConcatenatedGzipHistoryBlob is documented as:

WARNING: Objects of this class are serialized and permanently stored in the DB.

https://wiki.php.net/rfc/phase_out_serializable

In PHP 9.0 the Serializable interface will be removed and unserialize() will reject payloads using the C serialization format. Code needing to support both PHP < 7.4 and PHP >= 9.0 may polyfill the Serializable interface, though it will have no effect on serialization.
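For illustration, the two object formats can be told apart by their header. A class using default (or __sleep()-based) serialization is written with an "O" header, e.g. `O:4:"Blob":1:{s:5:"items";a:0:{}}`, while a class implementing Serializable gets a "C" header whose body is an opaque string returned by the class's own serialize() method, e.g. `C:6:"Legacy":1:{x}`. The sketch below is a minimal, hypothetical Python helper (`object_format`, `Blob`, and `Legacy` are illustrative names, not MediaWiki code) that reports which of the two formats a top-level serialized object uses. It works on bytes because PHP's length fields count bytes.

```python
import re

# PHP object headers look like:
#   O:<name length>:"<class>":<property count>:{...}
#   C:<name length>:"<class>":<data length>:{<opaque data>}
_HEADER = re.compile(rb'^([OC]):(\d+):"')


def object_format(payload: bytes):
    """Return ("O" or "C", class name) for a top-level serialized PHP object,
    or None if the payload does not start with a well-formed object header."""
    m = _HEADER.match(payload)
    if m is None:
        return None
    kind = m.group(1).decode("ascii")
    name_len = int(m.group(2))
    start = m.end()
    name = payload[start:start + name_len]
    # The class name must be followed by a closing quote and a colon.
    if payload[start + name_len:start + name_len + 2] != b'":':
        return None
    return kind, name.decode("ascii")


print(object_format(b'O:4:"Blob":1:{s:5:"items";a:0:{}}'))  # ('O', 'Blob')
print(object_format(b'C:6:"Legacy":1:{x}'))                 # ('C', 'Legacy')
```

Per the RFC, it is payloads in the "C" branch that PHP 9.0's unserialize() would reject.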

At some point in the future (though, ideally before it potentially breaks), we should work out a migration/mitigation plan, as we do call serialize() on ConcatenatedGzipHistoryBlob objects.

Event Timeline

Last month, Tim made an analysis at T299387#8398630 as part of T299387 ("Database corruption due to compressOld array plus bug, April 2006"), which included a major refactor and improvement of maintenance/moveToExternal.php. That iteration reduced the number of distinct text storage formats, primarily to eliminate non-UTF-8 ("legacy encoding") blobs and blobs not in external storage (still in the "curr" table; yes, that table still exists and is queried in prod).

A future iteration could further reduce the number of distinct storage formats we use in prod. If we feel these classes add a maintenance burden to core (something I'm not convinced of at this time), we could even have update.php run a maintenance script that converts some formats to preferred ones, and then remove those classes after two LTS releases (per T259771).

@Reedy wrote:

> In PHP 9.0 the Serializable interface will be removed and unserialize() will reject payloads using the C serialization format. Code needing to support both PHP < 7.4 and PHP >= 9.0 may polyfill the Serializable interface, though it will have no effect on serialization.
>
> At some point in the future (though, ideally before it potentially breaks), we should work out a migration/mitigation plan, as we do call serialize() on ConcatenatedGzipHistoryBlob objects.

I'm not sure whether the removal of the Serializable interface actually affects MediaWiki text storage. Codesearch yields no results for _(un)serial in Blob-related classes in core. I may be missing something, but AFAIK we don't use the "C" serialization format, since we implement neither the PHP 5-era Serializable interface nor the PHP 7.4+ __serialize()/__unserialize() magic methods. In a contrived example at https://3v4l.org/qkMVF using a class, a subclass, and __sleep()/__wakeup() methods, PHP returns the exact same "O" storage string from PHP 5.0.2 (2004) through PHP 8.2.0.
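If we wanted to confirm this against stored data rather than only against core's source, an audit could walk each raw serialize() payload and report every class encoded in the "C" format, including objects nested inside arrays or other objects (a top-level "O" header alone doesn't rule those out). The following is a minimal, hypothetical Python sketch of such a check (`find_c_classes` and `_walk` are illustrative names, not existing MediaWiki code). It assumes well-formed input and raises on tags it doesn't handle, such as the rare "S:" escaped-string form.

```python
def _read_until(data: bytes, pos: int, delim: bytes):
    """Return (bytes before delim, position just past delim)."""
    end = data.index(delim, pos)
    return data[pos:end], end + 1


def _walk(data: bytes, pos: int, found: list) -> int:
    """Skip one serialized value starting at pos, recording "C"-format
    class names in found. Returns the position after the value."""
    tag = data[pos:pos + 1]
    if tag == b"N":                          # N;
        return pos + 2
    if tag in (b"b", b"i", b"d", b"r", b"R"):  # b:1;  i:42;  d:0.5;  r:1;
        _, pos = _read_until(data, pos + 2, b";")
        return pos
    if tag in (b"s", b"E"):                  # s:<len>:"<bytes>";
        length, pos = _read_until(data, pos + 2, b":")
        return pos + 1 + int(length) + 2     # '"', payload, '";'
    if tag == b"a":                          # a:<n>:{<key><value>...}
        count, pos = _read_until(data, pos + 2, b":")
        pos += 1                             # '{'
        for _ in range(2 * int(count)):
            pos = _walk(data, pos, found)
        return pos + 1                       # '}'
    if tag in (b"O", b"C"):
        name_len, pos = _read_until(data, pos + 2, b":")
        n = int(name_len)
        name = data[pos + 1:pos + 1 + n].decode("ascii")
        pos = pos + 1 + n + 2                # '"', class name, '":'
        count, pos = _read_until(data, pos, b":")
        pos += 1                             # '{'
        if tag == b"C":
            # Opaque body of <count> bytes written by Serializable::serialize().
            found.append(name)
            return pos + int(count) + 1      # body, then '}'
        for _ in range(2 * int(count)):      # property name/value pairs
            pos = _walk(data, pos, found)
        return pos + 1                       # '}'
    raise ValueError(f"unsupported tag {tag!r} at offset {pos}")


def find_c_classes(payload: bytes) -> list:
    """List every class serialized with the "C" format anywhere in payload."""
    found = []
    if _walk(payload, 0, found) != len(payload):
        raise ValueError("trailing data after serialized value")
    return found


print(find_c_classes(b'O:4:"Blob":1:{s:5:"items";a:0:{}}'))  # []
print(find_c_classes(b'a:1:{i:0;C:6:"Legacy":1:{x}}'))       # ['Legacy']
```

Lengths are skipped rather than scanned, so delimiter characters inside strings or opaque "C" bodies don't confuse the walk.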