Currently, ParserCache stores PHP-serialized ParserOutput objects. Since we don't control the serialization/deserialization logic, we are very limited in changing the internal structure of ParserOutput (or whatever else we will be storing in ParserCache) without bumping the VERSION and invalidating the cache. This will limit us during the Parsoid integration work. Having ParserCache store JSON would let us run custom code on serialization/deserialization and modify the format of the stored entities without invalidating the whole cache.
We therefore propose to change the serialization format to JSON. RESTBase has been storing Parsoid output in JSON for years, so we have at least a baseline indication that the data fits the format.
The plan:
- Anything stored in the ParserCache (ParserOutput at this point, more things later on) will be required to implement JsonSerializable
- The current idea is to have ParserCache serialize the value, enhance the output with type metadata such as "@deserializer": "ParserOutput::createFromSerializedData", and pass it to the underlying BagOStuff. Alternatively, we could implement a JsonSerializingBagOfStuff.
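- On deserialization, check whether the BagOStuff returned an object. If it did, this is the fallback case (an entry stored in the old PHP-serialized format) and the object is used as-is. Otherwise, the @deserializer attribute is read and the method it names is called.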
- For security reasons (what if someone is able to rewrite the serialized content?), we could sign the cached object with the MediaWiki secret key. Whether this is needed is not yet decided.
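
The steps above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the proposed implementation: ExampleOutput is a hypothetical stand-in for ParserOutput, the helper functions (encodeForCache, decodeFromCache, signForCache, verifyFromCache) and the allow-list of deserializers are made up for this example, and a plain string takes the place of the MediaWiki secret key. The only name taken from the plan is the "@deserializer" attribute and the createFromSerializedData method it points to.

```php
<?php
// Hypothetical stand-in for ParserOutput; the real class has many more fields.
class ExampleOutput implements JsonSerializable {
	private string $text;

	public function __construct( string $text ) {
		$this->text = $text;
	}

	public function getText(): string {
		return $this->text;
	}

	public function jsonSerialize(): array {
		return [ 'text' => $this->text ];
	}

	// Plays the role of ParserOutput::createFromSerializedData from the plan.
	public static function createFromSerializedData( array $data ): self {
		return new self( $data['text'] );
	}
}

// Assumption: an allow-list, so a rewritten entry cannot name an arbitrary callable.
const ALLOWED_DESERIALIZERS = [ 'ExampleOutput::createFromSerializedData' ];

// Serialize the value and attach the type metadata before handing it to the cache.
function encodeForCache( JsonSerializable $value, string $deserializer ): string {
	$data = $value->jsonSerialize();
	$data['@deserializer'] = $deserializer;
	return json_encode( $data, JSON_THROW_ON_ERROR );
}

// $stored is whatever the cache backend returned for the key.
function decodeFromCache( $stored ) {
	// Fallback: a legacy entry that the backend already unserialized into an object.
	if ( is_object( $stored ) ) {
		return $stored;
	}
	// Anything that is not a JSON string is treated as a cache miss.
	if ( !is_string( $stored ) ) {
		return false;
	}
	$data = json_decode( $stored, true, 512, JSON_THROW_ON_ERROR );
	$deserializer = $data['@deserializer'] ?? null;
	if ( !in_array( $deserializer, ALLOWED_DESERIALIZERS, true ) ) {
		throw new UnexpectedValueException( 'Unknown or missing @deserializer' );
	}
	unset( $data['@deserializer'] );
	return call_user_func( $deserializer, $data );
}

// Optional signing: prefix the JSON with an HMAC computed with a secret key.
function signForCache( string $json, string $secretKey ): string {
	return hash_hmac( 'sha256', $json, $secretKey ) . ':' . $json;
}

// Returns the JSON payload if the signature matches, throws otherwise.
function verifyFromCache( string $signed, string $secretKey ): string {
	$pos = strpos( $signed, ':' );
	if ( $pos === false ) {
		throw new UnexpectedValueException( 'Missing signature' );
	}
	$mac = substr( $signed, 0, $pos );
	$json = substr( $signed, $pos + 1 );
	// hash_equals compares in constant time to avoid leaking the MAC.
	if ( !hash_equals( hash_hmac( 'sha256', $json, $secretKey ), $mac ) ) {
		throw new UnexpectedValueException( 'Signature mismatch' );
	}
	return $json;
}
```

In this sketch a write would store signForCache( encodeForCache( $output, ... ), $key ), and a read would pass the result of verifyFromCache() to decodeFromCache(). Legacy objects skip both steps, which is what lets old entries stay readable while the cache gradually fills with JSON.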
See also:
T161647: RFC: Deprecate using php serialization inside MediaWiki