Change ParserCache serialization format to JSON
Closed, Resolved (Public)

Description

Currently, ParserCache stores PHP-serialized instances of ParserOutput objects. Because we don't control the serialization/deserialization logic, we can hardly change the internal structure of ParserOutput (or of anything else we will store in ParserCache) without bumping the VERSION and invalidating the cache. This will limit us during the Parsoid integration work. Having ParserCache store JSON would let us run custom code on serialization/deserialization and change the format of the stored entities without invalidating the whole cache.

We therefore propose to change the serialization format to JSON. RESTBase has been storing Parsoid output in JSON for years, so we have at least a baseline indication that the data fits the format.
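
To illustrate why custom deserialization code helps here, the following is a minimal sketch in Python (the real code is PHP in MediaWiki). The class `CachedOutput` and the field names `cats` and `categories` are hypothetical and only stand in for a format change. The point is that the reader can accept both the old and the new layout, so old cache entries stay usable after the format changes.

```python
import json


class CachedOutput:
    """Hypothetical stand-in for a cached value such as ParserOutput."""

    def __init__(self, text, categories):
        self.text = text
        self.categories = categories  # newer layout: a list of names

    def to_json(self):
        # Always write the newer layout.
        return json.dumps({"text": self.text, "categories": self.categories})

    @classmethod
    def from_json(cls, blob):
        data = json.loads(blob)
        categories = data.get("categories")
        if categories is None:
            # Older entries (hypothetical) kept a comma-separated string
            # under "cats"; convert it instead of discarding the entry.
            legacy = data.get("cats", "")
            categories = [name for name in legacy.split(",") if name]
        return cls(data["text"], categories)


# An entry written before the format change and one written after it.
old_entry = '{"text": "<p>Hi</p>", "cats": "A,B"}'
new_entry = CachedOutput("<p>Hi</p>", ["A", "B"]).to_json()

restored_old = CachedOutput.from_json(old_entry)
restored_new = CachedOutput.from_json(new_entry)
```

With opaque PHP serialization the equivalent change would typically mean bumping the version and dropping every existing entry.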

The plan:

  • Anything stored in the ParserCache (ParserOutput at this point, more things later on) will be required to implement JsonSerializable.
  • The current idea is to have ParserCache serialize the value, enhance the output with type metadata such as "@deserializer": "ParserOutput::createFromSerializedData", and pass it to the underlying BagOStuff. Alternatively, we could implement a JsonSerializingBagOStuff.
  • On deserialization, check whether the BagOStuff returned an object; this is the fallback case (an entry stored in the old PHP-serialized format). Otherwise, the @deserializer attribute is read and the method it names is called.
  • For security reasons (what if someone is able to rewrite the serialized content?) we could sign the cached object with the MW secret key. Not sure.
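
The plan above could look roughly like the following sketch. It is written in Python for brevity and is not the MediaWiki implementation. `ParserOutputSketch`, `DESERIALIZERS`, `encode_for_cache`, `decode_from_cache`, `sign` and `verify` are all hypothetical names. The allow-list of deserializers and the HMAC-SHA256 signature are assumptions about how the security concern might be handled; the task itself leaves that question open.

```python
import hashlib
import hmac
import json


class ParserOutputSketch:
    """Hypothetical stand-in for ParserOutput; not the MediaWiki class."""

    def __init__(self, text):
        self.text = text

    def json_serialize(self):
        return {"text": self.text}

    @classmethod
    def create_from_serialized_data(cls, data):
        return cls(data["text"])


# Allow-list (an assumption): a tampered "@deserializer" value can only
# select one of these known factories, never an arbitrary callable.
DESERIALIZERS = {
    "ParserOutputSketch::create_from_serialized_data":
        ParserOutputSketch.create_from_serialized_data,
}


def encode_for_cache(value, deserializer_name):
    """Serialize the value and add the type metadata."""
    payload = dict(value.json_serialize())
    payload["@deserializer"] = deserializer_name
    return json.dumps(payload)


def decode_from_cache(stored):
    """Return the cached object, or None to signal a cache miss."""
    if not isinstance(stored, str):
        # Fallback case: the backend already returned an object
        # (an entry stored in the old format), or None on a miss.
        return stored
    try:
        data = json.loads(stored)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    factory = DESERIALIZERS.get(data.pop("@deserializer", None))
    return factory(data) if factory else None


def sign(blob, secret_key):
    """Prefix the JSON blob with an HMAC-SHA256 hex digest."""
    digest = hmac.new(secret_key, blob.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest() + "." + blob


def verify(signed, secret_key):
    """Return the JSON blob if the signature matches, else None."""
    digest, sep, blob = signed.partition(".")
    expected = hmac.new(secret_key, blob.encode("utf-8"), hashlib.sha256)
    ok = sep and hmac.compare_digest(
        digest.encode("utf-8"), expected.hexdigest().encode("utf-8")
    )
    return blob if ok else None


output = ParserOutputSketch("<p>Hello</p>")
blob = encode_for_cache(
    output, "ParserOutputSketch::create_from_serialized_data"
)
restored = decode_from_cache(blob)
signed = sign(blob, b"hypothetical-secret")
```

Treating an unknown or missing `@deserializer` as a cache miss means a bad entry only costs a re-parse instead of raising an error.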

See also:
T161647: RFC: Deprecate using php serialization inside MediaWiki

Event Timeline

Change 631490 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] ParserCache: be resilient to string values

https://gerrit.wikimedia.org/r/631490

Change 631490 merged by jenkins-bot:
[mediawiki/core@master] ParserCache: be resilient to string values

https://gerrit.wikimedia.org/r/631490

Change 630927 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] WIP: use JSON for parser cache

https://gerrit.wikimedia.org/r/630927

Change 633513 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] ParserCache: introduce feature flag for enabling JSON encoding.

https://gerrit.wikimedia.org/r/633513

Change 630927 merged by jenkins-bot:
[mediawiki/core@master] Use JSON for parser cache

https://gerrit.wikimedia.org/r/630927

Change 633513 merged by jenkins-bot:
[mediawiki/core@master] ParserCache: introduce feature flag for enabling JSON encoding.

https://gerrit.wikimedia.org/r/633513

Change 634037 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Move serializability validation from ParserOutput to ParserCache

https://gerrit.wikimedia.org/r/634037

Change 634037 merged by jenkins-bot:
[mediawiki/core@master] Move serializability validation from ParserOutput to ParserCache

https://gerrit.wikimedia.org/r/634037

Change 635071 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Enable warn+ logging for ParserCache channel

https://gerrit.wikimedia.org/r/635071

Change 635359 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Beta: enable ParserCache JSON serialization

https://gerrit.wikimedia.org/r/635359

Change 635359 merged by jenkins-bot:
[operations/mediawiki-config@master] Beta: enable ParserCache JSON serialization

https://gerrit.wikimedia.org/r/635359

Change 635376 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] ParserCache: add serialization format to HTML debug message.

https://gerrit.wikimedia.org/r/635376

Change 635382 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Enable ParserCache JSON serialization on testwiki

https://gerrit.wikimedia.org/r/635382

Change 635376 merged by jenkins-bot:
[mediawiki/core@master] ParserCache: add serialization format to HTML debug message.

https://gerrit.wikimedia.org/r/635376

Change 635071 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable warn+ logging for ParserCache channel

https://gerrit.wikimedia.org/r/635071

Change 635382 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable ParserCache JSON serialization on testwiki

https://gerrit.wikimedia.org/r/635382

I have enabled JSON serialization on testwiki with no issues. Moving on to group0.

Change 635607 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Switch ParserCache to JSON for group0 wikis

https://gerrit.wikimedia.org/r/635607

Change 635909 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] ParserCache JSON - do not \u encode unicode and special characters.

https://gerrit.wikimedia.org/r/635909

Change 635909 merged by jenkins-bot:
[mediawiki/core@master] ParserCache JSON - do not \u encode unicode and special characters.

https://gerrit.wikimedia.org/r/635909
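
As a rough illustration of what the change above is about, the sketch below compares escaped and unescaped JSON output in Python. It does not show the flags used in MediaWiki, which this task does not list; the sample value is made up. Both encodings decode to the same data, but writing non-ASCII characters as `\uXXXX` sequences takes more bytes than writing them as UTF-8.

```python
import json

# Made-up sample with non-ASCII text, as wiki content often has.
value = {"text": "Привет, мир"}

# Python's default escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(value)
# ensure_ascii=False keeps the characters as-is.
unescaped = json.dumps(value, ensure_ascii=False)

escaped_size = len(escaped.encode("utf-8"))
unescaped_size = len(unescaped.encode("utf-8"))

print(escaped_size, unescaped_size)
```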

Change 635607 merged by jenkins-bot:
[operations/mediawiki-config@master] Switch ParserCache to JSON for group0 wikis

https://gerrit.wikimedia.org/r/635607

Plan:
11/23/2020 - group1
11/30/2020 - all wikis

Change 643064 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] group1: switch ParserCache to JSON

https://gerrit.wikimedia.org/r/643064

Change 643064 merged by jenkins-bot:
[operations/mediawiki-config@master] group1: switch ParserCache to JSON

https://gerrit.wikimedia.org/r/643064

Mentioned in SAL (#wikimedia-operations) [2020-11-23T19:21:50Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: a110db09adf95edb38f663c19ce596e817ecf55d: group1: switch ParserCache to JSON (T263579) (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2020-11-23T19:24:14Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Revert a110db09adf95edb38f663c19ce596e817ecf55d: group1: switch ParserCache to JSON (T263579) (duration: 00m 42s)

Change 643260 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] group1: Switch ParserCache to JSON

https://gerrit.wikimedia.org/r/643260

Change 643260 merged by jenkins-bot:
[operations/mediawiki-config@master] group1: Switch ParserCache to JSON

https://gerrit.wikimedia.org/r/643260

Change 644243 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] group2: switch ParserCache to JSON

https://gerrit.wikimedia.org/r/644243

Change 644243 merged by jenkins-bot:
[operations/mediawiki-config@master] group2: switch ParserCache to JSON

https://gerrit.wikimedia.org/r/644243

JSON serialization has been deployed to all wikis. We should probably add a release note about it and flip the default of the $wgParserCacheUseJson setting to true.

Change 644316 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Flip $wgParserCacheUseJson default

https://gerrit.wikimedia.org/r/644316

Change 644317 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Remove wgParserCacheUseJson setting

https://gerrit.wikimedia.org/r/644317

Change 644316 merged by jenkins-bot:
[mediawiki/core@master] Flip $wgParserCacheUseJson default

https://gerrit.wikimedia.org/r/644316

Change 644317 merged by jenkins-bot:
[operations/mediawiki-config@master] Remove wgParserCacheUseJson setting

https://gerrit.wikimedia.org/r/644317