Page MenuHomePhabricator

Parser cache serving old results
Closed, ResolvedPublicBUG REPORT

Description

With 1.36.0-wmf.20 the parser cache serves old results.

Step to Reproduce:

Load https://en.wikipedia.org/w/index.php?title=User:JJMC89_bot/report/NFCC_violations&diff=992089666&oldid=991325630&diffmode=source&safemode=1

Actual Results:

Saved in parser cache with key enwiki:pcache:idhash:57206175-0!thumbsize=7!tmh-videojs and timestamp 20201129194700 and revision id 991325630. Serialized with PHP.

Expected Results:

Saved in parser cache with key enwiki:pcache:idhash:57206175-0!canonical and timestamp 20201203130039 and revision id 992089666. Serialized with JSON.

Event Timeline

JJMC89 triaged this task as Unbreak Now! priority.Dec 3 2020, 9:23 PM

Can you try just looking at the regular read view of this page? While you're logged in.

I see the old revision instead of the current one.

To reproduce, you need to be logged in and you need to have a thumbnail size preference set to 400px - in that case you will hit the faulty parser cache entry.

On the regular view I do see the same issue:

<!-- Saved in parser cache with key enwiki:pcache:idhash:57206175-0!thumbsize=7!tmh-videojs and timestamp 20201129194700 and revision id 991325630. Serialized with PHP.
-->

null edit didn't resolve the problem.

I have reproduced this twice in WIkidata. Purge fixes it.

Mentioned in SAL (#wikimedia-operations) [2020-12-03T21:39:33Z] <twentyafterfour> rolling back wmf.20 due to T269396 refs T263186

Change 645178 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/mediawiki-config@master] All wikis (except testwikis) to 1.36.0-wmf.18 refs T269396 and T263186

https://gerrit.wikimedia.org/r/645178

Metadata values currently in cache for this page: "{"ParseUsedOptions":{"thumbsize":true,"stubthreshold":true},"CacheExpiry":2592000,"CacheTime":"20201203213525","CacheRevisionId":992089666,"Version":"1.6.4","_type_":"CacheTime"}"

So the options are correct..

As for the ParserOutput stored under this key, we have

["mParseUsedOptions":protected]=>
  array(2) {
    ["thumbsize"]=>
    bool(true)
    ["stubthreshold"]=>
    bool(true)
  }
  ["mVersion":"CacheTime":private]=>
  string(5) "1.6.4"
  ["mCacheTime":"CacheTime":private]=>
  string(0) ""
  ["mCacheExpiry":"CacheTime":private]=>
  NULL
  ["mCacheRevisionId":"CacheTime":private]=>
  NULL
  ["mUsedOptions"]=>
  NULL
  ["mVersion"]=>
  string(5) "1.6.4"
  ["mCacheTime"]=>
  string(14) "20201129194700"
  ["mCacheExpiry"]=>
  NULL
  ["mCacheRevisionId"]=>
  int(991325630)

Change 645178 merged by jenkins-bot:
[operations/mediawiki-config@master] All wikis (except testwikis) to 1.36.0-wmf.18 refs T269396 and T263186

https://gerrit.wikimedia.org/r/645178

Rolled back to testwikis only for the time being. @JJMC89 reports that the issue is not present in wmf.18.

So the stored ParserOutput entry here is outdated even according to it's own mCacheTime vs page touched. Eyeballing the code, this can happen if we are calling ParserCache::getDirty which is a fallback for PoolWorkArticleViewCurrent.

So maybe the issue is one level above the ParserCache and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/640930 is at fault here. Investigation continues.

One more small piece of data - as we've hit all wikis we also have started emitting a new metric and we did see ~2500 stale cache entries returned per second, indicated by MediaWiki.ParserOutputAccess.Status.ok.count metric. I do not have anything to compare this to, but this seems like too many - we should be returning stale parses only if we are unable to do a parse, which shouldn't really be happening that often..

I was able to reproduce this locally by the following procedure:

  • User1 and User2 having different parser options
  • Check out MW 021a1f82d2. It was a guessed hash, not the exact right one, but old enough to show the issue.
  • Set $wgParserCacheUseJson = false
  • User2 views the page
  • User1 edits the page. Thus the situation is as in the task description with the cache entry for User2 being an expired cache entry from an old version of MW with PHP serialization.
  • Check out master
  • User2 views the page. The old revision is seen.

This is apparently because of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/640281 . Changing a property from public to private changes its name in the PHP serialization.

> $entry = $pcc->get('mw:pcache:idhash:24-0!thumbsize=3');
> var_dump($entry);
object(ParserOutput)#294 (54) {
...
  ["mCacheTime":"CacheTime":private]=>
  string(0) ""
...
  ["mCacheTime"]=>
  string(14) "20201204044016"

That is, var_dump() shows the object as having both a public and private property of that name. CacheTime::getCacheTime() reads from the private property and lazy-initialises it to the current timestamp, so ParserOutput::expired() always returns false.

Similarly the value of mCacheRevisionId from the serialized blob is ignored (treated as null), so isDifferentRevision() always returns false.

So I guess the solution is to revert it for a third time. Judging by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/631240/ it was already reverted twice.

See https://www.php.net/manual/en/function.serialize.php: "Note: Object's private members have the class name prepended to the member name; protected members have a '*' prepended to the member name. These prepended values have null bytes on either side."

It's necessary to add the class name to the serialized key because private members have private scope -- a subclass can have its own private property with the same name, which will be an independent variable.

Just for the context, the serialize / unserialize with different class signatures has been crippling for a while:

  • T156364#2978796 from 2017 showing a $a property being both public and protected, thought that might have been a HHVM only issue.
  • T156541 (declined) about BagOStuff and a proposal to add some signature
  • the very similar T203566#4566713 (it is a good read) which affected ParserOutput , caused by a new field introduced in the class
  • T229366#5378458 due to ParserOutput property mSpeculativeRevId being renamed to speculativeRevIdUsed (which got reverted)

As part as that last task we had a test added to ensure we can unserialize ParserOutput from a previous version ( https://gerrit.wikimedia.org/r/c/mediawiki/core/+/526575/2/tests/phpunit/includes/parser/ParserOutputTest.php )

We had several similar issue during the PHP 7.2 roll out when we had data serialized with either PHP 7.2 or HHMV which were slightly different. Notably when it comes to handle protected property iirc. But I can't find a task.

Anyway, I imagine moving from PHP serialize() to JSON is intended to address that. One idea was to namespace the cache key based on the class signature, which would also mean invalidating the parser cache whenever ParserOutput is changed, not ideal :-\

See https://www.php.net/manual/en/function.serialize.php: "Note: Object's private members have the class name prepended to the member name; protected members have a '*' prepended to the member name. These prepended values have null bytes on either side."

Crud - I was staring at this, but still didn't realize that changing the access modifier would break deserialization. I even wrote GhostFieldAccessTrait to access "hidden" unserialized fields, to be used when renaming or removing fields. We also created SerializationTestTrait to ensure we are not inadvertently breaking serialization compatibility when making changes. And then happily updated the expected values, not realizing that we'd of course no longer be reading existing entries correctly. https://gerrit.wikimedia.org/r/c/mediawiki/core/+/640281. I should have spotted that during review. I guess the expected failures (changing output of serialization) made me overlook the failures reported for deserialization. My bad.

I guess the solution is to revert as you suggest, and wait for all cache entries to be json.

Anyway, I imagine moving from PHP serialize() to JSON is intended to address that. One idea was to namespace the cache key based on the class signature, which would also mean invalidating the parser cache whenever ParserOutput is changed, not ideal :-\

Indeed. Moving to JSON serialization will finally protect us from this failure mode.

Change 645157 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Revert "Hard-deprecate all public property access on CacheTime and ParserOutput."

https://gerrit.wikimedia.org/r/645157

Change 645312 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@wmf/1.36.0-wmf.20] Revert "Hard-deprecate all public property access on CacheTime and ParserOutput."

https://gerrit.wikimedia.org/r/645312

The revert above would unblock the train. I'll leave it to whoever runs the train to decide whether they want to do that on a friday.

Thanks! As far as I know, we will continue on Monday.

Change 646101 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] CacheTime: make properties protected and add more tests

https://gerrit.wikimedia.org/r/646101

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/646101 should fix the issue correctly, but at this point I'm inclined to say let's revert on master as well, wait until PHP-serialized data expires in production and then re-apply the original fix with a followup. Just to be on the very safe side.

Change 645157 merged by jenkins-bot:
[mediawiki/core@master] Revert "Hard-deprecate all public property access on CacheTime and ParserOutput."

https://gerrit.wikimedia.org/r/645157

Change 645312 merged by jenkins-bot:
[mediawiki/core@wmf/1.36.0-wmf.20] Revert "Hard-deprecate all public property access on CacheTime and ParserOutput."

https://gerrit.wikimedia.org/r/645312

hashar lowered the priority of this task from Unbreak Now! to High.Dec 7 2020, 2:57 PM

The revert got backported to 1.36.0-wmf.20 and I have promoted all wikis to 1.36.0-wmf.20.

daniel claimed this task.