This is a regression and ten-folded number of duplicate parses: https://logstash.wikimedia.org/goto/1c53fae195f0a867986a7a01ddcb1dac
Description
Details
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Open | None | T255502 Goal: Save Timing median back under 1 second | |||
Resolved | Krinkle | T277788 Save Timing improvements (2021-2022) | |||
Resolved | Ladsgroup | T292300 Eliminate unnecessary duplicate parses (2021-2022) | |||
Resolved | Ladsgroup | T301310 CommonsMetadata extension is triggering a duplicate parse in commons |
Event Timeline
The extension itself didn't have any change in the past month, possibly it's triggered by a change in core.
- The examples I could find are all from SkinMinerva not Vector or other skins. Is this right?
- Is this "the same request is now duplicating the parse" or "there are many more requests", e.g. could this just be a cache invalidation semi-stampede?
Nice catch, parsercache entries of mobile views are fragmented using responsiveimages key (don't ask why) and if you make the search minus it with -"responsiveimages", there is only a few from commons.
- Is this "the same request is now duplicating the parse" or "there are many more requests", e.g. could this just be a cache invalidation semi-stampede?
First for sure, but possibly the latter as well. i.e. it might have been duplicate parses all the time but due to some cache invalidation (which should not be done like this anyway) we are seeing a flood in logs but the first part is for sure.
See also:
- (Oct 2021) T292302: CommonsMetadata extension causes every page on commons to be always parsed twice.
- (Feb 2021) T245162 CommonsMetadata fatal error only on Minerva pageviews.
Change 761993 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@master] ParserOutputAccess: Cache Parsing inside the class as well
@tstarling We are having trouble understanding the root cause of this, see discussion on the patch. It may be related to PoolCounter, could you have a look?
So I just write brain dump of four hours of work and pulling my hair:
How to reproduce? Go to any image in commons, Disable FormatMetadata cache (in /srv/mediawiki/php-1.38.0-wmf.21/includes/media/FormatMetadata.php, the cache with getExtendedMetadata key). And request a purge
Notes I've seen:
- There is nothing in configuration of PoolCounter in mediawiki, the class returned is MediaWiki\Extension\PoolCounter\Client. While I see for both PoolCounter work objects, we get two different PoolCounter objects so it can't use its own class cache (I tried to fix it in PoolCounter::factory() by adding a static per-type per-key object holder, it started to be in the same object and yet double parsed it)
- Both of them get to PoolWorkArticleViewCurrent and not different keys. The keys are the same.
- PoolCounter considers both of them cacheable() and enters the $this->isFastStaleEnabled() section in PoolCounterWork::execute().
- The value of isFastStaleEnabled is true in both cases which made me think maybe the only recent change in poolcounter directory in mediawiki caused it: https://github.com/wikimedia/mediawiki/commit/ca71e69fc6fc4b6a90a961136c2c4b4c5e0ac792#diff-493b98036078af4c901fd431b8c6fd5cdf7720f512f5ffce47d0dd032fb5d7e1 which re-throws the timeout error
- But reverting it didn't fix the issue (maybe I need to double test?) Or maybe we need to remove another re-throw?
- In both cases PoolCounter responds with LOCKED (value = 1), letting the double parse to go ahead.
- The other problem is that even if PoolCounter responds with DONE (=cached) instead of LOCKED (lock acquired). PoolWorkArticleViewCurrent::getCachedWork() simply tries to read ParserCache and fails because it's not saved there yet.
That's all I've got.
Change 762914 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@wmf/1.38.0-wmf.21] ParserOutputAccess: Cache Parsing inside the class as well
Change 762915 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@wmf/1.38.0-wmf.22] ParserOutputAccess: Cache Parsing inside the class as well
Change 761993 merged by jenkins-bot:
[mediawiki/core@master] ParserOutputAccess: Add process cache within the service class
Change 762915 merged by jenkins-bot:
[mediawiki/core@wmf/1.38.0-wmf.22] ParserOutputAccess: Cache Parsing inside the class as well
Change 762914 merged by jenkins-bot:
[mediawiki/core@wmf/1.38.0-wmf.21] ParserOutputAccess: Cache Parsing inside the class as well
Mentioned in SAL (#wikimedia-operations) [2022-02-16T16:06:57Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.22/includes/page/ParserOutputAccess.php: Backport: [[gerrit:762915|ParserOutputAccess: Cache Parsing inside the class as well (T301310)]] (duration: 00m 54s)
Mentioned in SAL (#wikimedia-operations) [2022-02-16T16:10:31Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.21/includes/page/ParserOutputAccess.php: Backport: [[gerrit:762914|ParserOutputAccess: Cache Parsing inside the class as well (T301310)]] (duration: 00m 52s)
This is back to the previous rate. No increase in memory error (or other types of errors) has been observed.
DBA's work on this as a fix to avoid overloading the database is done. I leave it open for finding the root cause.
On a second thought, it seems it's a different issue. I filed T302008: Improve in-process caching of PoolWorkArticleViewCurrent and close this instead to reflect the reality that duplicate parses are gone now.