Put Parsoid output into the ParserCache on every edit
Closed, Resolved (Public)

Description

In order to allow VisualEditor to use Parsoid in MediaWiki core directly, we need Parsoid output to be pre-generated and stored in the ParserCache. "On every edit" in the task description is an attempt at being concise; see T320534#8627356 for a more detailed explanation of when this parsing and storage will happen.

Relevant config for probabilistic pre-warming triggered by requests coming from RESTBase to the Parsoid endpoints:

$wgTemporaryParsoidHandlerParserCacheWriteRatio = '1.0';

This is preferred during roll-out, since parsing will happen on the dedicated parsoid cluster.
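As a rough illustration of what a write ratio means in practice, the sketch below decides per request whether the Parsoid output should be written to the ParserCache. The function name `shouldWriteToParserCache` and the injectable `$roll` parameter are hypothetical and only serve the example; this is not MediaWiki's actual implementation.

```php
<?php
/**
 * Hypothetical helper (NOT MediaWiki's actual code): decide whether one
 * request should write its Parsoid output to the ParserCache.
 *
 * @param float $ratio Fraction of requests that should write, 0.0 to 1.0
 * @param float|null $roll Random draw in [0, 1); injectable for testing
 */
function shouldWriteToParserCache( float $ratio, ?float $roll = null ): bool {
	if ( $ratio <= 0.0 ) {
		return false; // never write
	}
	if ( $ratio >= 1.0 ) {
		return true; // always write
	}
	// Draw a float in [0, 1) unless the caller supplied one.
	$roll ??= mt_rand() / ( mt_getrandmax() + 1 );
	return $roll < $ratio;
}

// The value shown in this task is the string '1.0', so cast it first.
$ratio = (float)'1.0';
var_dump( shouldWriteToParserCache( $ratio ) ); // bool(true)
```

With a ratio of 0.05, roughly one request in twenty would write to the cache, which is how a gradual roll-out can limit the extra load on the ParserCache servers.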

Relevant configuration for native pre-warming (blocked on using the JobQueue):

$wgParsoidCacheConfig = [
	'StashType' => null, // see T320536
	'StashDuration' => 24 * 60 * 60, // after one day, edits may fail
	'CacheThresholdTime' => 0.0, // 0 means cache all
	'WarmParsoidParserCache' => true, // enable cache warming
];

Parsoid will use the backend configured in $wgParserCacheType, with the key prefix "parsoid". If we need to be able to configure the backend separately, we will need to introduce a new config setting for it and pass it into ParserCacheFactory.
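To make the meaning of 'CacheThresholdTime' more concrete, here is a minimal sketch of how such a threshold could gate cache writes. It assumes the threshold is measured in seconds of parse time and that output is cached when the parse took at least that long; both the function name `shouldCacheParsoidOutput` and the exact comparison are assumptions, not taken from MediaWiki core.

```php
<?php
/**
 * Hypothetical helper (NOT MediaWiki's actual code): only cache output
 * whose parse was slow enough to be worth storing.
 *
 * @param float $parseSeconds How long the parse took (assumed: seconds)
 * @param float $thresholdSeconds The 'CacheThresholdTime' setting
 */
function shouldCacheParsoidOutput( float $parseSeconds, float $thresholdSeconds ): bool {
	// With a threshold of 0.0, every parse qualifies ("0 means cache all").
	return $parseSeconds >= $thresholdSeconds;
}

// Settings as shown in the task description.
$config = [ 'CacheThresholdTime' => 0.0 ];
var_dump( shouldCacheParsoidOutput( 0.02, $config['CacheThresholdTime'] ) ); // bool(true)
```

Under this reading, raising the threshold means fast parses are skipped, which reduces the number of cache entries at the cost of re-parsing cheap pages on demand.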

NOTE: This may double the storage capacity needed for ParserCache. However, Parsoid output is less fragmented, so the increase may not be quite that large.
NOTE: Keep DBAs posted on where this is being enabled; it has the potential to overload the ParserCache database servers! See the 'parsercache-dbs' key in ProductionServices.php.

Event Timeline

Change 865070 merged by jenkins-bot:

[operations/mediawiki-config@master] hewiki: enable parser cache writes for parsoid's page/html endpoint.

https://gerrit.wikimedia.org/r/865070

Mentioned in SAL (#wikimedia-operations) [2022-12-07T21:08:12Z] <samtar@deploy1002> Started scap: Backport for [[gerrit:865070|hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529)]], [[gerrit:865071|Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)]]

Mentioned in SAL (#wikimedia-operations) [2022-12-07T21:10:05Z] <samtar@deploy1002> samtar and daniel: Backport for [[gerrit:865070|hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529)]], [[gerrit:865071|Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-12-07T21:28:47Z] <samtar@deploy1002> Finished scap: Backport for [[gerrit:865070|hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529)]], [[gerrit:865071|Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)]] (duration: 20m 35s)

Change 868127 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Increase PC writes from parsoid API to 10%

https://gerrit.wikimedia.org/r/868127

Change 868127 merged by jenkins-bot:

[operations/mediawiki-config@master] Increase PC writes from parsoid API to 10%

https://gerrit.wikimedia.org/r/868127

Mentioned in SAL (#wikimedia-operations) [2023-01-24T14:09:50Z] <samtar@deploy1002> Started scap: Backport for [[gerrit:868127|Increase PC writes from parsoid API to 10% (T320534)]]

Mentioned in SAL (#wikimedia-operations) [2023-01-24T14:11:37Z] <samtar@deploy1002> daniel and samtar: Backport for [[gerrit:868127|Increase PC writes from parsoid API to 10% (T320534)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-01-24T14:17:32Z] <samtar@deploy1002> Finished scap: Backport for [[gerrit:868127|Increase PC writes from parsoid API to 10% (T320534)]] (duration: 07m 41s)

Change 885337 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Bump parsoid parser cache writes to 25%.

https://gerrit.wikimedia.org/r/885337

Change 885337 merged by jenkins-bot:

[operations/mediawiki-config@master] Bump parsoid parser cache writes to 25%.

https://gerrit.wikimedia.org/r/885337

Mentioned in SAL (#wikimedia-operations) [2023-01-31T14:03:36Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]]

Mentioned in SAL (#wikimedia-operations) [2023-01-31T14:05:26Z] <urbanecm@deploy1002> urbanecm and dreamyjazz and daniel: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwde

Mentioned in SAL (#wikimedia-operations) [2023-01-31T14:20:09Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] (duration: 16m 33s)

Mentioned in SAL (#wikimedia-operations) [2023-01-31T14:26:56Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]]

Mentioned in SAL (#wikimedia-operations) [2023-01-31T14:28:43Z] <urbanecm@deploy1002> dreamyjazz and urbanecm and daniel: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwde

Mentioned in SAL (#wikimedia-operations) [2023-01-31T14:34:19Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] (duration: 07m 23s)

Change 886905 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Bump parsoid parser cache writes to 50%.

https://gerrit.wikimedia.org/r/886905

@daniel the task description says "on every edit", but from my quick skim of the code this is actually going to happen whenever the main parser cache needs to be updated, which could be from a LinksUpdate (e.g. template change) or purge. Is that correct?

Yes and no - template changes trigger a RefreshLinksJob which (as I recently discovered to my surprise) does not trigger a re-parse; it just invalidates. The re-parse happens the next time someone views the page. At that time, we also trigger a Parsoid re-parse.

So, it's "on every edit and also on page views when we think that the cache entry is probably stale, typically because it has been invalidated due to a template change". I just didn't find a way to put that concisely.

@daniel RefreshLinks does a reparse. It only uses the result to update link tables, page props, etc. The discovery, if I recall correctly, was that it doesn't save to ParserCache.

The job that invalidates the parser cache is HTMLCacheUpdate. That's a similarly recursive job running alongside it (usually much faster) that purges the CDN and bumps page_touched, which means ParserCache will indeed consider the entry a miss/stale.
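The invalidation described above can be sketched as a timestamp comparison. This is a simplified illustration that assumes 14-digit YYYYMMDDHHMMSS timestamps; the function name `isCacheEntryStale` is hypothetical, and any other checks a real cache lookup performs (such as expiry or revision ID) are left out.

```php
<?php
/**
 * Hypothetical helper (NOT MediaWiki's actual code): a cache entry is
 * treated as stale when it was written before page_touched was bumped.
 *
 * @param string $cacheTime When the cache entry was written
 * @param string $pageTouched Current page_touched value for the page
 */
function isCacheEntryStale( string $cacheTime, string $pageTouched ): bool {
	// 14-digit YYYYMMDDHHMMSS timestamps sort chronologically as strings.
	return strcmp( $cacheTime, $pageTouched ) < 0;
}

// Entry cached on 1 March; a template change bumps page_touched on 2 March.
var_dump( isCacheEntryStale( '20230301120000', '20230302090000' ) ); // bool(true)
// After a re-parse on the next page view, the entry is newer than page_touched.
var_dump( isCacheEntryStale( '20230302100000', '20230302090000' ) ); // bool(false)
```

Nothing is re-parsed at invalidation time in this model; the stale entry is only replaced when the page is next requested.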

Thanks for explaining - I should've just led with my real question, sorry, which is whether this will change how Linter updates work (see related T159512#8627272). Lint errors are effectively derived/secondary data from the Parsoid parsing process, so in theory the update should happen as part of RefreshLinks. That would require us to do a full Parsoid parse, just like, AIUI, we do with the legacy parser right now. I assume that eventually we will be extracting links tables, etc. from Parsoid, so at some point we'll need to do a Parsoid parse in RefreshLinks regardless.

@daniel RefreshLinks does a reparse. It only uses the result to update link tables, page props, etc. The discovery, if I recall correctly, was that it doesn't save to ParserCache.

Oh, right, sorry.

Lint errors are effectively derived/secondary data from the Parsoid parsing process, so in theory it should happen as part of RefreshLinks.

I see... this is very expensive, though.... Will template changes even affect linter errors? And should they? They indicate whether there is something wrong with the page's wikitext. That doesn't change as long as the page itself isn't edited, right?

Please file a ticket if you think this is needed.

Change 886905 merged by jenkins-bot:

[operations/mediawiki-config@master] Bump parsoid parser cache writes to 50%.

https://gerrit.wikimedia.org/r/886905

Mentioned in SAL (#wikimedia-operations) [2023-03-09T14:05:51Z] <samtar@deploy2002> Started scap: Backport for [[gerrit:886905|Bump parsoid parser cache writes to 50%. (T320534)]]

Mentioned in SAL (#wikimedia-operations) [2023-03-09T14:07:33Z] <samtar@deploy2002> daniel and samtar: Backport for [[gerrit:886905|Bump parsoid parser cache writes to 50%. (T320534)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-03-09T14:13:20Z] <samtar@deploy2002> Finished scap: Backport for [[gerrit:886905|Bump parsoid parser cache writes to 50%. (T320534)]] (duration: 07m 28s)

Change 898795 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Always write parsoid output to parser cache.

https://gerrit.wikimedia.org/r/898795

Note to self:

  • $wgTemporaryParsoidHandlerParserCacheWriteRatio should go away. Instead we should set CacheThresholdTime to a high value on Commons and Wikidata.
  • CacheThresholdTime should probably be 1 per default, not 0.
  • We want to enable WarmParsoidParserCache everywhere, but we may need custom handling in ChangeProp.

Change 898795 merged by jenkins-bot:

[operations/mediawiki-config@master] Always write parsoid output to parser cache.

https://gerrit.wikimedia.org/r/898795

Mentioned in SAL (#wikimedia-operations) [2023-03-15T14:14:18Z] <daniel@deploy2002> Started scap: Backport for [[gerrit:898795|Always write parsoid output to parser cache. (T320534)]]

Mentioned in SAL (#wikimedia-operations) [2023-03-15T14:15:50Z] <daniel@deploy2002> daniel: Backport for [[gerrit:898795|Always write parsoid output to parser cache. (T320534)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-03-15T14:24:15Z] <daniel@deploy2002> Finished scap: Backport for [[gerrit:898795|Always write parsoid output to parser cache. (T320534)]] (duration: 09m 57s)

Change 928063 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/deployment-charts@master] changeprop-jobqueue: Give parsoidCachePrewarm its own lane

https://gerrit.wikimedia.org/r/928063

Change 928063 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop-jobqueue: Give parsoidCachePrewarm its own lane

https://gerrit.wikimedia.org/r/928063

Change 928069 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for prewarmparsoid to 45

https://gerrit.wikimedia.org/r/928069

Change 928069 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for prewarmparsoid to 45

https://gerrit.wikimedia.org/r/928069

Mentioned in SAL (#wikimedia-operations) [2023-06-07T15:21:02Z] <claime> Bumping prewarmparsoid concurrency to 45 in changeprop-jobqueue - T320534

Change 928120 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for prewarmparsoid to 60

https://gerrit.wikimedia.org/r/928120

Change 928120 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for prewarmparsoid to 60

https://gerrit.wikimedia.org/r/928120