RESTbase: Turn off pre-generation and caching for parsoid endpoints
Open, Needs Triage, Public

Description

Once the primary remaining consumer of the parsoid endpoints in RESTbase (namely PCS) has been switched to using MW core endpoints, we can and should turn off pre-generation and caching for the parsoid endpoints in RESTbase.

NOTE: We could do this even before T339865 has been implemented: since the parsoid endpoints in core are backed by the parser cache, simply proxying their responses should be fast enough.
NOTE: We probably still need purge events for the edge caches. Or can we just use a low fixed TTL? Public callers of the API should migrate to the core endpoints anyway.
NOTE: Turning off pre-generation for parsoid in RESTbase means we are no longer doing any parsing (and caching) on the MediaWiki parsoid cluster either. This means that all of the parsing will need to happen on the jobrunner cluster. To test how much load this will generate, we can turn off cache writes in the MediaWiki parsoid endpoints by tweaking the TemporaryParsoidHandlerParserCacheWriteRatio setting.
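The exact semantics of TemporaryParsoidHandlerParserCacheWriteRatio are not spelled out here. As a rough illustration only, here is a minimal Python sketch that assumes the setting is a probability in [0, 1] that a given parse result gets written to the parser cache, so that 0 would disable writes and 1 would write every result. The helper names are hypothetical; this is not MediaWiki's (PHP) implementation.

```python
import random


def should_write_parser_cache(write_ratio: float, rng: random.Random) -> bool:
    """Decide whether to store one parse result in the parser cache.

    Hypothetical helper: assumes write_ratio is the probability of a write,
    so 0.0 disables writes entirely and 1.0 writes every result.
    """
    if write_ratio <= 0.0:
        return False
    if write_ratio >= 1.0:
        return True
    # Sample uniformly in [0, 1) and write only when below the ratio.
    return rng.random() < write_ratio


def count_writes(write_ratio: float, requests: int, seed: int = 0) -> int:
    """Simulate `requests` parses and count how many would be cached."""
    rng = random.Random(seed)
    return sum(should_write_parser_cache(write_ratio, rng) for _ in range(requests))


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 1.0):
        print(ratio, count_writes(ratio, 10_000))
```

Under this assumption, setting the ratio to 0 means every request to the parsoid endpoints parses without populating the cache, which is what lets the jobrunner cluster's share of the work be observed in isolation.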

Event Timeline

daniel added a subscriber: Eevans.
daniel updated the task description.

Change 932175 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] enwiki: Disable PC writes in parsoid endpoints

https://gerrit.wikimedia.org/r/932175

MSantos edited projects, added Parsoid (Tracking); removed Parsoid.

Change 932175 merged by jenkins-bot:

[operations/mediawiki-config@master] Parsoid: Disable PC writes on frwiki

https://gerrit.wikimedia.org/r/932175

Mentioned in SAL (#wikimedia-operations) [2023-06-26T13:15:57Z] <daniel@deploy1002> Started scap: Backport for [[gerrit:932175|Parsoid: Disable PC writes on frwiki (T339867)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-26T13:17:23Z] <daniel@deploy1002> daniel: Backport for [[gerrit:932175|Parsoid: Disable PC writes on frwiki (T339867)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-26T13:26:18Z] <daniel@deploy1002> Finished scap: Backport for [[gerrit:932175|Parsoid: Disable PC writes on frwiki (T339867)]] (duration: 10m 20s)

Change 933117 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for prewarmparsoid to 100

https://gerrit.wikimedia.org/r/933117

Change 933117 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for parsoidCachePrewarm to 100

https://gerrit.wikimedia.org/r/933117

Change 933184 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Parsoid: Disable PC writes on dewiki

https://gerrit.wikimedia.org/r/933184

Change 933184 merged by jenkins-bot:

[operations/mediawiki-config@master] Parsoid: Disable PC writes on dewiki

https://gerrit.wikimedia.org/r/933184

Mentioned in SAL (#wikimedia-operations) [2023-06-27T11:12:41Z] <daniel@deploy1002> Started scap: Backport for [[gerrit:933184|Parsoid: Disable PC writes on dewiki (T339867)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-27T11:14:09Z] <daniel@deploy1002> daniel: Backport for [[gerrit:933184|Parsoid: Disable PC writes on dewiki (T339867)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-27T11:21:16Z] <daniel@deploy1002> Finished scap: Backport for [[gerrit:933184|Parsoid: Disable PC writes on dewiki (T339867)]] (duration: 08m 34s)

Change 933437 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Parsoid: Disable PC writes on enwiki

https://gerrit.wikimedia.org/r/933437

Change 933437 merged by jenkins-bot:

[operations/mediawiki-config@master] Parsoid: Disable PC writes on enwiki

https://gerrit.wikimedia.org/r/933437

Mentioned in SAL (#wikimedia-operations) [2023-06-27T11:43:29Z] <daniel@deploy1002> Started scap: Backport for [[gerrit:933437|Parsoid: Disable PC writes on enwiki (T339867)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-27T11:44:55Z] <daniel@deploy1002> daniel: Backport for [[gerrit:933437|Parsoid: Disable PC writes on enwiki (T339867)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-27T11:55:35Z] <daniel@deploy1002> Finished scap: Backport for [[gerrit:933437|Parsoid: Disable PC writes on enwiki (T339867)]] (duration: 12m 06s)

Change 933453 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Disable PC writes for parsoid endpoints

https://gerrit.wikimedia.org/r/933453

Last night we ran into another backlog increase.

We also hit the concurrency limit again (I'm not linking the graph because it is misleading, but we did hit it), coupled with a sizeable dip below 50% idle workers on the jobrunners.

CPU load on the jobrunners wasn't significantly affected.

@jijiki @akosiaris Do you think we may need to add a few more servers to the jobrunner cluster? I'm afraid we'll run into real worker saturation if we raise the concurrency any further.

I'd say we can move 2 or 3 servers from the API cluster (it has 62) to the jobrunner cluster. Since the biggest user of the API, parsoid on Node.js, has been gone for quite some time, we apparently have the spare capacity.
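For a sense of scale, assuming the 62 API servers are roughly interchangeable in capacity (an assumption, not stated in the comment), moving 2 or 3 of them takes only a small slice of the API cluster, leaving 60 or 59 servers:

```latex
\frac{2}{62} \approx 3.2\%, \qquad \frac{3}{62} \approx 4.8\%
```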

Change 933453 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable PC writes for parsoid endpoints

https://gerrit.wikimedia.org/r/933453

Mentioned in SAL (#wikimedia-operations) [2023-06-29T13:28:02Z] <daniel@deploy1002> Started scap: Backport for [[gerrit:933453|Disable PC writes for parsoid endpoints (T339867)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-29T13:29:32Z] <daniel@deploy1002> daniel: Backport for [[gerrit:933453|Disable PC writes for parsoid endpoints (T339867)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-29T13:35:09Z] <daniel@deploy1002> Finished scap: Backport for [[gerrit:933453|Disable PC writes for parsoid endpoints (T339867)]] (duration: 07m 07s)

Change 935152 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for parsoidCachePrewarm to 150

https://gerrit.wikimedia.org/r/935152

Change 935152 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop-jobqueue: Bump the concurrency for parsoidCachePrewarm to 150

https://gerrit.wikimedia.org/r/935152

Change 941946 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Re-enable PC writes for parsoid endpoints

https://gerrit.wikimedia.org/r/941946

The experiment of disabling parser cache writes in the parsoid endpoints was successful: the jobrunner cluster can handle the load of doing all the parsoid re-parses on page edits without help from the parsoid cluster. To achieve this, the allowed concurrency for the parsoidCachePrewarm job had to be increased significantly.
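To illustrate why the concurrency limit matters, here is a minimal Python sketch of a per-job concurrency cap using asyncio.Semaphore. It is not changeprop's implementation; it only shows the general effect of a cap like the one raised to 100 and then 150 for parsoidCachePrewarm. `fake_reparse` is a hypothetical stand-in for one re-parse job.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")


async def run_with_limit(
    jobs: List[Callable[[], Awaitable[T]]], limit: int
) -> Tuple[List[T], int]:
    """Run job factories with at most `limit` in flight; return results and peak."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(job: Callable[[], Awaitable[T]]) -> T:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` jobs are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    # gather() returns results in the same order as `jobs`.
    results = await asyncio.gather(*(worker(job) for job in jobs))
    return list(results), peak


async def fake_reparse(page_id: int) -> int:
    """Hypothetical stand-in for one re-parse job."""
    await asyncio.sleep(0.01)
    return page_id


if __name__ == "__main__":
    jobs = [lambda pid=pid: fake_reparse(pid) for pid in range(20)]
    results, peak = asyncio.run(run_with_limit(jobs, limit=5))
    print(len(results), peak)
```

With a cap of 5, no more than five jobs run at once and the rest wait their turn. If jobs arrive faster than the capped workers can finish them, that waiting queue is what shows up as a growing backlog, which is why the cap had to be raised once the parsoid cluster stopped sharing the work.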

We now know that, when we disable pre-generation of parsoid content in RESTbase, this will not overload the jobrunner cluster.

Since the experiment is complete, we can now re-enable parser cache writes for the parsoid endpoints in MediaWiki.

Change 941946 merged by jenkins-bot:

[operations/mediawiki-config@master] Re-enable PC writes for parsoid endpoints

https://gerrit.wikimedia.org/r/941946

Mentioned in SAL (#wikimedia-operations) [2023-07-27T13:04:33Z] <samtar@deploy1002> Started scap: Backport for [[gerrit:941946|Re-enable PC writes for parsoid endpoints (T339867)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-27T13:05:57Z] <samtar@deploy1002> samtar and daniel: Backport for [[gerrit:941946|Re-enable PC writes for parsoid endpoints (T339867)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-07-27T13:11:35Z] <samtar@deploy1002> Finished scap: Backport for [[gerrit:941946|Re-enable PC writes for parsoid endpoints (T339867)]] (duration: 07m 02s)

As part of T349796: Move MediaWiki jobs to mw-on-k8s, @hnowlan will be moving parsoidCachePrewarm to mw-on-k8s jobrunners. We'll keep an eye on backlog and execution time. It may be worth repeating the experiment of disabling PC writes to see if we encounter unexpected capacity issues.