Retire purge-parsercache periodic jobs
Closed, Resolved, Public

Description

We are now in a much better position with regard to ParserCache, and I don't think it makes sense anymore to do the purges via maintenance scripts. They also make maintenance of sections harder (T398527: MediaWiki periodic job purge-parsercache-pc4 failed). Let's just change it to drop ten rows 1/10th of the time.

Background

The default behavior in SqlBagOStuff is to delete expired rows from the cache, after 1 in every N web requests where something wrote to that same cache (via a DeferredUpdate callback). This feature has been disabled at WMF since at least 2012 for the ParserCache, in favor of running the purgeParserCache.php maintenance script from a daily cronjob instead.
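
The sampling described above can be modelled in a few lines. This is a toy Python sketch, not the SqlBagOStuff code: the class and method names are invented for illustration, the purge runs inline rather than in a post-send DeferredUpdate, and `purge_period` and `purge_limit` only stand in for the purge period and the per-purge row limit discussed in this task.

```python
import random


class SampledPurgeCache:
    """Toy model of a cache that deletes expired rows on a sample of writes."""

    def __init__(self, purge_period, purge_limit, rng=None):
        self.purge_period = purge_period  # average writes between purges; 0 disables purging
        self.purge_limit = purge_limit    # maximum rows deleted by a single purge
        self.rows = {}                    # key -> expiry timestamp
        self.rng = rng or random.Random()

    def set(self, key, ttl, now):
        """Store a row; on roughly 1 in purge_period writes, also purge.

        Returns the number of expired rows deleted by this write.
        """
        self.rows[key] = now + ttl
        # Only writes are sampled, so read-only traffic never triggers a purge.
        if self.purge_period > 0 and self.rng.randrange(self.purge_period) == 0:
            return self.purge_expired(now)
        return 0

    def purge_expired(self, now):
        """Delete at most purge_limit rows whose expiry has passed."""
        expired = [k for k, expiry in self.rows.items() if expiry <= now]
        victims = expired[: self.purge_limit]
        for key in victims:
            del self.rows[key]
        return len(victims)
```

With `purge_period=0` the model never purges on write, which corresponds to the situation where a separate script has to do the cleanup instead.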

In 2014-2015, we added post-send support (for php-fpm, and for HHVM).

In 2019, the SqlBagOStuff purging was improved by Aaron to sample writes instead of reads, and to limit the query size instead of deleting an unlimited number of expired rows (change 520968).

In 2022, as part of Multi-DC (T212129), the MainStash was switched to SqlBagOStuff with this feature enabled.

If the added overhead of this is manageable, it will make DBA maintenance easier, and allow us to potentially remove the purgeParserCache.php script and the purgePeriod=0 configuration (i.e. support only this way, since we'd have proven that it naturally both scales down well for default installs and handles scaling up for WMF).

Event Timeline

Ladsgroup triaged this task as Medium priority. Jul 8 2025, 12:27 PM
Ladsgroup moved this task from Triage to In progress on the DBA board.

Change #1167217 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Set purge values for parsercache

https://gerrit.wikimedia.org/r/1167217

If there are no objections, I'll deploy this once I'm back from vacation (July 28th).

[…] This script should become a set of jobs.

The above config patch enables DeferredUpdates instead of Jobs. I know you know this; I'm just stating it here for the record in case anyone gets confused.

Let's just change it to drop ten rows 1/10th of the time.

Looks like you went for 20 rows on every 10th write. LGTM.


The default purge mechanism in BagOStuff for MediaWiki uses post-send DeferredUpdates on web requests (not jobs), and this patch enables that. It is the same mechanism we use in production for the MainStash DB, so it makes sense to me to try it here as well.

Do you want to turn off the periodic job beforehand, or monitor them side-by-side for some time?

If we run both, the main thing we'd be testing is that the mechanism doesn't hard error. That could be a safe first step, but the in-process BagOStuff purger is already used in production, so we know that part generally works fine. I think it'd be easiest to analyze the impact if we turn off the periodic jobs first, assuming we have enough margin to skip a few days. Then we can cleanly compare the trend lines before and after.

I trust you'll monitor the DB load from the MySQL side, and how it impacts ParserCache storage (i.e. is it faster or slower in reclaiming space than before? If slower, is it fast enough to keep up? May need some tweaking in the numbers).
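
For the "is it fast enough to keep up?" question, a back-of-the-envelope check may help. The sketch below is an assumption-laden estimate, not a measurement: the function names are made up, the traffic figures in the usage note are hypothetical, and it ignores how rows are spread across tables and servers. It only encodes that a purge of at most `purge_limit` rows on 1 in every `purge_period` writes removes at most `purge_limit / purge_period` rows per write on average.

```python
def max_purge_rate(writes_per_sec, purge_period, purge_limit):
    """Upper bound on the average number of expired rows deleted per second."""
    if purge_period <= 0:
        return 0.0  # a period of 0 means purging on write is disabled
    return writes_per_sec * purge_limit / purge_period


def keeps_up(writes_per_sec, expiries_per_sec, purge_period, purge_limit):
    """True if the purge ceiling is at least the rate at which rows expire."""
    return max_purge_rate(writes_per_sec, purge_period, purge_limit) >= expiries_per_sec
```

For example, with 20 rows on every 10th write the ceiling is 2 rows per write, so at a hypothetical 100 writes per second `max_purge_rate(100, 10, 20)` gives 200 rows per second. If rows expire faster than that ceiling, space would be reclaimed more slowly than it is consumed, and the numbers would need tweaking.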

From the MW side of things we may want to look at:

  • Grafana: ParserCache hit-ratio (should be a no-op since we ignore expired data).
  • Arc Lamp: post-send flame graphs (should see a slight increase, but hopefully fine).
  • Grafana: Utilization of MW appservers where we write to ParserCache (edits on mw-web and mw-api-ext, and jobs on mw-jobrunner), which should be fine given most requests are reads, not writes.
  • Logstash: Timeouts on POST requests from MediaWiki, i.e. while (or after) the purge takes place in post-send deferred updates.
  • Logstash: DBPerformance warnings, if wrongly triggered on GET requests.

Krinkle updated the task description.

Thank you for the detailed context! I'll try to deploy the change now and make sure things are fine.

Change #1167217 merged by jenkins-bot:

[operations/mediawiki-config@master] ParserCache: Enable purgePeriod for SqlBagOStuff

https://gerrit.wikimedia.org/r/1167217

Mentioned in SAL (#wikimedia-operations) [2025-07-28T11:24:10Z] <ladsgroup@deploy1003> Started scap sync-world: Backport for [[gerrit:1167217|ParserCache: Enable purgePeriod for SqlBagOStuff (T398806)]]

Mentioned in SAL (#wikimedia-operations) [2025-07-28T11:30:36Z] <ladsgroup@deploy1003> ladsgroup: Backport for [[gerrit:1167217|ParserCache: Enable purgePeriod for SqlBagOStuff (T398806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-07-28T11:44:02Z] <ladsgroup@deploy1003> Finished scap sync-world: Backport for [[gerrit:1167217|ParserCache: Enable purgePeriod for SqlBagOStuff (T398806)]] (duration: 19m 51s)

Change #1173364 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] objectcache: Only clean a subset of tables in SqlBagOStuff

https://gerrit.wikimedia.org/r/1173364

Change #1173371 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.45.0-wmf.11] objectcache: Only clean a subset of tables in SqlBagOStuff

https://gerrit.wikimedia.org/r/1173371

Change #1173364 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Only clean a subset of tables in SqlBagOStuff

https://gerrit.wikimedia.org/r/1173364

Change #1173371 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.11] objectcache: Only clean a subset of tables in SqlBagOStuff

https://gerrit.wikimedia.org/r/1173371

Mentioned in SAL (#wikimedia-operations) [2025-07-28T14:44:52Z] <ladsgroup@deploy1003> Started scap sync-world: Backport for [[gerrit:1173371|objectcache: Only clean a subset of tables in SqlBagOStuff (T398806)]]

Mentioned in SAL (#wikimedia-operations) [2025-07-28T14:48:58Z] <ladsgroup@deploy1003> ladsgroup: Backport for [[gerrit:1173371|objectcache: Only clean a subset of tables in SqlBagOStuff (T398806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-07-28T14:57:22Z] <ladsgroup@deploy1003> Finished scap sync-world: Backport for [[gerrit:1173371|objectcache: Only clean a subset of tables in SqlBagOStuff (T398806)]] (duration: 12m 30s)

Change #1173917 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Reduce frequency of parsercache purge

https://gerrit.wikimedia.org/r/1173917

Change #1173917 merged by jenkins-bot:

[operations/mediawiki-config@master] Reduce frequency of parsercache purge

https://gerrit.wikimedia.org/r/1173917

Mentioned in SAL (#wikimedia-operations) [2025-07-29T10:35:38Z] <ladsgroup@deploy1003> Started scap sync-world: Backport for [[gerrit:1173917|Reduce frequency of parsercache purge (T398806)]]

Mentioned in SAL (#wikimedia-operations) [2025-07-29T10:37:51Z] <ladsgroup@deploy1003> ladsgroup: Backport for [[gerrit:1173917|Reduce frequency of parsercache purge (T398806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-07-29T10:46:02Z] <ladsgroup@deploy1003> Finished scap sync-world: Backport for [[gerrit:1173917|Reduce frequency of parsercache purge (T398806)]] (duration: 10m 26s)

Change #1173922 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mediawiki: Retire purge parser cahce maint scripts

https://gerrit.wikimedia.org/r/1173922

Change #1173922 merged by Ladsgroup:

[operations/puppet@production] mediawiki: Retire purge parser cahce maint scripts

https://gerrit.wikimedia.org/r/1173922

Change #1175165 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mediawiki: Completely remove the purge parser cache cron

https://gerrit.wikimedia.org/r/1175165

Change #1175165 merged by Ladsgroup:

[operations/puppet@production] mediawiki: Completely remove the purge parser cache cron

https://gerrit.wikimedia.org/r/1175165

Ladsgroup moved this task from In progress to Done on the DBA board.