
purgeParserCache too slow to run once a day
Open, Needs Triage, Public

Description

The mw-maintenance script 'parsercachepurging' (class mediawiki::maintenance::parsercachepurging) is scheduled to run once per day.

The full command line is:

0 1 * * * /usr/local/bin/mwscript purgeParserCache.php --wiki=aawiki --age=2592000 --msleep 500 >/dev/null 2>&1

Note how it has the msleep parameter and runs once per day.

When running it manually I noticed it is very slow. So slow that after a couple of hours running it in screen it was still only at about 6%.

This says to me it's almost certainly not getting to 100% before the next run is already scheduled, and possibly then running twice.
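
For a rough sense of scale, a minimal extrapolation from those figures (both numbers below are eyeballed assumptions, not measurements):

# Back-of-envelope extrapolation from the observation above.
hours_elapsed = 2       # "a couple of hours" of running it manually
progress = 0.06         # "only at about 6%"
full_run_hours = hours_elapsed / progress
print(full_run_hours)   # ~33 hours, i.e. longer than the 24 h cron interval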

Event Timeline

Dzahn created this task. Aug 9 2019, 9:00 PM
Restricted Application added a subscriber: Aklapper. Aug 9 2019, 9:00 PM

Cannot reproduce on Beta:

maurelio@deployment-deploy01:~$ mwscript purgeParserCache.php --wiki=aawiki --age=2592000 --msleep 500
Deleting objects expiring before 11:37, 10 August 2019

Cannot purge this kind of parser cache.

Same with foreachwiki.

Dzahn added a comment. Aug 10 2019, 8:33 PM

> Cannot reproduce on Beta:
>
> maurelio@deployment-deploy01:~$ mwscript purgeParserCache.php --wiki=aawiki --age=2592000 --msleep 500
> Deleting objects expiring before 11:37, 10 August 2019
> Cannot purge this kind of parser cache.
>
> Same with foreachwiki.

I think that's because of the "only works if parsercache is a mysql db" part. I guess in Beta that is not the case?

It is trying to purge objects created more than 2592000 seconds ago (30 days), with a sleep of 0.5 seconds (--msleep 500) between each batch.

If it's MySQL-backed in production, it will need to iterate over all db servers and shards, fetch keynames in blocks of 100, then issue deletes. I'm not surprised it's slow. The question is _how_ slow it really is. If more objects are stored daily than can be expired daily, then 30 days later it won't be able to catch up.
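
As a rough sketch of that pattern (this is illustrative Python, not the actual purgeParserCache.php / SqlBagOStuff code; the server names, shard layout and fake data are assumptions, and --msleep 500 is read as a 0.5 second pause between batches):

import time

BATCH_SIZE = 100   # "blocks of 100 keynames"
MSLEEP = 0.5       # --msleep 500, read as 500 ms between batches

# Fake parser cache layout: {server: {shard_table: {keyname: expiry}}}
servers = {
    "pc1": {"pc_000": {f"k{i}": i for i in range(250)}},
    "pc2": {"pc_000": {f"k{i}": i for i in range(150)}},
}

def purge(servers, cutoff):
    deleted = 0
    for server, shards in servers.items():      # iterate all db servers ...
        for table, rows in shards.items():      # ... and every shard table on each
            while True:
                # stands in for a SELECT of up to 100 expired keynames
                batch = [k for k, exp in rows.items() if exp < cutoff][:BATCH_SIZE]
                if not batch:
                    break
                for k in batch:                  # stands in for DELETE ... WHERE keyname IN (...)
                    del rows[k]
                deleted += len(batch)
                time.sleep(MSLEEP)               # throttle so the db servers are not hammered
    return deleted

print(purge(servers, cutoff=200))                # deletes the 350 fake rows expiring before 200

# The sleep alone caps throughput at BATCH_SIZE / MSLEEP keys per second,
# i.e. roughly 17 million keys per day; if the wikis write more parser cache
# entries than that per day, a once-a-day run can never catch up.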

It isn't really harmful: at worst, some non-expired entries that were stored more than 30 days ago could still be used. And running the script twice shouldn't cause a problem either (each run would be doing deletes; they could either be deleting the same rows or helping one another by deleting different rows, depending on how in sync the batches turn out to be).

It is, however, something that should not be happening. If we expect entries to be force-purged every 30 days, the script should be able to keep up, perhaps by running multiple workers.

As for the cron entries causing double-runs, I seem to recall that some crond implementations avoid re-running an entry if the earlier execution hasn't finished; I'm not sure whether that's the case here.