Regularly run recountCategories.php on Wikimedia wikis via systemd timer
Closed, Resolved, Public

Description

For various reasons, category counts tend to drift out of sync, leaving categories with inaccurate counts and sometimes even negative counts, which should be impossible. @Taylor has documented past examples of this at T85696#7105227.

One suggestion (which I pushed a patch for) is to recount categories on action=purge (T85696: Allow action=purge to recalculate the number of pages/subcats/files in a category), but that requires human intervention: someone has to notice that the count is wrong. We have a fairly efficient script, recountCategories.php, that should be run regularly to reconcile any remaining differences. We had to run it on every wiki for T299244, and the slowest wiki, commonswiki, took about 45 minutes.
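To illustrate what "reconciling" a count means, here is a toy sketch: stored per-category counts are compared against membership counted from the link rows themselves, and any stored value that disagrees is overwritten. This is not the code of recountCategories.php; the category names, link rows, and the `reconcile` function are hypothetical sample data for illustration only.

```shell
#!/usr/bin/env bash
# Toy sketch of count reconciliation; NOT recountCategories.php's code.
# Stored counts and membership rows below are made-up sample data.
set -euo pipefail

# Cached counts as stored per category (these can drift, even below zero).
declare -A stored=( [Foo]=3 [Bar]=-1 [Baz]=0 )

# Actual membership, one "category page" row per link.
links=$'Foo PageA\nFoo PageB\nBar PageC'

reconcile() {
  local -A actual=()
  local cat page real
  # Count the real members of each category from the link rows.
  while read -r cat page; do
    [[ -n "$cat" ]] || continue
    actual[$cat]=$(( ${actual[$cat]:-0} + 1 ))
  done <<< "$links"
  # Overwrite every stored count that disagrees, and report the change.
  for cat in $(printf '%s\n' "${!stored[@]}" | sort); do
    real=${actual[$cat]:-0}
    if (( stored[$cat] != real )); then
      echo "$cat: ${stored[$cat]} -> $real"
      stored[$cat]=$real
    fi
  done
}

reconcile
```

Running it prints `Bar: -1 -> 1` and `Foo: 3 -> 2`; `Baz` is already correct and is left alone. A second run prints nothing, which is why running such a job repeatedly is harmless once counts are correct.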

I propose we run recountCategories.php once a month as a systemd timer.
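A minimal sketch of what a monthly systemd service/timer pair for this could look like, written as a shell function that generates the unit files into a directory of your choice. The unit name matches the service started later in this task (mediawiki_job_recount_categories); everything else (the `write_units` helper, the foreachwiki path, the unit descriptions) is an assumption and not the configuration that was actually merged in Puppet.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: writes a monthly systemd service/timer pair into $1.
# The foreachwiki path and unit contents are assumptions, not the merged config.
set -euo pipefail

write_units() {
  local dir="$1"
  local name="mediawiki_job_recount_categories"
  mkdir -p "$dir"

  cat > "$dir/$name.service" <<'EOF'
[Unit]
Description=Recount category membership on all wikis

[Service]
Type=oneshot
# Assumed location of foreachwiki; adjust for the real host.
ExecStart=/usr/local/bin/foreachwiki maintenance/recountCategories.php --mode all
EOF

  cat > "$dir/$name.timer" <<'EOF'
[Unit]
Description=Run recountCategories.php once a month

[Timer]
# "monthly" is shorthand for *-*-01 00:00:00 (midnight on the 1st).
OnCalendar=monthly
# Catch up on a missed run if the host was down at the scheduled time.
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

write_units "${1:-./units}"
```

Using a `oneshot` service driven by a timer (rather than a long-running loop) means each monthly run is a separate, logged systemd job that can also be started by hand with `systemctl start`.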

Event Timeline

Peachey88 renamed this task from Run recountCategories.php regularly on Wikimedia wikis to Regularly run recountCategories.php regularly on Wikimedia wikis via systemd timer. Jan 22 2022, 7:44 AM
taavi renamed this task from Regularly run recountCategories.php regularly on Wikimedia wikis via systemd timer to Regularly run recountCategories.php on Wikimedia wikis via systemd timer. Jan 22 2022, 7:51 AM

How do we want to run this? For all wikis? By shards?

Taking as example what @Legoktm did for T299244, I made a draft patch at https://gerrit.wikimedia.org/r/c/operations/puppet/+/756069

If you think we can go directly with foreachwiki, please let me know. It'll make the file substantially shorter too.

SUPPORT, all wikis, once per month.


I think we can do this with a plain foreachwiki. In T299244 there was some urgency to get the counts fixed, so I split the run by shard, but here it's fine if it's a bit slower. Each wiki will still get its counts refreshed once a month anyway.
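A sketch contrasting the two approaches discussed here: a plain loop over every wiki (what foreachwiki does conceptually) versus one concurrent loop per database shard. The wiki/shard pairs are illustrative, and `run_maint` is a hypothetical stub standing in for the real per-wiki maintenance runner.

```shell
#!/usr/bin/env bash
# Hypothetical sketch contrasting a plain foreachwiki-style loop with a
# per-shard split. The wiki/shard pairs and run_maint stub are illustrative.
set -euo pipefail

# "wiki shard" pairs; real assignments live in the production dblists.
WIKI_SHARDS=$'enwiki s1\ncommonswiki s4\ndewiki s5\nfrwiki s6\nwikidatawiki s8'

# Stub standing in for the real per-wiki maintenance runner.
run_maint() {
  echo "recount $1"
}

# Plain loop: one wiki after another, like foreachwiki.
run_sequential() {
  local wiki shard
  while read -r wiki shard; do
    run_maint "$wiki"
  done <<< "$WIKI_SHARDS"
}

# Per-shard split: one background loop per shard, so wikis on different
# database sections are processed concurrently; wait for all of them.
run_by_shard() {
  local s
  for s in $(awk '{print $2}' <<< "$WIKI_SHARDS" | sort -u); do
    awk -v s="$s" '$2 == s {print $1}' <<< "$WIKI_SHARDS" |
      while read -r wiki; do run_maint "$wiki"; done &
  done
  wait
}
```

Both functions touch every wiki exactly once; `run_sequential` prints one `recount <wiki>` line per wiki in list order, while `run_by_shard` prints the same lines in a nondeterministic order. The per-shard split finishes sooner, which mattered for the urgent fix in T299244, whereas the plain loop is simpler and its longer runtime does not matter for a monthly schedule.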

Change 756069 had a related patch set uploaded (by Legoktm; author: MarcoAurelio):

[operations/puppet@production] [WIP] p::mediawiki::maintenance: Run recountCategories.php regularly

https://gerrit.wikimedia.org/r/756069

Thank you. Patch amended to use foreachwiki.

I see that the script suggests running rMW maintenance/cleanupEmptyCategories.php with --mode remove if the category recount script is run in pages mode (we're configuring it to use all, which includes that mode). Shall we configure a periodic job for cleanupEmptyCategories.php too?

LGTM on the foreachwiki change!

Regarding cleanupEmptyCategories.php: good point. See https://gerrit.wikimedia.org/r/c/mediawiki/core/+/756415

@Majavah Could you please run the improved recountCategories script on the Beta Cluster and save the output for review in a Paste?

foreachwiki maintenance/recountCategories --mode all >path/to/logs

@Legoktm and I would like to see how it performs before merging the Puppet patch.

Thanks!

Mentioned in SAL (#wikimedia-releng) [2022-01-28T21:45:05Z] <taavi> running recountCategories.php on all beta wikis per T299823#7652496

Beta Cluster output: P19566

Change 756069 merged by RLazarus:

[operations/puppet@production] mediawiki::maintenance: Run recountCategories.php monthly on all wikis

https://gerrit.wikimedia.org/r/756069

Mentioned in SAL (#wikimedia-operations) [2022-02-03T20:43:10Z] <rzl> rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_recount_categories.service # T299823

As requested on IRC, here's the log from today's run:

Legoktm added a project: User-notice.

LGTM! Looking at the mysql-aggregated dashboard for those times, I can't see any major spikes. On enwiki/s1 the increase in rows read is visible, but there are (presumably normal) traffic spikes that are even bigger.

I added a note to Tech News: https://meta.wikimedia.org/w/index.php?title=Tech%2FNews%2F2022%2F06&type=revision&diff=22755031&oldid=22754011

Closing because I think we're all set!

I've checked the example categories Liz mentioned in that thread: they show as empty on Category:Empty categories awaiting deletion and contain no articles either, so they are genuinely empty categories that are now correctly displayed as empty. Unless I'm misreading the thread, it looks like there are no issues now?

As for the cause, could it be a caching issue, or DB master/replica lag?

enwiki had quite a lot of rows to update (https://phabricator.wikimedia.org/F34941981$6776), second only to commonswiki (https://phabricator.wikimedia.org/F34941981$2934).

Re: Tech News - let me know if any changes are needed, ASAP. I'll be freezing it for translations within ~2 hours.

I think what we have now is fine. On IRC, MA and I discussed that while it's possible (though still undetermined) that this script temporarily makes some counts wrong, overall the counts are less wrong.

Shall we set the job to ensure: absent while the reported issues are being investigated?

I looked at this back in February or March and didn't think there was anything actionable left. If people are having issues with category counts being significantly wrong at the beginning of the month (after this script runs), please shout.