
Run recountCategories.php on Wikimedia wikis
Closed, Resolved, Public

Description

Run the new recountCategories.php script on all wikis. It has to be run three times on all wikis, once with --mode pages, once with --mode files and once with --mode subcats. You might want to pick a suitable value for --throttle as well.
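The three passes described above can be sketched as a small loop. This is only a dry run: it prints the commands instead of executing them, since mwscript is only available on the Wikimedia maintenance hosts. The helper name recount_commands and the throttle value of 1000 are assumptions for illustration; check the script's --help output for what a suitable --throttle value looks like.

```shell
#!/bin/sh
# Dry run: print one recountCategories.php invocation per mode for a wiki.
# recount_commands is a hypothetical helper, and the throttle value passed
# in below is purely illustrative.
recount_commands() {
    wiki="$1"
    throttle="$2"
    for mode in pages files subcats; do
        echo "mwscript recountCategories.php --wiki=$wiki --mode=$mode --throttle=$throttle"
    done
}

recount_commands testwiki 1000
```

Piping the output to sh on a maintenance host would run the three passes in sequence.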

Event Timeline

Can't be run until .10 is everywhere

TTO changed the task status from Open to Stalled. Jul 16 2017, 11:46 AM
TTO changed the task status from Stalled to Open. Jul 22 2017, 1:09 AM

.10 is now everywhere :)

@Reedy any chance you will have time to look at running this?

Fun, yeah, I can.

Will look at running it on the test wikis today, and see how the output looks etc :)

reedy@terbium:~$ mwscript recountCategories.php --wiki=testwiki --mode=pages | tee ~/testwiki.log
Finding up to 500 drifted rows starting at cat_id 500...
Updating cat_pages field on 154 rows...
Finding up to 500 drifted rows starting at cat_id 500...
Done! Updated the pages counts of 154 categories.
Now run the script using the other --mode options if you haven't already.
Also run 'php cleanupEmptyCategories.php --mode remove' to remove empty,
nonexistent categories from the category table.

reedy@terbium:~$ mwscript recountCategories.php --wiki=testwiki --mode=subcats | tee ~/testwiki.log
Finding up to 500 drifted rows starting at cat_id 500...
Updating cat_subcats field on 4 rows...
Finding up to 500 drifted rows starting at cat_id 500...
Done! Updated the subcats counts of 4 categories.
Now run the script using the other --mode options if you haven't already.
reedy@terbium:~$ mwscript recountCategories.php --wiki=testwiki --mode=files | tee ~/testwiki.log
Finding up to 500 drifted rows starting at cat_id 500...
Updating cat_files field on 10 rows...
Finding up to 500 drifted rows starting at cat_id 500...
Done! Updated the files counts of 10 categories.
Now run the script using the other --mode options if you haven't already.
reedy@terbium:~$ mwscript cleanupEmptyCategories.php --wiki=testwiki | tee ~/testwiki.log
...Update 'cleanup empty categories' already logged as completed.
reedy@terbium:~$ mwscript cleanupEmptyCategories.php --wiki=testwiki --force | tee ~/testwiki.log
Adding empty categories with description pages...
Removing empty categories without description pages...
The category named :Sub-Sub-Category_Bleah_tst is not valid?!
--mode=remove --begin=‪中文(简体)‬
Category cleanup complete.
reedy@terbium:~$
reedy@terbium:~$ mwscript recountCategories.php --wiki=test2wiki --mode=pages | tee ~/test2wiki.log
Finding up to 500 drifted rows starting at cat_id 500...
Updating cat_pages field on 25 rows...
Finding up to 500 drifted rows starting at cat_id 500...
Done! Updated the pages counts of 25 categories.
Now run the script using the other --mode options if you haven't already.
Also run 'php cleanupEmptyCategories.php --mode remove' to remove empty,
nonexistent categories from the category table.

reedy@terbium:~$ mwscript recountCategories.php --wiki=test2wiki --mode=subcats | tee ~/test2wiki.log
Finding up to 500 drifted rows starting at cat_id 500...
Done! Updated the subcats counts of 0 categories.
Now run the script using the other --mode options if you haven't already.
reedy@terbium:~$ mwscript recountCategories.php --wiki=test2wiki --mode=files | tee ~/test2wiki.log
Finding up to 500 drifted rows starting at cat_id 500...
Done! Updated the files counts of 0 categories.
Now run the script using the other --mode options if you haven't already.
reedy@terbium:~$ mwscript cleanupEmptyCategories.php --wiki=test2wiki --force | tee ~/test2wiki.log
Adding empty categories with description pages...
Removing empty categories without description pages...
--mode=remove --begin=Statut_UICN_EN
Category cleanup complete.
reedy@terbium:~$
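To compare runs like the ones above across wikis, the "Done!" lines can be pulled out of the saved output. The following is a minimal sketch, assuming the log format shown in the transcript; summarize_recount_log is a hypothetical helper name. Note also that plain tee overwrites its target file, so the per-mode runs above each replace the previous log; tee -a would append instead.

```shell
#!/bin/sh
# Reads recountCategories.php output on stdin and prints "<mode> <count>"
# for every completed run. Assumes the "Done! Updated the ..." line format
# seen in the transcript above.
summarize_recount_log() {
    sed -n 's/^Done! Updated the \([a-z]*\) counts of \([0-9]*\) categories\.$/\1 \2/p'
}

# Sample input copied from the testwiki run above.
summarize_recount_log <<'EOF'
Finding up to 500 drifted rows starting at cat_id 500...
Updating cat_pages field on 154 rows...
Finding up to 500 drifted rows starting at cat_id 500...
Done! Updated the pages counts of 154 categories.
Now run the script using the other --mode options if you haven't already.
EOF
```

For the sample input this prints the single line "pages 154".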

Unfortunately line 101 of the script is wrong. It should be printing $this->minimumId. As it stands, every batch in the logs above reports "starting at cat_id 500". I don't know if that really matters, though. It looks like that message was just intended to let you see how long the script is taking, and I guess you'll be running it headless.

Ping @MaxSem regardless.

TTO removed TTO as the assignee of this task. Mar 19 2020, 12:36 AM

Not sure why this task was assigned to me; I don't have, and have never had, shell access.

Unfortunately line 101 of the script is wrong. It should be printing $this->minimumId.

For future reference, this was solved in T247215.

Does this still need to be done?

Probably… but there's now also T224321: Run populateCategory.php and I'm not sure what's the difference.

YES this still needs to be done. Many wikis are broken.

Mentioned in SAL (#wikimedia-operations) [2021-06-22T22:38:07Z] <urbanecm> mwscript recountCategories.php --wiki=eowiktionary --mode={pages,subcats,files} (T170737)

Mentioned in SAL (#wikimedia-operations) [2021-06-22T22:41:28Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=pages # T170737

Mentioned in SAL (#wikimedia-operations) [2021-06-22T22:42:28Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=subcats # T170737

Probably… but there's now also T224321: Run populateCategory.php and I'm not sure what's the difference.

populateCategory.php was designed to initially populate the category table when upgrading to MW 1.13. It was deleted in rMW0dacf7d68d8d517cada731375f9612d8e060db58.

recountCategories.php is the script that should be used.

The script has been applied to eowiktionary and the broken categories seem fixed. Now we can think about a more permanent solution for the problem, in the form of running this script regularly or upon request, or by other means. Another wiki badly needing this was Commons. See T85696.


T85696: Allow action=purge to recalculate the number of pages/subcats/files in a category is definitely not a good permanent solution. If the miscounts still happen (i.e. it's not an ancient bug we just ran into), we should find out why they happen, and fix the bug.

Mentioned in SAL (#wikimedia-operations) [2021-06-23T12:15:16Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist s2 recountCategories.php --mode=pages && foreachwikiindblist s2 recountCategories.php --mode=subcats && foreachwikiindblist s2 recountCategories.php --mode=files # T170737

Mentioned in SAL (#wikimedia-operations) [2021-06-23T12:26:15Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s5

Mentioned in SAL (#wikimedia-operations) [2021-06-23T12:35:38Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s6

Mentioned in SAL (#wikimedia-operations) [2021-06-23T12:46:17Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s7

Mentioned in SAL (#wikimedia-operations) [2021-06-23T12:59:22Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s3

Mentioned in SAL (#wikimedia-operations) [2021-06-23T13:27:42Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s4

Mentioned in SAL (#wikimedia-operations) [2021-06-23T14:53:36Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s8

Mentioned in SAL (#wikimedia-operations) [2021-06-23T14:54:29Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s1
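The per-shard invocations logged above all follow one pattern, which can be sketched as a nested loop over shards and modes. As before this is a dry run that only prints the commands, because foreachwikiindblist exists only on the maintenance hosts; shard_commands is a hypothetical helper name, and the shard list is taken from the SAL entries in this task.

```shell
#!/bin/sh
# Dry run: print the foreachwikiindblist invocation for every shard/mode pair.
# shard_commands is a hypothetical helper; pass the shard names as arguments.
shard_commands() {
    for shard in "$@"; do
        for mode in pages subcats files; do
            echo "foreachwikiindblist $shard recountCategories.php --mode=$mode"
        done
    done
}

shard_commands s1 s2 s3 s4 s5 s6 s7 s8
```

Unlike the logged commands, which chain the three modes with &&, this sketch does not stop at the first failure; joining the printed lines with && would restore that behaviour.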

Urbanecm claimed this task.

Done for:

  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7
  • s8

Should be done everywhere. Resolving.