
updateArticleCount.php incorrectly updates multi-shard site stats
Open, Needs Triage, Public

Description

updateArticleCount.php counts all the good articles (content pages), then writes that number to the first row of the site_stats table:

$dbw->update(
	'site_stats',
	[ 'ss_good_articles' => $result ],
	[ 'ss_row_id' => 1 ],
	__METHOD__
);

It doesn’t touch any other rows. If there are other rows, as is the case when $wgMultiShardSiteStats is set (T306589), the reported total, which is the sum over all rows, ends up too high. For example, on testwiki:

lucaswerkmeister-wmde@mwmaint1002:~$ mwscript updateArticleCount.php testwiki; sql testwiki <<< 'SELECT ss_row_id, ss_good_articles FROM site_stats'
Counting articles...found 4816.
To update the site statistics table, run the script with the --update option.
ss_row_id	ss_good_articles
1	487
2	482
3	481
4	481
5	480
6	481
7	480
8	481
9	481
10	482

There are 4816 good articles at the moment (both according to updateArticleCount.php and when you sum up all the ss_good_articles values). If I ran the script with the --update option, then the table would afterwards look like this:

ss_row_id	ss_good_articles
1	4816
2	482
3	481
4	481
5	480
6	481
7	480
8	481
9	481
10	482

And the total would now be reported as 9145. (In general, if the counts are evenly distributed across 10 rows, as here, the reported number would be about 90% too high.)
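The 90% figure can be derived as follows. The symbols are introduced here only for illustration: N is the true count, n the number of rows, and r_1, ..., r_n the values in the rows before the update, so that r_1 + ... + r_n = N. The update sets row 1 to N and leaves the rest alone:

```latex
S' = N + \sum_{i=2}^{n} r_i = N + (N - r_1) = 2N - r_1

% with evenly distributed rows, r_1 = N/n:
S' = \left(2 - \frac{1}{n}\right) N,
\qquad
\frac{S' - N}{N} = 1 - \frac{1}{n}
```

With n = 10 the overcount is 1 - 1/10 = 0.9, i.e. 90%. For the testwiki numbers, 2 · 4816 - 487 = 9145, which matches the total above.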

This is probably mostly irrelevant, because $wgMultiShardSiteStats is only recommended for wikis that are so large that you probably don’t want to run updateArticleCount.php anyway.

Event Timeline

The easiest solution is probably for the maintenance script to also set ss_good_articles to 0 for all other rows? The values don’t need to be evenly distributed, after all (only the writes need to be distributed).
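A minimal sketch of the suggested fix follows. It is a simulation on a plain PHP array, not MediaWiki code. The array stands in for the site_stats table (ss_row_id => ss_good_articles), and the function names updateFirstRowOnly() and updateAllRows() are hypothetical. It reuses the testwiki rows above and takes the true count to be their current sum, 4816. It shows that overwriting only row 1 yields 9145, while zeroing the other rows keeps the sum equal to the count.

```php
<?php
// Hypothetical in-memory stand-in for site_stats on testwiki (values from above),
// keyed by ss_row_id => ss_good_articles.
$siteStats = [
	1 => 487, 2 => 482, 3 => 481, 4 => 481, 5 => 480,
	6 => 481, 7 => 480, 8 => 481, 9 => 481, 10 => 482,
];

/**
 * Current behaviour: write the full count to row 1, leave other rows untouched.
 */
function updateFirstRowOnly( array $rows, int $count ): array {
	$rows[1] = $count;
	return $rows;
}

/**
 * Suggested behaviour: row 1 gets the full count, every other row is reset to 0,
 * so the sum over all rows equals the count.
 */
function updateAllRows( array $rows, int $count ): array {
	foreach ( array_keys( $rows ) as $id ) {
		$rows[$id] = ( $id === 1 ) ? $count : 0;
	}
	return $rows;
}

// Assumption: the script's count equals the current sum, as in the example.
$count = array_sum( $siteStats );

$buggyTotal = array_sum( updateFirstRowOnly( $siteStats, $count ) );
$fixedTotal = array_sum( updateAllRows( $siteStats, $count ) );

echo "count=$count buggy=$buggyTotal fixed=$fixedTotal\n";
// prints: count=4816 buggy=9145 fixed=4816
```

In the maintenance script itself, this would presumably mean a second write that sets ss_good_articles to 0 for every row other than ss_row_id 1. The exact query is not shown here and is an assumption. Doing both writes in one transaction would presumably also keep readers from seeing the intermediate, too-high sum.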