
Update analytics-wmde-scripts for site_stats sharding
Closed, Resolved, Public

Description

The site_stats table is now “sharded” (but still on a single node, even in a single table, just split across multiple rows). Several scripts in analytics/wmde/scripts.git query this table and assume a single row; specifically:

good_articles.php
$result = $pdo->query( 'select ss_good_articles from wikidatawiki.site_stats' );
total_edits.php
$result = $pdo->query( 'select ss_total_edits from wikidatawiki.site_stats' );
total_pages.php
$result = $pdo->query( 'select ss_total_pages from wikidatawiki.site_stats' );
users.php
$result = $pdo->query( 'select ss_users from wikidatawiki.site_stats' );

All of these just send $rows[0]['ss_field_name'] to Grafana, and MariaDB happens to return the largest row first:

MariaDB [wikidatawiki]> select ss_total_pages from wikidatawiki.site_stats;
+----------------+
| ss_total_pages |
+----------------+
|      102160147 |
|           7267 |
|           7207 |
|           7265 |
|           7140 |
|           7162 |
|           7266 |
|           7190 |
|           7197 |
|           7238 |
+----------------+
10 rows in set (0.001 sec)

So as long as MariaDB keeps returning the rows in this order, the scripts still record a roughly accurate number. However, we still need to fix the scripts to actually report the sum of all the rows; the longer we wait, the larger the jump in the graph will be when we finally fix it from “number of pages/etc. until 2022-05-31 plus one tenth of the number of pages since 2022-05-31” to “total number of pages”.
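As a rough illustration of the difference, here is a minimal sketch that sums one column across all shard rows instead of reading only the first row. The helper name sumSiteStatsColumn is hypothetical and not taken from the actual patch; the rows are copied from the MariaDB output above, written as strings on the assumption that PDO returns column values as strings. Summing in SQL (for example with SUM(ss_total_pages)) would be another way to get the same number.

```php
<?php
// Hypothetical helper (not from the real scripts): add up one column
// over every row of the sharded site_stats table.
function sumSiteStatsColumn( array $rows, string $column ): int {
	// array_column() extracts the column from each row;
	// array_sum() converts the numeric strings and adds them.
	return (int)array_sum( array_column( $rows, $column ) );
}

// Rows as shown in the MariaDB output above (assumed to arrive as strings).
$rows = [
	[ 'ss_total_pages' => '102160147' ],
	[ 'ss_total_pages' => '7267' ],
	[ 'ss_total_pages' => '7207' ],
	[ 'ss_total_pages' => '7265' ],
	[ 'ss_total_pages' => '7140' ],
	[ 'ss_total_pages' => '7162' ],
	[ 'ss_total_pages' => '7266' ],
	[ 'ss_total_pages' => '7190' ],
	[ 'ss_total_pages' => '7197' ],
	[ 'ss_total_pages' => '7238' ],
];

// What the scripts currently report: only the first row.
echo $rows[0]['ss_total_pages'], "\n"; // 102160147

// What they should report: the sum of all rows.
echo sumSiteStatsColumn( $rows, 'ss_total_pages' ), "\n"; // 102225079
```

For this snapshot the two values differ by 64932 pages, which is the part the first-row approach misses.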

Event Timeline

Change 803458 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@master] Sum site_stats rows

https://gerrit.wikimedia.org/r/803458

Change 803458 merged by jenkins-bot:

[analytics/wmde/scripts@master] Sum site_stats rows

https://gerrit.wikimedia.org/r/803458

Change 803543 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@production] Sum site_stats rows

https://gerrit.wikimedia.org/r/803543

Change 803543 merged by jenkins-bot:

[analytics/wmde/scripts@production] Sum site_stats rows

https://gerrit.wikimedia.org/r/803543

This task might be the explanation for the bend in the Wikidata Site Stats graphs; the timing would line up well:

image.png (533×1 px, 70 KB)

Let’s see if that jumps back up tomorrow.

Yup, to me that looks like the actual numbers went up regularly and we’ve now caught up with them again:

image.png (536×1 px, 60 KB)

(It’s most obvious in the “total edits” graph, which should be a pretty straight diagonal line.)

@Manuel do you want to confirm?

Manuel closed this task as Resolved. (Edited Jun 9 2022, 12:11 PM)

I can confirm: Your explanation makes perfect sense. We now seem to have caught up with the actual numbers. The total number of edits, for example, now matches https://www.wikidata.org/wiki/Special:Statistics again.

Thank you, Lucas, for handling this! \o/