Page MenuHomePhabricator

Enable statement usage tracking on hywiki
Closed, ResolvedPublic

Description

As agreed in T188730#7461581 and T188730#7525888, we want to remove the wmgWikibaseDisabledUsageAspects setting for Armenian Wikipedia. This is expected to increase the number of rows in the wbc_entity_usage to no more than 65 million, most likely much less (currently 6.4 million). We should keep track of the size of the table after making the change, as well as the size of the recentchanges table and the ratio of Wikidata edits there, and check on-wiki Lua modules for obvious mistakes if it grows too large.

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.
StatusSubtypeAssignedTask
OpenNone
Resolvednoarave

Event Timeline

2021-11-24:

lucaswerkmeister-wmde@stat1007:~$ sudo -u analytics-wmde analytics-mysql hywiki <<< 'SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema = "hywiki" AND table_name IN ("wbc_entity_usage", "recentchanges");' | expand -t20
table_name          table_rows
recentchanges       242694
wbc_entity_usage    4967541

Change 755330 had a related patch set uploaded (by Noa wmde; author: Noa wmde):

[operations/mediawiki-config@master] Enable statement usage tracking for Armenian Wikipedia

https://gerrit.wikimedia.org/r/755330

After this is deployed we should make sure data is being tracked under "Statement Flags" in the hywiki dashboard: https://grafana.wikimedia.org/d/000000176/wikidata-entity-usage-project?orgId=1&var-project=hywiki

Quick check on the recentchanges table before deployment: 378k rows, 288k of which come from Wikidata.

MariaDB [hywiki]> SELECT COUNT(*) AS total, SUM(rc_type = 5) FROM recentchanges;
+--------+------------------+
| total  | SUM(rc_type = 5) |
+--------+------------------+
| 377612 |           287640 |
+--------+------------------+
1 row in set (0.264 sec)

Most of them seem to be aliases:

Screenshot from 2022-01-25 12-58-28.png (777×1 px, 342 KB)

Aliases fall under “Other” usage, if I’m not mistaken, so fixing the usage tracking should hopefully remove a lot of these pointless recentchanges entries (there’s little legitimate reason for Armenian Wikipedia to care about so many Russian Aliases, I assume).

Change 755330 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable statement usage tracking for Armenian Wikipedia

https://gerrit.wikimedia.org/r/755330

Mentioned in SAL (#wikimedia-operations) [2022-01-25T12:10:04Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755330|Enable statement usage tracking for Armenian Wikipedia (hywiki) (T296382)]] (duration: 00m 50s)

Hm, curious that hywiki should already have amassed over 800k statement usages in the few hours since the change was deployed…

MariaDB [hywiki]> SELECT COUNT(*) AS total, SUM(eu_aspect LIKE 'C%') AS statements, SUM(eu_aspect LIKE 'O%') AS others FROM wbc_entity_usage;
+---------+------------+---------+
| total   | statements | others  |
+---------+------------+---------+
| 7286396 |     868357 | 1711896 |
+---------+------------+---------+
1 row in set (2.391 sec)

Perhaps they’re running a bot over there that frequently purges articles? Let’s check back tomorrow or in a few days.

MariaDB [hywiki]> SELECT NOW() AS asof, COUNT(*) AS total, SUM(eu_aspect LIKE 'C%') AS statements, SUM(eu_aspect LIKE 'O%') AS others FROM wbc_entity_usage;
+---------------------+---------+------------+---------+
| asof                | total   | statements | others  |
+---------------------+---------+------------+---------+
| 2022-01-25 16:56:24 | 7441381 |    1050128 | 1684320 |
+---------------------+---------+------------+---------+
1 row in set (2.756 sec)

If we assume that the “other” usages continue decreasing at a constant rate, and the “statements” usages increase at a constant rate until the “others” hit zero, we can estimate (1050128 - 868357) / (1711896 - 1684320) * 1684320 + 1050128 = 12152555 12 million total statement usages, for ca. 18½ million rows in the table in total. That’s substantial (not quite the “much less [than 65 million]” I had in mind in the task description), but not likely to cause any actual problems, I think. hywiki just uses more statements per page than I expected initially.

MariaDB [hywiki]> SELECT NOW() AS asof, COUNT(*) AS total, SUM(eu_aspect LIKE 'C%') AS statements, SUM(eu_aspect LIKE 'O%') AS others FROM wbc_entity_usage;
+---------------------+----------+------------+---------+
| asof                | total    | statements | others  |
+---------------------+----------+------------+---------+
| 2022-01-26 10:27:25 | 16209842 |   10125711 | 1375932 |
+---------------------+----------+------------+---------+
1 row in set (7.655 sec)

If the “other” usages continue to go down at a similar rate, this should settle in a bit over three days or so, assuming I’m operating units correctly.

Disregard this comment, I accidentally queried warwiki instead of hywiki.

You can write a bot that purges pages that have "O" aspect, that would speed up the clean up. I think I have written one back then in 2016/2017. Do you want me to run it again?

Nah, for now I think we can wait a bit for this to settle naturally. Also, I queried the wrong wiki in my last comment 🤦

Current results:

MariaDB [hywiki]> SELECT NOW() AS asof, COUNT(*) AS total, SUM(eu_aspect LIKE 'C%') AS statements, SUM(eu_aspect LIKE 'O%') AS others FROM wbc_entity_usage;
+---------------------+----------+------------+---------+
| asof                | total    | statements | others  |
+---------------------+----------+------------+---------+
| 2022-01-27 10:31:06 | 16080444 |   10165227 | 1205261 |
+---------------------+----------+------------+---------+
1 row in set (7.826 sec)
MariaDB [hywiki]> SELECT NOW() AS asof, COUNT(*) AS total, SUM(eu_aspect LIKE 'C%') AS statements, SUM(eu_aspect LIKE 'O%') AS others FROM wbc_entity_usage;
+---------------------+----------+------------+--------+
| asof                | total    | statements | others |
+---------------------+----------+------------+--------+
| 2022-01-31 11:35:28 | 15030757 |   10033840 | 296220 |
+---------------------+----------+------------+--------+
1 row in set (6.776 sec)

Total and statement usages went slightly down since Thursday; other usages down from 1.2M to 0.3M. Slightly surprising, but not concerning, I’d say.

The recentchanges now have 382k rows in total, 293k of which come from Wikibase, which is a fairly minor change compared to T296382#7648565 (378k/288k).

MariaDB [hywiki]> SELECT COUNT(*) AS total, SUM(rc_type = 5) FROM recentchanges;
+--------+------------------+
| total  | SUM(rc_type = 5) |
+--------+------------------+
| 382183 |           292889 |
+--------+------------------+
1 row in set (0.279 sec)

And it looks like the alias changes are indeed mostly gone and there are statement changes instead.

Screenshot from 2022-01-31 17-38-36.png (641×1 px, 322 KB)

MariaDB [hywiki]> SELECT NOW() AS asof, COUNT(*) AS total, SUM(eu_aspect LIKE 'C%') AS statements, SUM(eu_aspect LIKE 'O%') AS others FROM wbc_entity_usage;
+---------------------+----------+------------+--------+
| asof                | total    | statements | others |
+---------------------+----------+------------+--------+
| 2022-02-01 10:23:10 | 15023372 |   10034429 | 286302 |
+---------------------+----------+------------+--------+
1 row in set (8.064 sec)

Statement usages fairly stable now, other usages still going down, albeit not very rapidly anymore.

One month later:

MariaDB [hywiki]> SELECT NOW() AS asof, COUNT(*) AS total, SUM(eu_aspect LIKE 'C%') AS statements, SUM(eu_aspect LIKE 'O%') AS others FROM wbc_entity_usage;
+---------------------+----------+------------+--------+
| asof                | total    | statements | others |
+---------------------+----------+------------+--------+
| 2022-02-28 09:55:39 | 14842746 |   10086742 |   8391 |
+---------------------+----------+------------+--------+
1 row in set (6.068 sec)

Other usages are basically gone; statement usages have stayed around 10 million; total usages are down to just under 15 million (from over 16 million in T296382#7651993).

MariaDB [hywiki]> SELECT COUNT(*) AS total, SUM(rc_type = 5) FROM recentchanges;
+--------+------------------+
| total  | SUM(rc_type = 5) |
+--------+------------------+
| 274021 |           210804 |
+--------+------------------+
1 row in set (0.200 sec)

Total recent changes rows are down from 380k (T296382#7664474) to 270k, with 210k Wikidata changes (down from 290k).

I think we can call this a success and close this task.