Create a maintenance script for pruning stale entity subscriptions and run periodically
Closed, Declined (Public)

Description

Apparently we have quite a few stale entity subscriptions:

MariaDB [wikidatawiki_p]> SELECT COUNT(*) FROM wb_changes_subscription WHERE cs_subscriber_id = 'cawiki' AND NOT EXISTS(SELECT 1 FROM cawiki_p.wbc_entity_usage WHERE eu_entity_id = cs_entity_id);
+----------+
| COUNT(*) |
+----------+
|    13765 |
+----------+
1 row in set (33.19 sec)

MariaDB [wikidatawiki_p]> SELECT COUNT(*) FROM wb_changes_subscription WHERE cs_subscriber_id = 'ruwiki' AND NOT EXISTS(SELECT 1 FROM ruwiki_p.wbc_entity_usage WHERE eu_entity_id = cs_entity_id);
+----------+
| COUNT(*) |
+----------+
|   104520 |
+----------+
1 row in set (40.40 sec)

MariaDB [wikidatawiki_p]> SELECT COUNT(*) FROM wb_changes_subscription WHERE cs_subscriber_id = 'enwiki' AND NOT EXISTS(SELECT 1 FROM enwiki_p.wbc_entity_usage WHERE eu_entity_id = cs_entity_id);
+----------+
| COUNT(*) |
+----------+
|   133069 |
+----------+
1 row in set (12 min 42.03 sec)

We should create a maintenance script that can prune these subscriptions, and make sure it is run periodically.

Event Timeline

What's the situation in the other direction?

What's the other direction? What do you mean by that?

Entities that a client wiki thinks it is subscribed to, but for which Wikidata doesn't dispatch any changes because the entry is missing from its subscription list.

That is not a problem:

MariaDB [wikidatawiki_p]> SELECT COUNT(DISTINCT eu_entity_id) FROM ruwiki_p.wbc_entity_usage WHERE eu_entity_id NOT IN (SELECT cs_entity_id FROM wb_changes_subscription WHERE cs_subscriber_id = 'ruwiki');
+------------------------------+
| COUNT(DISTINCT eu_entity_id) |
+------------------------------+
|                            0 |
+------------------------------+
1 row in set (2 min 41.57 sec)

MariaDB [wikidatawiki_p]> SELECT COUNT(DISTINCT eu_entity_id) FROM cawiki_p.wbc_entity_usage WHERE eu_entity_id NOT IN (SELECT cs_entity_id FROM wb_changes_subscription WHERE cs_subscriber_id = 'cawiki');
+------------------------------+
| COUNT(DISTINCT eu_entity_id) |
+------------------------------+
|                            1 |
+------------------------------+
1 row in set (1 min 38.99 sec)
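The two directions discussed here are simply the two set differences between Wikidata's subscription list for one client and that client's entity usages. A minimal in-memory sketch, assuming both ID lists have already been fetched and fit in memory (the function name `subscription_drift` is hypothetical):

```python
from __future__ import annotations


def subscription_drift(subscribed: set[str],
                       used: set[str]) -> tuple[set[str], set[str]]:
    """Compare one client's subscriptions on Wikidata with its usages.

    Returns (stale, missing):
      stale:   subscribed but no longer used by the client; candidates
               for pruning
      missing: used by the client but absent from the subscription list,
               so changes to these entities would not be dispatched to it
    """
    stale = subscribed - used
    missing = used - subscribed
    return stale, missing


if __name__ == "__main__":
    stale, missing = subscription_drift({"Q1", "Q2", "Q3"}, {"Q1", "Q4"})
    print(sorted(stale), sorted(missing))  # ['Q2', 'Q3'] ['Q4']
```

Only the `missing` set indicates a correctness problem for the client. The `stale` set costs storage and dispatch work but loses no updates.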

Pruning stale subscriptions would be nice indeed. Finding out where they come from, and fixing that, would be even better :)

Quick reality check: based on the above numbers, less than 2% of enwiki subscriptions are stale, and about 5% of ruwiki subscriptions are stale. That's worth investigating, but probably doesn't have a huge impact.
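Working the quoted percentages backwards from the stale counts gives the subscription totals they imply. Here $N$ is the total number of `wb_changes_subscription` rows for that client. These totals are derived from the figures in this ticket, not separately queried:

```latex
\begin{aligned}
\frac{133069}{N_{\text{enwiki}}} &< 0.02
  &&\Rightarrow\; N_{\text{enwiki}} > \frac{133069}{0.02} = 6\,653\,450 \\
\frac{104520}{N_{\text{ruwiki}}} &\approx 0.05
  &&\Rightarrow\; N_{\text{ruwiki}} \approx \frac{104520}{0.05} = 2\,090\,400
\end{aligned}
```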

So let's be realistic and close this ticket?

Yes, probably not worth spending time on right now.