Create a maintenance script for pruning stale entity subscriptions and run it periodically
Open, Needs Triage, Public

Description

Apparently we have quite a few stale entity subscriptions:

MariaDB [wikidatawiki_p]> SELECT COUNT(*) FROM wb_changes_subscription WHERE cs_subscriber_id = 'cawiki' AND NOT EXISTS(SELECT 1 FROM cawiki_p.wbc_entity_usage WHERE eu_entity_id = cs_entity_id);
+----------+
| COUNT(*) |
+----------+
|    13765 |
+----------+
1 row in set (33.19 sec)

MariaDB [wikidatawiki_p]> SELECT COUNT(*) FROM wb_changes_subscription WHERE cs_subscriber_id = 'ruwiki' AND NOT EXISTS(SELECT 1 FROM ruwiki_p.wbc_entity_usage WHERE eu_entity_id = cs_entity_id);
+----------+
| COUNT(*) |
+----------+
|   104520 |
+----------+
1 row in set (40.40 sec)

MariaDB [wikidatawiki_p]> SELECT COUNT(*) FROM wb_changes_subscription WHERE cs_subscriber_id = 'enwiki' AND NOT EXISTS(SELECT 1 FROM enwiki_p.wbc_entity_usage WHERE eu_entity_id = cs_entity_id);
+----------+
| COUNT(*) |
+----------+
|   133069 |
+----------+
1 row in set (12 min 42.03 sec)

We should create a maintenance script that can prune these subscriptions, and make sure it is run from time to time.
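A rough sketch of what such pruning could look like, purely to illustrate the idea. It is not an existing Wikibase script: the function name `prune_stale_subscriptions`, the `batch_size` parameter and the in-memory SQLite stand-in are assumptions made up for this example. It also assumes the set of entity IDs still in use on the client is fetched separately and passed in, rather than joined across databases as in the queries above.

```python
import sqlite3


def prune_stale_subscriptions(conn, subscriber_id, used_entity_ids, batch_size=1000):
    """Delete subscriptions of one client wiki that no longer have a usage entry.

    Hypothetical helper: `used_entity_ids` is the set of eu_entity_id values
    read from the client's wbc_entity_usage table beforehand.
    Returns the number of rows deleted.
    """
    rows = conn.execute(
        "SELECT cs_entity_id FROM wb_changes_subscription WHERE cs_subscriber_id = ?",
        (subscriber_id,),
    )
    # A subscription is stale when the client has no usage row for the entity.
    stale = [row[0] for row in rows if row[0] not in used_entity_ids]

    deleted = 0
    # Delete in small batches and commit after each one, so that no single
    # transaction grows large.
    for start in range(0, len(stale), batch_size):
        batch = stale[start:start + batch_size]
        cur = conn.executemany(
            "DELETE FROM wb_changes_subscription"
            " WHERE cs_subscriber_id = ? AND cs_entity_id = ?",
            [(subscriber_id, entity_id) for entity_id in batch],
        )
        conn.commit()
        deleted += cur.rowcount
    return deleted


def demo():
    # In-memory stand-in for the repo table, with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wb_changes_subscription (cs_entity_id TEXT, cs_subscriber_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO wb_changes_subscription VALUES (?, ?)",
        [("Q1", "cawiki"), ("Q2", "cawiki"), ("Q3", "cawiki"), ("Q2", "ruwiki")],
    )
    deleted = prune_stale_subscriptions(conn, "cawiki", {"Q1"}, batch_size=1)
    remaining = sorted(
        conn.execute(
            "SELECT cs_entity_id, cs_subscriber_id FROM wb_changes_subscription"
        ).fetchall()
    )
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Only rows belonging to the given subscriber are touched, so other wikis' subscriptions to the same entity stay in place. A real script would also have to deal with usage entries being added while it runs, which this sketch ignores.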

hoo created this task. Aug 10 2017, 3:21 PM
Restricted Application added subscribers: PokestarFan, Aklapper. Aug 10 2017, 3:21 PM

What's the situation in the other direction?

hoo added a comment. Aug 10 2017, 3:43 PM

What's the situation in the other direction?

What's the other direction? What do you mean by that?

Entities that client wikis think they are subscribed to, but for which Wikidata doesn't dispatch any changes because the entry in its list is missing.

hoo added a comment. Aug 10 2017, 4:10 PM

Entities that client wikis think they are subscribed to, but for which Wikidata doesn't dispatch any changes because the entry in its list is missing.

That is not a problem:

MariaDB [wikidatawiki_p]> SELECT COUNT(DISTINCT eu_entity_id) FROM ruwiki_p.wbc_entity_usage WHERE eu_entity_id NOT IN (SELECT cs_entity_id FROM wb_changes_subscription WHERE cs_subscriber_id = 'ruwiki');
+------------------------------+
| COUNT(DISTINCT eu_entity_id) |
+------------------------------+
|                            0 |
+------------------------------+
1 row in set (2 min 41.57 sec)

MariaDB [wikidatawiki_p]> SELECT COUNT(DISTINCT eu_entity_id) FROM cawiki_p.wbc_entity_usage WHERE eu_entity_id NOT IN (SELECT cs_entity_id FROM wb_changes_subscription WHERE cs_subscriber_id = 'cawiki');
+------------------------------+
| COUNT(DISTINCT eu_entity_id) |
+------------------------------+
|                            1 |
+------------------------------+
1 row in set (1 min 38.99 sec)

Esc3300 added a comment. Edited Aug 10 2017, 4:27 PM

Looks like the problems mentioned at T171928#3489904 have gone away. A query for enwikivoyage returns no entries either.

Oddly, T119738 doesn't seem to affect it.

Pruning stale subscriptions would be nice indeed. Finding out where they come from, and fixing that, would be even better :)

Quick reality check: based on the above numbers, less than 2% of enwiki subscriptions are stale, and about 5% of ruwiki subscriptions are stale. That's worth investigation, but probably doesn't have a huge impact.
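To make that arithmetic explicit, here is a small sketch that works backwards from the stale counts posted above to the totals the quoted shares would imply. The totals themselves are not stated anywhere in this task, so the results are only implied figures: a lower bound for enwiki (the share is given as "less than 2%") and a rough value for ruwiki ("about 5%"). The helper name `implied_total` is made up for this example.

```python
# Stale subscription counts from the queries in the task description.
stale = {"enwiki": 133069, "ruwiki": 104520}

# Shares quoted in the comment, in percent.
quoted_percent = {"enwiki": 2, "ruwiki": 5}


def implied_total(stale_count, percent):
    """Total subscriptions implied if `stale_count` made up `percent` percent."""
    # Integer arithmetic avoids floating-point rounding noise.
    return stale_count * 100 // percent


implied = {wiki: implied_total(stale[wiki], quoted_percent[wiki]) for wiki in stale}

if __name__ == "__main__":
    for wiki, total in implied.items():
        print(wiki, total)
```

Under those assumptions enwiki would have more than roughly 6.65 million subscriptions and ruwiki roughly 2.09 million.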

So let's be realistic and close this ticket?