Page MenuHomePhabricator

[C-DIS] wbc_entity_usage has obsolete entries
Open, MediumPublic

Description

Running

SELECT wbc_entity_usage.*, page_id FROM wbc_entity_usage LEFT JOIN page ON page_id = eu_page_id WHERE eu_aspect = 'X';

on cswiki yields:

eu_row_id	eu_entity_id	eu_aspect	eu_page_id	eu_touched	page_id
424641		Q7922495	X		1106746
425232		Q20064731	X		1108280
425830		Q18630767	X		1110149
425931		Q20057917	X		1110407
426041		Q20057192	X		1110618
426068		Q20058185	X		1110656
426633		Q2366998	X		1112162
4774494		Q6579410	X		198664
5987656		Q16987803	X		1281663

eu_page_id refers to deleted pages. Moreover, the second, fourth, fifth and sixth entry refer to deleted Wikidata items via eu_entity_id.

With WHERE page_id IS NULL I get 112 rows that track usage for deleted pages.

Event Timeline

Lydia_Pintscher moved this task from incoming to consider for next sprint on the Wikidata board.

So the assumption is we're not removing some usages when the page is deleted on the client?

Change 426892 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Prune usages and subscriptions after deleting a page

https://gerrit.wikimedia.org/r/426892

If I understand correctly, there are actually two tables where we’re leaving stuff behind: after a page was deleted, wbc_entity_usage should no longer contain any entries with an eu_page_id of the deleted page, but some rows for the affected wiki might also need to be removed from wb_changes_subscription if those entities are no longer used by any other page on that wiki. @Ladsgroup’s patch properly cleans up both for newly deleted pages, but we should keep this in mind if we want to clean up after old deleted pages as well.

Change 426892 abandoned by Ladsgroup:
Prune usages and subscriptions after deleting a page

Reason:
This is not the right way to do it

https://gerrit.wikimedia.org/r/426892

Michael subscribed.

Worth checking if that problem actually still exists. Also, we should probably have a dedicated phabricator tag for usage-tracking/change-dispatching.

MariaDB [cswiki_p]> SELECT COUNT(*) FROM wbc_entity_usage LEFT JOIN page ON page_id = eu_page_id WHERE page_id IS NULL;
+----------+
| COUNT(*) |
+----------+
|     1212 |
+----------+
1 row in set (8.622 sec)

Apparently, entries of deleted pages are not always pruned but accumulate over time.

ArthurTaylor renamed this task from wbc_entity_usage has obsolete entries to [C-DIS] wbc_entity_usage has obsolete entries.Feb 2 2024, 11:02 AM

Apparently, entries of deleted pages are not always pruned but accumulate over time.

Just to add a second example:

mysql:research@dbstore1008.eqiad.wmnet [enwiki]> SELECT COUNT(*) FROM wbc_entity_usage LEFT JOIN page ON page_id = eu_page_id WHERE page_id IS NULL;

+----------+
| COUNT(*) |
+----------+
|    32851 |
+----------+
1 row in set (43.732 sec)

So, would be nice to clean up, but not a huge problem at the moment, I think.

Might be worth doing/figuring out together with T206857. Both seem to have obsolete entries and both are conceptually very close to each other, but the root cause might be different.