Maintenance script to be written in T270247.
The parent task contains two queries to collect deleted items, which one should be used for the script is to be decided.
noarave | |
Dec 16 2020, 9:37 AM |
F34161485: Screenshot_2021-03-15 MySQL - Grafana-semaphores.png | |
Mar 15 2021, 12:39 PM |
F34161473: Screenshot_2021-03-15 MySQL - Grafana.png | |
Mar 15 2021, 12:39 PM |
F34161483: Screenshot_2021-03-15 MySQL - Grafana-bytes.png | |
Mar 15 2021, 12:39 PM |
Maintenance script to be written in T270247.
The parent task contains two queries to collect deleted items, which one should be used for the script is to be decided.
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | Addshore | T263730 Wikidata term store contains rows for deleted items | |||
Resolved | Lucas_Werkmeister_WMDE | T270249 Run maintenance script to remove deleted items from term store on production |
The maintenance script is repo/maintenance/RemoveDeletedItemsFromTermStore.php and will be available in production starting with wmf/1.36.0-wmf.28 (next week).
Wait, I just realized the maintenance script isn’t enough. We need to do two things:
If we just run the maintenance script on its own, it won’t do anything – all the term IDs we would pass to it are still referenced in wbt_item_terms, after all.
Should this be another maintenance script (or added to the same maintenance script?), or should I try to run the correct DELETE SQL statements directly? Something like:
SELECT DISTINCT wbit_item_id FROM wbt_item_terms JOIN logging ON log_namespace = 0 AND SUBSTRING(log_title, 2) = wbit_item_id LEFT JOIN page ON page_namespace = 0 AND page_title = log_title WHERE log_type = 'delete' AND log_action = 'delete' AND log_timestamp > '20190901000000' AND page_id IS NULL;
DELETE wbt_item_terms FROM wbt_item_terms LEFT JOIN page ON page_namespace = 0 AND page_title = CONCAT('Q', CAST(wbit_item_id AS CHAR)) WHERE page_id IS NULL AND wbit_item_id IN (...); -- list from stats machine
Given that there are 9593 distinct item IDs (T263730#6490963 has 9594 distinct results, but that’s because it includes the log_title and log_timestamp – apparently, one item was deleted twice) and 1145260 total wbt_item_terms rows that need to be deleted, we definitely need to batch those deletes and wait for replication in between. We might need another maintenance script after all…
So with a new maintenance script, the procedure could look like this:
Or we merge the responsibilities of 2.A and 2.B into one maintenance script after all:
In any case – at some point, we need to delete those 1.1 milion wbt_item_term rows belonging to 9593 deleted item IDs; assuming a batch size of 100 and a replication wait time of 1 s (and no time for the deletes themselves), that would take over three hours. Definitely needs a scheduled maintenance window, and possibly we’ll want to break it up into multiple batches of item IDs outside the script as well.
I created a new task, T274890: Make RemoveDeletedItemsFromTermStore.php maintenance script remove wbt_item_terms rows, to improve the maintenance script. Moving this to Stalled in the meantime.
I’ve added two entries for this on the deployments calendar: tomorrow (Thursday) during the EU backport window, I’ll test this using the four item IDs Q581768,Q739279,Q774383,Q852302; then, on Monday after the EU backport window, I’ll run the script with the remaining item IDs (probably still split into several batches).
Mentioned in SAL (#wikimedia-operations) [2021-03-11T12:13:44Z] <Lucas_WMDE> lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds 581768,739279,774383,852302 # T270249, finished in 1.124s
Maintenance script seems to work fine – removing four items brought us down from 9593 items to 9589 items. I’ll call that a successful test and do the rest on Monday.
Mentioned in SAL (#wikimedia-operations) [2021-03-15T12:10:04Z] <Lucas_WMDE> lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5,54p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 50 items
Mentioned in SAL (#wikimedia-operations) [2021-03-15T12:11:21Z] <Lucas_WMDE> lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 55,554p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 500 items
Mentioned in SAL (#wikimedia-operations) [2021-03-15T12:13:52Z] <Lucas_WMDE> lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 555,5554p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 5000 items
Mentioned in SAL (#wikimedia-operations) [2021-03-15T12:23:31Z] <Lucas_WMDE> lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5555,9593p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, remaining 4039 items
Mentioned in SAL (#wikimedia-operations) [2021-03-15T12:31:07Z] <Lucas_WMDE> maintenance scripts for T270249 completed successfully, no more terms for deleted items found on stat1007
In the end, the maintenance script was thankfully substantially faster than that :)
You can see the impact of the script on the s8 master (db1104), by the way:
Each of those DELETE spikes or plateaus corresponds to one run of the maintenance script. (They never reached the level of INSERT or UPDATE, though.)
The changes in the I/O and Semaphores graphs might also be due to this, I’m not sure: