Page MenuHomePhabricator

[C-DIS][SW] Drop index on change_revision_id in wb_changes table
Open, Needs TriagePublic

Description

This index is no longer needed in our production code, but it still seems to be used by some puppet task, see T291086 for further usage data.

image.png (515×1 px, 55 KB)

See https://gerrit.wikimedia.org/g/operations/puppet/+/79971d5304a5643b56547ffb956296103fa800c1/modules/profile/templates/wmcs/db/wikireplicas/maintain-views.yaml#839

We should figure out if that is actually still needed and if not, then remove it and the index on change_revision_id in wb_changes table.

Event Timeline

See https://gerrit.wikimedia.org/g/operations/puppet/+/79971d5304a5643b56547ffb956296103fa800c1/modules/profile/templates/wmcs/db/wikireplicas/maintain-views.yaml#839

If I understand correctly, this controls the wikidatawiki_p.wb_changes view in the Toolforge / Cloud VPS replicas (e.g. in Quarry). We don’t want to make wb_changes rows for deleted revisions public, so we join the revision and archive tables, and only make the change_user_id and change_info available if the corresponding revision wasn’t deleted in either of those tables. (We leave the entity ID public in that case, presumably because that’s also more or less public on Special:Log.)

I’m not sure if this requires an index on change_revision_id in the wikidatawiki.wb_changes table in those replicas (the source of the wikidata_p.wb_changes view) – I feel like the most important thing would be that the revision and archive tables have an index for the join, whereas on wb_changes it might not be needed? But in any case, I’m pretty sure that we don’t need this index in production, because those views don’t directly operate on the production tables anyways.

Yeah, I don’t think the wb_changes_change_revision_id is used for that view.

MariaDB [wikidatawiki]> explain select change_id, change_type, change_time, change_object_id, change_revision_id, if(rev_deleted&4 OR ar_deleted&4,null,change_user_id) as change_user_id, if(rev_deleted&6 OR ar_deleted&6,null,change_info) as change_info from wb_changes left join revision on rev_id = change_revision_id left join archive on ar_rev_id = change_revision_id limit 10;
+------+-------------+------------+--------+---------------+---------------+---------+--------------------------------------------+------+-------+
| id   | select_type | table      | type   | possible_keys | key           | key_len | ref                                        | rows | Extra |
+------+-------------+------------+--------+---------------+---------------+---------+--------------------------------------------+------+-------+
|    1 | SIMPLE      | wb_changes | ALL    | NULL          | NULL          | NULL    | NULL                                       | 7    |       |
|    1 | SIMPLE      | revision   | eq_ref | PRIMARY       | PRIMARY       | 4       | wikidatawiki.wb_changes.change_revision_id | 1    |       |
|    1 | SIMPLE      | archive    | eq_ref | ar_revid_uniq | ar_revid_uniq | 4       | wikidatawiki.wb_changes.change_revision_id | 1    |       |
+------+-------------+------------+--------+---------------+---------------+---------+--------------------------------------------+------+-------+

Mh, but that makes things only more confusing.

There is more background information about this in the comments of T291086 and particularly T291086#7364424 seems to point out that this index is indeed still in use on "master, main traffic and api" hosts.

If that view is not the thing that uses that index, then what is?

Hm, strange indeed. @Marostegui how far back does the schema_unused_indexes data go? Is it possible the index only appeared to be used due to some old queries, or maybe some manual queries we ran?

(I tried to find some related queries in Tendril, but I’ve evidently forgotten how Tendril works, because I didn’t get anywhere.)

It gets cleaned up after each restart.

Alright, thanks. Should we wait until the databases have been restarted naturally (next DC switch?) and then check again if the index is still used? (The index would’ve been used regularly prior to T290367#7347326, if I’m not mistaken.)

I am restarting quite a bunch of s8 (wikidata) hosts as I am upgrading them.
Restarted db1101:3318, db1178 today so we can check by the end of the week if you want. (please ping me to do so)

Hm, but if I understand the comment at T291086#7364424 correctly, the index was already unused on those hosts?

Hm, but if I understand the comment at T291086#7364424 correctly, the index was already unused on those hosts?

On those yes, but if you check that, it is indeed used on _some_ of them. So there is (or was) something hitting that index at the time.
I just checked again and it is not being reported as unused on db1172.eqiad.wmnet (and that host has an uptime of 21 days, so it is pretty recent)

On those yes, but if you check that, it is indeed used on _some_ of them. So there is (or was) something hitting that index at the time.

Yes, but I thought we were mostly interested in those hosts where it was still being used (db1104, db1126, db1114), to see if it was still reported as used after a restart there.

I just checked again and it is not being reported as unused on db1172.eqiad.wmnet (and that host has an uptime of 21 days, so it is pretty recent)

Hm, IIUC we only stopped the old change dispatching system on October 5, so if db1172 was restarted 21 days ago (October 1st) it could still have seen some old queries, I think. (Though it’s strange that it did report the index as unused when you checked on September 20…)

We normally don't track the restarts, but db1171 was restarted 21 days ago, db1104 is way older (around 80 days) and db1126 two days ago, so that's the one we can double check whether the query arrives or not in a few weeks.
db1116 is a backup host so we shouldn't really use it for this analysis.

Change Dispatch Review 2021-10-27
-confirm that the index is not being used, determine if a schema change is needed, then drop the index (maybe)
-work falls into the standard process, so no longer needs to be included in the hike scope

ArthurTaylor renamed this task from Drop index on change_revision_id in wb_changes table to [C-DIS][SW] Drop index on change_revision_id in wb_changes table.Tue, Apr 2, 2:54 PM
ArthurTaylor moved this task from Incoming to [DOT] By Project on the wmde-wikidata-tech board.
Lydia_Pintscher subscribed.

Removing it from WD dev team board as this will need to be handled by WIT.