Page MenuHomePhabricator

Schema change for adding indexes of ct_tag_id
Closed, ResolvedPublic

Description

After T193874: Add new indexes to change_tag table using ct_tag_id instead of ct_tag It would be great to apply this change

From the checklist at https://wikitech.wikimedia.org/wiki/Schema_changes:

  1. ALTERs to run: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/456027/11/maintenance/archives/patch-change_tag-change_tag_rc_tag_id.sql (with this small fix)
  2. Where to run those changes: all.dblist
  3. When to run those changes: ?
  4. If the schema change is backwards compatible: Yes.
  5. If the schema change has been tested already on some of the test/beta wikis: Tested in beta cluster
  6. if the data should be made available on the labs replicas and/or dumps: Yes, data in this table is public (unless the wiki is private which we don't replicate at all)

Progress:

s1

s2

  • codfw
  • eqiad

s3

  • codfw
  • eqiad

s4

  • codfw
  • eqiad

s5

s6

  • codfw
  • eqiad

s7

  • codfw
  • eqiad

s8

wikitech

  • labsdb
  • labtestwiki

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I am planning to deploy this first on an active enwiki (codfw) replica, and leave it for a couple of days to make sure nothing gets weird with its reads or writes, before deploying it to eqiad.

Mentioned in SAL (#wikimedia-operations) [2018-09-20T06:09:37Z] <marostegui> Deploy schema change on db2070 - T203709

I have altered db2070 (enwiki) and I will leave it like that for a few days before going ahead on eqiad directly on the master.

s1 progress:

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2002
  • dbstore1002
  • dbstore1001
  • db2094
  • db2092
  • db2088
  • db2085
  • db2072
  • db2071
  • db2070
  • db2062
  • db2055
  • db2048
  • db1124
  • db1119
  • db1118 (replication broken - test host)
  • db1114
  • db1106
  • db1105
  • db1099
  • db1089
  • db1083
  • db1080
  • db1067

I have altered db2070 (enwiki) and I will leave it like that for a few days before going ahead on eqiad directly on the master.

I have not seen any issues on this hosts. The graphs looks good. Nothing seems to have changed on its query read patterns. So on Monday I will deploy this change to an API and a Recentchanges host (db2070 is main traffic only) on enwiki too to see if there is any impact and if not, I will start getting this deployed on all eqiad.

Mentioned in SAL (#wikimedia-operations) [2018-09-24T05:28:41Z] <marostegui> Deploy schema change on db2088:3311 - T203709

I have altered db2070 (enwiki) and I will leave it like that for a few days before going ahead on eqiad directly on the master.

I have not seen any issues on this hosts. The graphs looks good. Nothing seems to have changed on its query read patterns. So on Monday I will deploy this change to an API and a Recentchanges host (db2070 is main traffic only) on enwiki too to see if there is any impact and if not, I will start getting this deployed on all eqiad.

Altered db2088:3311 (recentchanges) and db2062 (API). Let's leave it for a day and see if there is any changes in query pattern before going ahead in eqiad.

Everything looks fine with db2088, db2062 and db2070 query patterns.
I am going to deploy this change on s5 codfw first (leaving eqiad aside) to make sure nothing breaks on eqiad (replication wise) as this is the situation we'll have once we are back on eqiad as active: the alter will be done on the primary master (eqiad) but not on the DC master (codfw).

s5 progress:

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • db2094
  • db2089
  • db2084
  • db2075
  • db2066
  • db2059
  • db2052
  • db2038
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070

I am going to deploy this change on s5 codfw first (leaving eqiad aside) to make sure nothing breaks on eqiad (replication wise) as this is the situation we'll have once we are back on eqiad as active: the alter will be done on the primary master (eqiad) but not on the DC master (codfw).

This is now done on s5, I am going to leave it running for a few hours to make sure nothing weird, replication-wise, happens on eqiad.

So far so good having s5 fully codfw done with the schema change and not eqiad. I will wait a couple of more hours before deploying it on eqiad.

Mentioned in SAL (#wikimedia-operations) [2018-09-25T13:07:07Z] <marostegui> Deploy schema change on s5 eqiad - this might generate lag on s5 eqiad - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-25T13:13:26Z] <marostegui> Deploy schema change on s6 eqiad - this might generate lag on s6 eqiad - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-25T13:29:51Z] <marostegui> Deploy schema change on s2 eqiad - this might generate lag on s2 eqiad - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-26T05:20:11Z] <marostegui> Deploy schema change on s8 eqiad, this will generate lag - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-26T09:06:48Z] <marostegui> Deploy schema change on db2092 - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-26T12:33:14Z] <marostegui> Deploy schema change on s4 eqiad, will generate lag - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-26T13:12:34Z] <marostegui> Deploy schema change on db2071 - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-27T05:41:44Z] <marostegui> Deploy schema change on s1 eqiad master, lag will be generated - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-27T08:39:32Z] <marostegui> Deploy schema change on labtestwiki - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-27T08:48:49Z] <marostegui> Deploy schema change on labswiki (db1073 master) - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-28T05:54:41Z] <marostegui> Deploy schema change on s7 eqiad, this will generate lag - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-28T06:07:20Z] <marostegui> Deploy schema change on db2055 - T203709

Mentioned in SAL (#wikimedia-operations) [2018-09-28T08:11:10Z] <marostegui> Deploy schema change on s3 eqiad, this will generate lag - T203709

This is all done in eqiad. Once we are back in eqiad this will be deployed on codfw.
s1 codfw is done, with the exception of db2048 (master) which I will alter on Monday.

Mentioned in SAL (#wikimedia-operations) [2018-10-01T05:07:23Z] <marostegui> Deploy schema change on s1 codfw msater - T203709

db2048 (s1 codfw master) done.
Waiting now for eqiad to be back to finish off this.

Mentioned in SAL (#wikimedia-operations) [2018-10-02T10:24:53Z] <marostegui> Deploy schema change on db2092 - T203709

It's not done in s8 eqiad causing this T207313: Some administrative and log actions on Wikidata take longer than 60 seconds and time out on Wikidata:

wikiadmin@10.64.48.172(wikidatawiki)> show index from change_tag;
+------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name             | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| change_tag |          0 | PRIMARY              |            1 | ct_id       | A         |   222003497 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          0 | change_tag_rc_tag    |            1 | ct_rc_id    | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_rc_tag    |            2 | ct_tag      | A         |   222003497 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          0 | change_tag_log_tag   |            1 | ct_log_id   | A         |        1060 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_log_tag   |            2 | ct_tag      | A         |       41210 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          0 | change_tag_rev_tag   |            1 | ct_rev_id   | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_rev_tag   |            2 | ct_tag      | A         |   222003497 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id    |            1 | ct_tag      | A         |       30971 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id    |            2 | ct_rc_id    | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id    |            3 | ct_rev_id   | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id    |            4 | ct_log_id   | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id |            1 | ct_tag_id   | A         |       33591 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id |            2 | ct_rc_id    | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id |            3 | ct_rev_id   | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id |            4 | ct_log_id   | A         |   222003497 |     NULL | NULL   | YES  | BTREE      |         |               |
+------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
15 rows in set (0.00 sec)

(Compare it with enwiki):

wikiadmin@10.64.32.115(enwiki)> show index from change_tag;
+------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name                   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| change_tag |          0 | PRIMARY                    |            1 | ct_id       | A         |    56055095 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          0 | change_tag_rc_tag_id       |            1 | ct_rc_id    | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_rc_tag_id       |            2 | ct_tag_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_log_tag_id      |            1 | ct_log_id   | A         |     2242203 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_log_tag_id      |            2 | ct_tag_id   | A         |       50363 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_rev_tag_id      |            1 | ct_rev_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          0 | change_tag_rev_tag_id      |            2 | ct_tag_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id          |            1 | ct_tag      | A         |       36423 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id          |            2 | ct_rc_id    | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id          |            3 | ct_rev_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id          |            4 | ct_log_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id       |            1 | ct_tag_id   | A         |       44629 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id       |            2 | ct_rc_id    | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id       |            3 | ct_rev_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_tag_id_id       |            4 | ct_log_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_rc_tag_nonuniq  |            1 | ct_rc_id    | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_rc_tag_nonuniq  |            2 | ct_tag      | A         |    56055095 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          1 | change_tag_log_tag_nonuniq |            1 | ct_log_id   | A         |      849319 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_log_tag_nonuniq |            2 | ct_tag      | A         |       39447 |     NULL | NULL   |      | BTREE      |         |               |
| change_tag |          1 | change_tag_rev_tag_nonuniq |            1 | ct_rev_id   | A         |    56055095 |     NULL | NULL   | YES  | BTREE      |         |               |
| change_tag |          1 | change_tag_rev_tag_nonuniq |            2 | ct_tag      | A         |    56055095 |     NULL | NULL   |      | BTREE      |         |               |
+------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
21 rows in set (0.00 sec)

FYI. That wasted two days of mine trying to understand what is wrong with the query and trying to find a better query for that

@Marostegui Do we have an ETA on these indexes being re added so that we can resolve T208846 ?
I see your away, so I'll ping otherson IRC.

@Marostegui Do we have an ETA on these indexes being re added so that we can resolve T208846 ?
I see your away, so I'll ping otherson IRC.

I am back from holidays today. I will start working on this, hopefully this week once I have caught up

This is very weird, I am _pretty_ sure I did s8 eqiad whilst codfw was active.
The fact that this exists on the master, the sanitarium master, sanitarium, and labs confuses me, not sure it this might have been reverted while we were working on the s8 incident (data loss and some tables reconstruction)

s8 progress:

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • db2094
  • db2086
  • db2085
  • db2083
  • db2082
  • db2081
  • db2080
  • db2079
  • db2045
  • db1124
  • db1116
  • db1109
  • db1104
  • db1101
  • db1099
  • db1092
  • db1087
  • db1071

Change 472944 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1099:3318

https://gerrit.wikimedia.org/r/472944

Change 472944 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1099:3318

https://gerrit.wikimedia.org/r/472944

Mentioned in SAL (#wikimedia-operations) [2018-11-12T08:53:46Z] <marostegui> Deploy schema change on db1099:3318 - T203709

Change 472955 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1101:3318

https://gerrit.wikimedia.org/r/472955

Change 472955 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1101:3318

https://gerrit.wikimedia.org/r/472955

Mentioned in SAL (#wikimedia-operations) [2018-11-12T13:59:00Z] <marostegui> Deploy schema change on db1101:3318 - T203709

I know realised why the change was "gone" from all the eqiad s8 hosts except db1071, db1087,db1124 and labs. The reason is that the rest of the hosts were recloned from codfw (where the change wasn't deployed) as part of T206743: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared")

Could you check the list of schema changes and maintenance to be ran during switchover to test if they where also undone?

Could you check the list of schema changes and maintenance to be ran during switchover to test if they where also undone?

Yep, I was doing it that now :-)

Change 473152 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1109

https://gerrit.wikimedia.org/r/473152

Change 473152 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1109

https://gerrit.wikimedia.org/r/473152

Mentioned in SAL (#wikimedia-operations) [2018-11-13T05:34:53Z] <marostegui> Deploy schema change on db1109 T203709

Change 473158 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/473158

Change 473158 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/473158

Mentioned in SAL (#wikimedia-operations) [2018-11-13T07:21:15Z] <marostegui> Deploy schema change on db1104 T203709

Change 473165 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/473165

Change 473165 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/473165

Mentioned in SAL (#wikimedia-operations) [2018-11-13T09:23:49Z] <marostegui> Deploy schema change on db1092 T203709

Mentioned in SAL (#wikimedia-operations) [2018-11-13T10:47:11Z] <marostegui> Deploy schema change on db1116:3318 T203709

@Ladsgroup All done in s8 eqiad core hosts (T203709#4738599). You shouldn't be seeing what you were seeing before

Mentioned in SAL (#wikimedia-operations) [2018-11-13T13:41:14Z] <marostegui> Deploy schema change on s8 codfw master (db2045) this will generate lag on s8 codfw - T203709

Mentioned in SAL (#wikimedia-operations) [2018-11-14T06:40:26Z] <marostegui> Deploy schema change on s6 codfw master, this will generate lag on s6 codfw -T203709

Mentioned in SAL (#wikimedia-operations) [2018-11-14T06:52:35Z] <marostegui> Deploy schema change on s4 codfw master, this will generate lag on s4 codfw - T203709

Mentioned in SAL (#wikimedia-operations) [2018-11-14T07:07:26Z] <marostegui> Deploy schema change on s2 codfw master, this will generate lag on s2 codfw - T203709

Mentioned in SAL (#wikimedia-operations) [2018-11-14T07:18:59Z] <marostegui> Deploy schema change on s7 codfw master, this will generate lag on s7 codfw - T203709

Mentioned in SAL (#wikimedia-operations) [2018-11-14T07:42:45Z] <marostegui> Deploy schema change on s3 codfw master, this will generate lag on s3 codfw - T203709

Marostegui updated the task description. (Show Details)

This is all done