Drop change_tag.ct_tag column in production
Open, NormalPublic

Description

The day has finally come:

  1. ALTERs to run: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/473601/6/maintenance/archives/patch-drop-ct_tag.sql
  2. Where to run those changes: all.dblist
  3. When to run those changes: After deployment of wmf.8, It would be great if we apply it in small and medium wikis first
  4. If the schema change is backwards compatible: No.
  5. If the schema change has been tested already on some of the test/beta wikis: Tested in beta cluster
  6. if the data should be made available on the labs replicas and/or dumps: Yes, data in this table is public (unless the wiki is private which we don't replicate at all)

Execution progress:

s1

  • eqiad progress
  • codfw

s2

  • eqiad progress
  • codfw

s3

  • eqiad progress
  • codfw

s4

  • eqiad progress
  • codfw

s5

  • eqiad progress
  • codfw

s6

s7

  • eqiad progress
  • codfw

s8

  • eqiad progress
  • codfw

wikitech

  • eqiad
  • codfw
Ladsgroup triaged this task as Normal priority.

I will start with this after the code freeze.
I might alter an slave in enwiki earlier to see how this goes, as change_tag is a complicated table with lots of unknowns.

@Ladsgroup I think I will start with s6 (frwiki, jawiki and ruwiki). As the holidays time is approaching, I think I am going to alter only one slave in codfw, so we can at least see if there are any non expected writes arriving - if that is the case, replication will break on that host, but won't affect users as we don't read from codfw.
After holidays period I will do the same but on an eqiad host and monitor errors for a few days, to make sure there are no reads on that column.
How does that sound?

@Ladsgroup I think I will start with s6 (frwiki, jawiki and ruwiki). As the holidays time is approaching, I think I am going to alter only one slave in codfw, so we can at least see if there are any non expected writes arriving - if that is the case, replication will break on that host, but won't affect users as we don't read from codfw.
After holidays period I will do the same but on an eqiad host and monitor errors for a few days, to make sure there are no reads on that column.
How does that sound?

Sure. It's amazing. Just keep in mind there is some other changes there too (dropping related indexes, making one column not null). Please don't forget those. Thanks!

Yeah - I will run everything that is on that patch :)

Marostegui updated the task description. (Show Details)Wed, Jan 9, 9:13 PM
Marostegui updated the task description. (Show Details)Wed, Jan 9, 9:16 PM
Marostegui added a comment.EditedWed, Jan 9, 9:20 PM

As always, I will do the first schema change carefully, I will start with s6 hosts (only in codfw), to make sure we have no issues with replication. Once that is confirmed, the rest of sections, on codfw will be done with replication enabled

s6 progress:

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • dbstore1001
  • db2095
  • db2089
  • db2087
  • db2076 sanitarium master
  • db2067
  • db2060
  • db2053
  • db2046
  • db2039
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085 sanitarium master
  • db1061
Marostegui updated the task description. (Show Details)Wed, Jan 9, 9:20 PM

Mentioned in SAL (#wikimedia-operations) [2019-01-10T17:33:17Z] <marostegui> Deploy schema change on db2046 - T210713

This alter (ALTER TABLE /*_*/change_tag MODIFY ct_tag_id int unsigned NOT NULL, LOCK=NONE;) requires the following session change to be able to be done without locking the table:

MariaDB [test]> ALTER TABLE /*_*/change_tag MODIFY ct_tag_id int unsigned NOT NULL, LOCK=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: cannot silently convert NULL values, as required in this SQL_MODE. Try LOCK=SHARED.

MariaDB [test]> set  SQL_MODE=strict_trans_tables;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> ALTER TABLE /*_*/change_tag MODIFY ct_tag_id int unsigned NOT NULL, LOCK=NONE;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

This is similar to: T191316#4243784

Mentioned in SAL (#wikimedia-operations) [2019-01-10T17:49:23Z] <marostegui> Deploy schema change on db2053 - T210713

It's cut to half:

wikiadmin@10.64.32.114(ruwiki)> SELECT table_schema, table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` FROM information_schema.TABLES WHERE table_name = 'change_tag' order by (data_length + index_length) desc;
+--------------+------------+------------+
| table_schema | Table | Size in MB |
+--------------+------------+------------+
| frwiki | change_tag | 4978.98 |
| ruwiki | change_tag | 2599.02 |
| jawiki | change_tag | 2078.89 |
+--------------+------------+------------+
wikiadmin@db2046.codfw.wmnet(ruwiki)> SELECT table_schema, table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` FROM information_schema.TABLES WHERE table_name = 'change_tag' order by (data_length + index_length) desc;
+--------------+------------+------------+
| table_schema | Table | Size in MB |
+--------------+------------+------------+
| frwiki | change_tag | 2577.97 |
| ruwiki | change_tag | 1224.44 |
| jawiki | change_tag | 1012.16 |
+--------------+------------+------------+
3 rows in set (0.03 sec)

\o/

Keep in mind that this also triggered a table rebuilt, so the table got defragmented too

Change 483801 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2060

https://gerrit.wikimedia.org/r/483801

Mentioned in SAL (#wikimedia-operations) [2019-01-11T16:53:11Z] <marostegui> Defragment change_tag table on db2060 - T210713

Change 483801 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2060

https://gerrit.wikimedia.org/r/483801

Mentioned in SAL (#wikimedia-operations) [2019-01-11T16:55:02Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Depool db2060 T210713 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2019-01-11T17:10:40Z] <marostegui> Deploy schema change on db2060 - T210713

So I have done this test on db2060:
Current size:

root@db2060.codfw.wmnet[(none)]> SELECT table_schema, table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` FROM information_schema.TABLES WHERE table_name = 'change_tag' order by (data_length + index_length) desc;
+--------------+------------+------------+
| table_schema | Table      | Size in MB |
+--------------+------------+------------+
| frwiki       | change_tag |    3990.97 |
| ruwiki       | change_tag |    2663.09 |
| jawiki       | change_tag |    2174.09 |
+--------------+------------+------------+
3 rows in set (0.04 sec)

Defragmented the tables only, and these are the sizes:

root@db2060.codfw.wmnet[(none)]> SELECT table_schema, table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` FROM information_schema.TABLES WHERE table_name = 'change_tag' order by (data_length + index_length) desc;
+--------------+------------+------------+
| table_schema | Table      | Size in MB |
+--------------+------------+------------+
| frwiki       | change_tag |    3271.64 |
| ruwiki       | change_tag |    1730.66 |
| jawiki       | change_tag |    1410.23 |
+--------------+------------+------------+
3 rows in set (0.03 sec)

And after the schema change, sizes:

root@db2060.codfw.wmnet[(none)]> SELECT table_schema, table_name AS `Table`, round(((data_length + index_length) /4 / 1024), 2) `Size in MB` FROM information_schema.TABLES WHERE table_name = 'change_tag' order by (data_length + index_length) desc;
+--------------+------------+------------+
| table_schema | Table      | Size in MB |
+--------------+------------+------------+
| frwiki       | change_tag |    2374.72 |
| ruwiki       | change_tag |    1224.34 |
| jawiki       | change_tag |     978.02 |
+--------------+------------+------------+
3 rows in set (0.04 sec)

Mentioned in SAL (#wikimedia-operations) [2019-01-11T18:07:51Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Repool db2060 T210713 (duration: 00m 46s)