
Check for compressed templatelinks tables
Closed, Resolved, Public

Description

Following the events at T301313, we should check and rebuild the templatelinks tables on s3 (and the rest of the pending sections) before we attempt to alter them.
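A minimal sketch of the check, assuming the row format reported in `information_schema.TABLES` is the signal. The task title mentions compressed tables and a later comment mentions Compact format, so both are flagged by default here, but which formats actually need a rebuild is an assumption to confirm against T301313. The helper name `tables_needing_rebuild` is hypothetical; it only classifies rows, leaving the query itself to whatever client is at hand, and the sample rows are made up.

```python
# Hypothetical helper: classify templatelinks tables by InnoDB row format.
# Assumption: "Compressed" and "Compact" are the formats to flag, based on the
# task title and a later comment; confirm against T301313 before relying on it.

ROW_FORMAT_QUERY = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, ROW_FORMAT "
    "FROM information_schema.TABLES "
    "WHERE TABLE_NAME = 'templatelinks' AND ENGINE = 'InnoDB'"
)

DEFAULT_FLAGGED = frozenset({"Compressed", "Compact"})


def tables_needing_rebuild(rows, flagged=DEFAULT_FLAGGED):
    """Return the sorted schema names whose templatelinks row format is flagged.

    `rows` is an iterable of (schema, table, row_format) tuples, such as the
    result of running ROW_FORMAT_QUERY through a DB-API cursor.
    """
    wanted = {fmt.lower() for fmt in flagged}  # compare case-insensitively
    return sorted(
        schema
        for schema, table, row_format in rows
        if table == "templatelinks"
        and row_format is not None
        and row_format.lower() in wanted
    )


if __name__ == "__main__":
    # Made-up sample rows, not real results from any host.
    sample = [
        ("wiki_a", "templatelinks", "Dynamic"),
        ("wiki_b", "templatelinks", "Compressed"),
        ("wiki_c", "templatelinks", "Compact"),
    ]
    print(tables_needing_rebuild(sample))  # ['wiki_b', 'wiki_c']
```

Every wiki database on a section has its own templatelinks table, which is why the query returns one row per schema rather than one per host.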

The following hosts need to be checked:

s1:

  • all clean

s2:

  • db1129.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db1162.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db2088.codfw.wmnet:3306 (needs rebuild) (cleaned)
  • db2104.codfw.wmnet:3306 (needs rebuild) (cleaned)
  • db2126.codfw.wmnet:3306 (needs rebuild) (cleaned)

s3:

  • dbstore1007:3313 (clean)
  • db1112.eqiad.wmnet:3306 (needs rebuild) (cleaned) if time allows, follow up: T301848#7716449
  • db1102.eqiad.wmnet:3313 (needs rebuild) (cleaned)
  • clouddb1021.eqiad.wmnet:3313 (clean)
  • clouddb1017.eqiad.wmnet:3313 (clean)
  • clouddb1013.eqiad.wmnet:3313 (clean)
  • db1179.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db1175.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db1166.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db1157.eqiad.wmnet:3306 (master) (cleaned)
  • db1154.eqiad.wmnet:3313 (clean)

s4:

  • all clean

s7:

  • db1136.eqiad.wmnet:3306 (master - needs rebuild) (cleaned)
  • db1158.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db1174.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db1181.eqiad.wmnet:3306 (needs rebuild) (cleaned)
  • db2077.codfw.wmnet:3306 (needs rebuild) (cleaned)

s8:

  • all clean

s5 and s6 were done already.

db2074, db2094 and db1123 are done.
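The per-section checklist above follows a regular `host:port (tag) (tag)` shape, so the remaining work can be read off mechanically. A hypothetical sketch (the names `parse_line` and `pending_hosts` are illustrative, not an existing tool) that treats a host as pending when it is tagged "needs rebuild" or "master" but not yet "cleaned"; the tag vocabulary is taken from the list itself.

```python
import re

# Hypothetical parser for the checklist format used in this task, e.g.
# "  • db1129.eqiad.wmnet:3306 (needs rebuild) (cleaned)".
LINE_RE = re.compile(r"^\s*(?:•\s*)?(?P<host>[\w.-]+):(?P<port>\d+)(?P<rest>.*)$")
TAG_RE = re.compile(r"\(([^)]*)\)")


def parse_line(line):
    """Return (host, port, tags) for a checklist line, or None if it isn't one."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    tags = [tag.strip() for tag in TAG_RE.findall(match.group("rest"))]
    return match.group("host"), int(match.group("port")), tags


def pending_hosts(lines):
    """List host:port entries tagged for work but not yet tagged 'cleaned'."""
    pending = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # section headings such as "s2:" or free-form notes
        host, port, tags = parsed
        needs_work = any("needs rebuild" in tag or "master" in tag for tag in tags)
        if needs_work and "cleaned" not in tags:
            pending.append(f"{host}:{port}")
    return pending
```

Run over the final version of the list, this should return an empty list, since every host ends up tagged either "clean" or "cleaned".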

Event Timeline

Marostegui triaged this task as High priority.
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui updated the task description.

db1112 needs the schema change applied with a table rebuild. db1154 (the sanitarium, db1112's slave) is clean, and so are the clouddb* replicas.

db1102 is running the schema change with ALGORITHM=COPY; it is progressing nicely and converting the tables as they get altered.
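A sketch of the rebuild part of such an ALTER, assuming the goal is simply to have InnoDB rewrite each wiki's `templatelinks` table: a null `ENGINE=InnoDB` alter on an InnoDB table rebuilds it, and `ALGORITHM=COPY` makes the server copy rows into a new table rather than take the in-place path. The actual T300775 schema change is not reproduced here, `rebuild_statement` is a hypothetical helper, and the optional `row_format` argument is an assumption: whether it is needed depends on the table's create options and `innodb_default_row_format`.

```python
# Only algorithms that can perform a full table rebuild are accepted here.
ALGORITHMS = {"DEFAULT", "INPLACE", "COPY"}


def rebuild_statement(schema, table="templatelinks", algorithm="COPY", row_format=None):
    """Build an ALTER TABLE that forces InnoDB to rebuild `schema`.`table`.

    Names are treated as trusted identifiers and are not escaped.
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported ALGORITHM: {algorithm}")
    options = ["ENGINE=InnoDB"]  # null ALTER: same engine, forces a rebuild
    if row_format is not None:
        # Assumption: only set when the target row format must be explicit.
        options.append(f"ROW_FORMAT={row_format}")
    options.append(f"ALGORITHM={algorithm}")
    return f"ALTER TABLE `{schema}`.`{table}` " + ", ".join(options) + ";"
```

For example, `rebuild_statement("wiki_a")` yields ``ALTER TABLE `wiki_a`.`templatelinks` ENGINE=InnoDB, ALGORITHM=COPY;``. COPY is slower than an in-place rebuild, but it rewrites every row into a fresh table, which is what the conversion relies on.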

The master needs to be switched over, as it has lots of tables in Compact format, and of course we cannot rebuild them while the host is acting as master.

Change 763204 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2088: Disable notifications

https://gerrit.wikimedia.org/r/763204

Change 763204 merged by Marostegui:

[operations/puppet@production] db2088: Disable notifications

https://gerrit.wikimedia.org/r/763204

I am deploying T300775 on db1179, db1175, db1166 and db1112 with a table rebuild, so those hosts will get cleaned up (on s3).

Change 763280 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2126,db2095: Disable notifications

https://gerrit.wikimedia.org/r/763280

Going to clean the other s2 codfw hosts, so we don't have to run the rebuild on codfw.

Change 763280 merged by Marostegui:

[operations/puppet@production] db2126,db2095: Disable notifications

https://gerrit.wikimedia.org/r/763280

db2126 has been cleaned, so it shouldn't crash anymore when deploying the schema change. The s2 master is still pending, so that we can deploy there with replication and no crashes.

db1112 is fully clean too; only a few templatelinks pages from closed wikis are pending (and those won't be altered anyway). I might clean those up too once all this is under control.

I am now rebuilding the last three s3 slaves: db1179, db1175, db1166.

Rebuilding db2104 (s2 codfw master); there will be lag on s2 codfw while this happens.

Mentioned in SAL (#wikimedia-operations) [2022-02-21T10:01:01Z] <marostegui> Rebuild templatelinks table on s2 codfw master (db2104), lag to be expected on codfw T301848

Mentioned in SAL (#wikimedia-operations) [2022-02-21T11:57:51Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1129 T301848', diff saved to https://phabricator.wikimedia.org/P21133 and previous config saved to /var/cache/conftool/dbconfig/20220221-115750-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2022-02-21T11:58:11Z] <marostegui> Rebuild templatelinks table on db1129 (s2) T301848
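The depool, rebuild, repool sequence in the SAL entries above can be written down as an ordered list of commands. This is a hypothetical sketch (`rebuild_plan` is not an existing tool): the `dbctl` subcommands are inferred from the SAL message format and should be checked against `dbctl --help`, running `mysql -e` locally on the host is an assumption, and so is disabling `sql_log_bin` so the rebuild does not replicate downstream. It refuses to plan a rebuild on the active primary, matching the point above that masters have to be switched over first.

```python
import shlex


def rebuild_plan(host, task, schema, is_primary):
    """Return the ordered commands for rebuilding one replica's templatelinks.

    Hypothetical helper: the dbctl subcommands are inferred from the SAL
    entry format, and running `mysql -e` locally on `host` is assumed.
    """
    if is_primary:
        # The active primary cannot be rebuilt in place; switch it over first.
        raise ValueError(f"{host} is the active primary; switch it over first")
    rebuild_sql = (
        # Assumption: keep the rebuild out of the binlog so it does not replicate.
        "SET SESSION sql_log_bin = 0; "
        f"ALTER TABLE `{schema}`.`templatelinks` ENGINE=InnoDB;"
    )
    return [
        ["dbctl", "instance", host, "depool"],
        ["dbctl", "config", "commit", "-m", f"Depool {host} {task}"],
        ["mysql", "-e", rebuild_sql],
        ["dbctl", "instance", host, "pool"],
        ["dbctl", "config", "commit", "-m", f"Repool {host} {task}"],
    ]


if __name__ == "__main__":
    # Print the plan as shell-quoted lines instead of executing anything.
    for step in rebuild_plan("db1129", "T301848", "examplewiki", is_primary=False):
        print(shlex.join(step))
```

Building the commands as argument lists rather than running them keeps the plan reviewable; with one schema per wiki, the `mysql` step would be repeated for each flagged database.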

Mentioned in SAL (#wikimedia-operations) [2022-02-21T12:36:52Z] <marostegui> Rebuild templatelinks table on db2077 (s7) T301848

Fixing db2077 (its slave, db2095, is fine).

All replicas are now fixed. Only the s3 and s7 masters are pending; they will be done once we've switched them over as part of the upgrades.

Mentioned in SAL (#wikimedia-operations) [2022-03-29T06:11:29Z] <marostegui> Maintenance on db1157 (old s3 master) T301848

Running this on old s3 master (db1157)

Marostegui updated the task description.

db1136, old s7 master, done. So everything is clean.