
Migrate codfw sanitarium hosts (db2094/db2095) to Buster and 10.4
Closed, Resolved, Public

Description

Once we've disconnected the old labsdb* hosts from the old sanitariums in eqiad (db1124/db1125), we should go ahead and migrate db2094/db2095 in codfw to Buster and 10.4.
db1124/db1125 will not need to be migrated, as they'll be repurposed somewhere else (T258361).

Event Timeline

Marostegui changed the task status from Open to Stalled. Feb 18 2021, 9:10 AM
Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to Blocked on the DBA board.

This can happen after 15th April

Marostegui changed the task status from Stalled to Open. Apr 15 2021, 5:54 AM
Marostegui moved this task from Blocked to Ready on the DBA board.

Change 680162 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db2094,db2095 to buster

https://gerrit.wikimedia.org/r/680162

Change 680162 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db2094,db2095 to buster

https://gerrit.wikimedia.org/r/680162

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2094.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104160534_marostegui_9954.log.

Completed auto-reimage of hosts:

['db2094.codfw.wmnet']

and were ALL successful.

db2094 has been migrated to Buster. I am now checking all the tables for corruption, as we saw some while setting up the Buster sanitarium hosts in eqiad.

Change 680166 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2094: Disable notifications

https://gerrit.wikimedia.org/r/680166

Change 680166 merged by Marostegui:

[operations/puppet@production] db2094: Disable notifications

https://gerrit.wikimedia.org/r/680166

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2095.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104160606_marostegui_18089.log.

Change 680172 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2095: Disable notifications

https://gerrit.wikimedia.org/r/680172

Change 680172 merged by Marostegui:

[operations/puppet@production] db2095: Disable notifications

https://gerrit.wikimedia.org/r/680172

Completed auto-reimage of hosts:

['db2095.codfw.wmnet']

and were ALL successful.

db2095 has been migrated to Buster. I am now checking all the tables for corruption, as we saw some while setting up the Buster sanitarium hosts in eqiad.

There's some corruption on both db2094 and db2095, so once all the checks have finished, I will need to rebuild the affected tables.

I have started to rebuild all tables that reported errors across all the sections on db2094 and db2095
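For reference, the check-then-rebuild step can be sketched roughly as below. This is an illustration only, not the procedure actually run on these hosts: it assumes the check was done with `CHECK TABLE` (whose result rows carry `Table`, `Op`, `Msg_type` and `Msg_text`), that a row with `Msg_type` of `error` marks a table for rebuild, and that the rebuild is an `ALTER TABLE ... FORCE`. The helper names and the table names in the usage note are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class CheckRow(NamedTuple):
    """One row of CHECK TABLE output (Table, Op, Msg_type, Msg_text)."""
    table: str     # schema-qualified name as printed by the server
    op: str        # "check" for CHECK TABLE
    msg_type: str  # e.g. "status", "Warning", "error"
    msg_text: str  # "OK" for a healthy table


def tables_needing_rebuild(rows: Iterable[CheckRow]) -> List[str]:
    """Return tables that reported an error, de-duplicated, in first-seen order.

    Assumption: only rows whose Msg_type is "error" flag a table for rebuild.
    """
    flagged: List[str] = []
    for row in rows:
        if row.msg_type.lower() == "error" and row.table not in flagged:
            flagged.append(row.table)
    return flagged


def quote_identifier(qualified: str) -> str:
    """Backtick-quote a "schema.table" (or bare "table") name."""
    parts = qualified.split(".", 1)
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


def rebuild_statements(tables: Iterable[str]) -> List[str]:
    """Build one rebuild statement per table (assumed form: ALTER TABLE ... FORCE)."""
    return ["ALTER TABLE " + quote_identifier(t) + " FORCE;" for t in tables]
```

Feeding it rows such as `CheckRow("exampledb.page", "check", "status", "OK")` and `CheckRow("exampledb.revision", "check", "error", "Corrupt")` yields a single rebuild statement for `exampledb.revision`. The statements are only generated here, not executed, so they can be reviewed per section before being run with replication stopped.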

Only s4 and s8 are still being fixed. The rest of the sections are done and replication has been restarted.