Upgrade all sanitarium masters to 10.4 and Buster
Closed, Resolved, Public

Description

The labsdb* hosts have been moved under 10.4 sanitarium hosts, so we can now migrate all sanitarium masters to Buster and 10.4.

Hosts:

eqiad:

  • s1 db1106 - [x] tables checked - [x] tables checked after the upgrade
  • s2 db1074 (replace it with db1156 T258361 - [x] tables checked)
  • s3 db1112 - [x] tables checked - [x] tables checked after the upgrade
  • s4 db1121 - [x] tables checked (before the upgrade) - [x] tables checked after the upgrade
  • s5 db1082 (replace it with db1161 T258361 - [x] tables checked)
  • s6 db1085 (replace it with db1165 T258361 - [x] tables checked)
  • s7 db1079 (replace it with db1158 T258361 - [x] tables checked)
  • s8 db1087 (replace it with db1167 T258361 - [x] tables checked)

codfw:

  • s1 db2072 - [x] tables checked
  • s2 db2126 - [x] tables checked
  • s3 db2074 - [x] tables checked
  • s4 db2073 - [x] tables checked
  • s5 db2128 - [x] tables checked
  • s6 db2076 - [x] tables checked
  • s7 db2077 - [x] tables checked
  • s8 db2082 - [x] tables checked
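
The checklist above can be tallied mechanically. The following is a minimal sketch, assuming the host lines are copied as plain text in the format shown; `parse_checklist` and its field names are hypothetical and not part of any existing tooling.

```python
import re

# Matches the "<section> <host>" prefix, e.g. "s1 db1106".
LINE_RE = re.compile(r"(?P<section>s\d+)\s+(?P<host>db\d+)")
# Matches the optional "(replace it with dbNNNN ...)" note.
REPLACEMENT_RE = re.compile(r"replace it with (?P<new>db\d+)")


def parse_checklist(text):
    """Return one dict per host line: section, host, replacement, box counts."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # headings such as "eqiad:" and blank lines
        replacement = REPLACEMENT_RE.search(line)
        rows.append({
            "section": match.group("section"),
            "host": match.group("host"),
            "replacement": replacement.group("new") if replacement else None,
            "checked": line.count("[x]"),
            "unchecked": line.count("[ ]"),
        })
    return rows
```

Filtering the result for rows where `unchecked` is non-zero would list the hosts that still need a table check.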

Details

Project            Branch      Lines +/-  Subject
operations/puppet  production  +1 -0
operations/puppet  production  +0 -0
operations/puppet  production  +1 -0
operations/puppet  production  +0 -1
operations/puppet  production  +1 -0
operations/puppet  production  +0 -1
operations/puppet  production  +0 -1
operations/puppet  production  +1 -2
operations/puppet  production  +1 -0
operations/puppet  production  +0 -1
operations/puppet  production  +1 -0
operations/puppet  production  +0 -1
operations/puppet  production  +0 -1
operations/puppet  production  +3 -1
operations/puppet  production  +0 -1
operations/puppet  production  +6 -6
operations/puppet  production  +4 -2
operations/puppet  production  +0 -1
operations/puppet  production  +0 -1
operations/puppet  production  +0 -1
operations/puppet  production  +0 -8
operations/puppet  production  +1 -1

Related Objects

Status    Subtype  Assigned    Task
Resolved           Marostegui
Open               None
Open               None
Resolved           RobH
Resolved           Bstorm
Resolved           Bstorm
Resolved           Marostegui
Resolved           Marostegui
Stalled            None
Open               None
Resolved           Marostegui
Resolved           Legoktm
Resolved           Marostegui
Resolved           Marostegui
Resolved           Marostegui
Resolved           RobH
Resolved           Marostegui
Resolved           Marostegui
Resolved           Marostegui
Resolved  Request  wiki_willy
Resolved  Request  Cmjohnson
Resolved  Request  Cmjohnson
Resolved  Request  Cmjohnson
Resolved  Request  Cmjohnson

Event Timeline

Excellent, thanks. It will take around a day I'd guess.

Change 682682 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Reenable notifications on db1102 after maintenance

https://gerrit.wikimedia.org/r/682682

Change 682682 merged by Jcrespo:

[operations/puppet@production] mariadb: Reenable notifications on db1102 after maintenance

https://gerrit.wikimedia.org/r/682682

> Excellent, thanks. It will take around a day I'd guess.

It finished at ~3am: all yours. Please note I loaded grants and events to the best of my ability, but please double-check them: the grant files are very outdated (no tendril grants, prometheus and icinga don't use socket authentication, etc.), so I fixed what I could.

Thank you, I will take it over!

Change 682668 merged by Marostegui:

[operations/puppet@production] mariadb: Reenable notifications on db1156 after maintenance

https://gerrit.wikimedia.org/r/682668

Change 683483 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/683483

Change 683483 merged by Marostegui:

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/683483

Change 684667 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1121: Enable notifications

https://gerrit.wikimedia.org/r/684667

Change 684667 merged by Marostegui:

[operations/puppet@production] db1121: Enable notifications

https://gerrit.wikimedia.org/r/684667

Mentioned in SAL (#wikimedia-operations) [2021-05-04T07:11:46Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json

s5 sanitarium master switched: db1154 now replicates from db1161 (10.4)

Mentioned in SAL (#wikimedia-operations) [2021-05-04T08:02:07Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-05-04T08:02:58Z] <marostegui> Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
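
The SAL mentions in this timeline follow a regular shape (UTC timestamp, author, message), and the dbctl ones also carry a diff paste URL and the path of the saved previous config. The following is a minimal sketch for pulling those fields out of a copied line; the regular expressions are inferred only from the entries in this task, and `parse_sal` is a hypothetical helper, not an existing tool.

```python
import re
from datetime import datetime, timezone

# "[2021-05-04T08:02:07Z] <marostegui@cumin1001> message..."
SAL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\]\s+"
    r"<(?P<who>[^>]+)>\s+(?P<msg>.*)"
)
# "dbctl commit (dc=all): '<summary>', diff saved to <url> and previous config saved to <path>"
DBCTL_RE = re.compile(
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<summary>.*)', "
    r"diff saved to (?P<diff>\S+) and previous config saved to (?P<backup>\S+)"
)


def parse_sal(line):
    """Parse one SAL mention; return None if the line has a different shape."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    entry = {
        "time": datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
        .replace(tzinfo=timezone.utc),
        "who": match.group("who"),
        "message": match.group("msg"),
    }
    dbctl = DBCTL_RE.search(match.group("msg"))
    if dbctl is not None:
        # Only dbctl commits carry a diff URL and a saved previous config.
        entry["dbctl"] = dbctl.groupdict()
    return entry
```

For a dbctl entry, `entry["dbctl"]["backup"]` gives the path of the previous config recorded in the log line, which is the file one would presumably look at when reviewing or reverting that commit.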

Change 684795 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/684795

Change 684795 merged by Marostegui:

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/684795

Mentioned in SAL (#wikimedia-operations) [2021-05-05T06:40:59Z] <marostegui> Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492

Mentioned in SAL (#wikimedia-operations) [2021-05-05T06:42:05Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json

s2 sanitarium master db1074 has been replaced by db1156

s7 sanitarium master db1079 has been replaced by db1158

I am going to remove the db2098 s3 10.1 instance, now that db2139 has been working fine for a while. A last backup of the old instance will be available on dbprov2002 until it is no longer recoverable (and we can always recover from logical backups).

Change 685717 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: remove db2098 s3 section for this codfw backup source

https://gerrit.wikimedia.org/r/685717

Change 685717 merged by Jcrespo:

[operations/puppet@production] dbbackups: remove db2098 s3 section for this codfw backup source

https://gerrit.wikimedia.org/r/685717

db2098 s3 should be gone now, and will soon be gone from grafana/prometheus.

s8 sanitarium master db1087 has been replaced by db1167

Change 687965 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db1121 to Buster

https://gerrit.wikimedia.org/r/687965

Change 687965 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db1121 to Buster

https://gerrit.wikimedia.org/r/687965

Change 688742 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/688742

Mentioned in SAL (#wikimedia-operations) [2021-05-11T05:11:03Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1121 - going to be reimaged to buster T280492', diff saved to https://phabricator.wikimedia.org/P15895 and previous config saved to /var/cache/conftool/dbconfig/20210511-051102-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-05-11T05:11:41Z] <marostegui> Reimage db1121 to buster, this will generate lag on s4 (commonswiki) on wikireplicas T280492

Change 688742 merged by Marostegui:

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/688742

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1121.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202105110519_marostegui_32061.log.

Completed auto-reimage of hosts:

['db1121.eqiad.wmnet']

and were ALL successful.

db1121 has been reimaged to Buster.
I am checking the tables now, which means commonswiki will show lag on wikireplicas.

db1121 is clean, replication restarted.
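
The task does not say how the tables were checked. One common way on MariaDB is the `CHECK TABLE` statement, and the following is a minimal sketch that only generates such statements for a list of tables; it is not the procedure used here. The helper names and the sample schema/table pairs are hypothetical.

```python
def quote_identifier(name):
    """Quote a MariaDB identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def check_table_statements(tables, extended=False):
    """Yield one CHECK TABLE statement per (schema, table) pair."""
    suffix = " EXTENDED" if extended else ""
    for schema, table in tables:
        yield (
            f"CHECK TABLE {quote_identifier(schema)}."
            f"{quote_identifier(table)}{suffix};"
        )


# Hypothetical input; a real list might come from information_schema.TABLES.
for statement in check_table_statements(
    [("commonswiki", "image"), ("commonswiki", "page")]
):
    print(statement)
```

Generating the statements separately from running them keeps the list reviewable before anything touches a host that is out of replication.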

db1121 is being automatically repooled

Change 689523 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Switch db1112 to buster

https://gerrit.wikimedia.org/r/689523

Change 689523 merged by Marostegui:

[operations/puppet@production] install_server: Switch db1112 to buster

https://gerrit.wikimedia.org/r/689523

Mentioned in SAL (#wikimedia-operations) [2021-05-17T04:35:52Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1112 T280492', diff saved to https://phabricator.wikimedia.org/P15975 and previous config saved to /var/cache/conftool/dbconfig/20210517-043551-marostegui.json

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1112.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202105170439_marostegui_29261.log.

Completed auto-reimage of hosts:

['db1112.eqiad.wmnet']

and were ALL successful.

Change 692111 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1112: Disable notifications

https://gerrit.wikimedia.org/r/692111

Change 692111 merged by Marostegui:

[operations/puppet@production] db1112: Disable notifications

https://gerrit.wikimedia.org/r/692111

db1112 reimaged to Buster. Checking tables now.

db1112 got all the tables checked, everything is clean. Replication started.

Change 692348 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db1106 to Buster

https://gerrit.wikimedia.org/r/692348

Change 692348 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db1106 to Buster

https://gerrit.wikimedia.org/r/692348

Change 681448 merged by Jcrespo:

[operations/puppet@production] mariadb: Remove s3 from db2098

https://gerrit.wikimedia.org/r/681448

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1106.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202105180512_marostegui_20717.log.

Change 692470 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/692470

Change 692470 merged by Marostegui:

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/692470

Completed auto-reimage of hosts:

['db1106.eqiad.wmnet']

and were ALL successful.

db1106 reimaged. Checking its tables.

db1106 came back clean, replication restarted

Mentioned in SAL (#wikimedia-operations) [2021-05-19T06:43:44Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1106 T280492', diff saved to https://phabricator.wikimedia.org/P16082 and previous config saved to /var/cache/conftool/dbconfig/20210519-064343-marostegui.json

Marostegui updated the task description.

All sanitarium masters are now running Buster+10.4