
Upgrade all sanitarium masters to 10.4 and Buster
Closed, Resolved, Public

Description

labsdb* hosts have been moved under 10.4 sanitarium hosts, so we can now migrate all sanitarium masters to Buster and MariaDB 10.4.

Hosts:

eqiad:

  • s1 db1106 - [x] tables checked - [x] tables checked after the upgrade
  • s2 db1074 (replace it with db1156 T258361 - [x] tables checked)
  • s3 db1112 - [x] tables checked - [x] tables checked after the upgrade
  • s4 db1121 - [x] tables checked (before the upgrade) - [x] tables checked after the upgrade
  • s5 db1082 (replace it with db1161 T258361 - [x] tables checked)
  • s6 db1085 (replace it with db1165 T258361 - [x] tables checked)
  • s7 db1079 (replace it with db1158 T258361 - [x] tables checked)
  • s8 db1087 (replace it with db1167 T258361 - [x] tables checked)

codfw:

  • s1 db2072 - [x] tables checked
  • s2 db2126 - [x] tables checked
  • s3 db2074 - [x] tables checked
  • s4 db2073 - [x] tables checked
  • s5 db2128 - [x] tables checked
  • s6 db2076 - [x] tables checked
  • s7 db2077 - [x] tables checked
  • s8 db2082 - [x] tables checked
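
The checklist above is regular enough to parse mechanically. As a minimal sketch (the `parse_checklist` helper and its regular expression are illustrative assumptions, not tooling used in this task), the following extracts each section's current host and, where one is listed, its replacement:

```python
import re

# Illustrative pattern for lines such as
#   "s2 db1074 (replace it with db1156 T258361 - [x] tables checked)"
LINE_RE = re.compile(
    r"(?P<section>s\d+)\s+(?P<host>db\d+)"
    r"(?:\s+\(replace it with (?P<replacement>db\d+))?"
)

def parse_checklist(lines):
    """Return {section: (host, replacement_or_None)} for one datacenter."""
    result = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            result[match["section"]] = (match["host"], match["replacement"])
    return result

eqiad = [
    "s1 db1106 - [x] tables checked - [x] tables checked after the upgrade",
    "s2 db1074 (replace it with db1156 T258361 - [x] tables checked)",
]
parsed = parse_checklist(eqiad)
for section, (host, replacement) in parsed.items():
    print(section, host, "->", replacement or "no replacement listed")
```

Using `search` rather than `match` means the leading bullet and indentation on each checklist line can be left in place.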



Event Timeline


Excellent, thanks. It will take around a day I'd guess.

Change 682682 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Reenable notifications on db1102 after maintenance

https://gerrit.wikimedia.org/r/682682

Change 682682 merged by Jcrespo:

[operations/puppet@production] mariadb: Reenable notifications on db1102 after maintenance

https://gerrit.wikimedia.org/r/682682


It finished at ~3am: all yours. Please note I loaded grants and events as best I could, but please double-check them: the grant files are very outdated (no tendril grants, prometheus and icinga don't use socket authentication, etc.), so I fixed what I could.

Change 682668 merged by Marostegui:

[operations/puppet@production] mariadb: Reenable notifications on db1156 after maintenance

https://gerrit.wikimedia.org/r/682668

Change 683483 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/683483

Change 683483 merged by Marostegui:

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/683483

Change 684667 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1121: Enable notifications

https://gerrit.wikimedia.org/r/684667

Change 684667 merged by Marostegui:

[operations/puppet@production] db1121: Enable notifications

https://gerrit.wikimedia.org/r/684667

Mentioned in SAL (#wikimedia-operations) [2021-05-04T07:11:46Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json

s5 sanitarium master switched: db1154 now replicates from db1161 (10.4)

Mentioned in SAL (#wikimedia-operations) [2021-05-04T08:02:07Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-05-04T08:02:58Z] <marostegui> Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
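
The SAL mentions in this timeline come in two shapes: `dbctl commit` entries logged as `user@host` with a paste link, and plain entries logged by the user alone. As a rough sketch for pulling the timestamp, user, and paste ID out of either shape (the `parse_sal` helper and both patterns are assumptions made for illustration, not part of any Wikimedia tooling):

```python
import re
from datetime import datetime, timezone

# Illustrative pattern for the "Mentioned in SAL" lines in this timeline.
SAL_RE = re.compile(
    r"\[(?P<ts>[0-9T:\-]+Z)\]\s+"
    r"<(?P<user>[^@>]+)(?:@(?P<host>[^>]+))?>\s+"
    r"(?P<message>.*)"
)
PASTE_RE = re.compile(r"https://phabricator\.wikimedia\.org/(P\d+)")

def parse_sal(line):
    """Split one SAL mention into timestamp, user, origin host, message, paste ID."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    paste = PASTE_RE.search(match["message"])
    when = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "when": when.replace(tzinfo=timezone.utc),
        "user": match["user"],
        "host": match["host"],  # None when the entry has no "@host" part
        "message": match["message"],
        "paste": paste.group(1) if paste else None,
    }

entry = parse_sal(
    "Mentioned in SAL (#wikimedia-operations) [2021-05-04T07:11:46Z] "
    "<marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1161 and db1082 "
    "to change s5 sanitarium master T280492', diff saved to "
    "https://phabricator.wikimedia.org/P15692 and previous config saved to "
    "/var/cache/conftool/dbconfig/20210504-071146-marostegui.json"
)
print(entry["when"].isoformat(), entry["user"], entry["host"], entry["paste"])
```

The saved-config filename is deliberately not derived from the SAL timestamp here: in the entries above the two sometimes differ by a second.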

Change 684795 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/684795

Change 684795 merged by Marostegui:

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/684795

Mentioned in SAL (#wikimedia-operations) [2021-05-05T06:40:59Z] <marostegui> Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492

Mentioned in SAL (#wikimedia-operations) [2021-05-05T06:42:05Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json

s2 sanitarium master db1074 has been replaced by db1156

s7 sanitarium master db1079 has been replaced by db1158

I am going to remove the db2098 s3 10.1 instance, now that db2139 has been working fine for a while. A last backup of the old instance will be available on dbprov2002 until it is no longer recoverable (and we can always recover from logical backups).

Change 685717 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: remove db2098 s3 section for this codfw backup source

https://gerrit.wikimedia.org/r/685717

Change 685717 merged by Jcrespo:

[operations/puppet@production] dbbackups: remove db2098 s3 section for this codfw backup source

https://gerrit.wikimedia.org/r/685717

db2098 s3 should be gone now, and will soon be gone from grafana/prometheus.

s8 sanitarium master db1087 has been replaced by db1167

Change 687965 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db1121 to Buster

https://gerrit.wikimedia.org/r/687965

Change 687965 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db1121 to Buster

https://gerrit.wikimedia.org/r/687965

Change 688742 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/688742

Mentioned in SAL (#wikimedia-operations) [2021-05-11T05:11:03Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1121 - going to be reimaged to buster T280492', diff saved to https://phabricator.wikimedia.org/P15895 and previous config saved to /var/cache/conftool/dbconfig/20210511-051102-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-05-11T05:11:41Z] <marostegui> Reimage db1121 to buster, this will generate lag on s4 (commonswiki) on wikireplicas T280492

Change 688742 merged by Marostegui:

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/688742

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1121.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202105110519_marostegui_32061.log.

Completed auto-reimage of hosts:

['db1121.eqiad.wmnet']

and were ALL successful.
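
The reimage log names quoted in this task appear to follow a `<timestamp>_<user>_<number>.log` layout. Assuming that layout holds (the meaning of the trailing number is not stated here, and `parse_reimage_log` is a hypothetical helper), a log path can be split like this:

```python
import re
from datetime import datetime

# Assumed layout: <YYYYMMDDHHMM>_<user>_<numeric suffix>.log
# The user part is assumed to contain no underscores.
LOG_RE = re.compile(r"(?P<stamp>\d{12})_(?P<user>[^_/]+)_(?P<suffix>\d+)\.log$")

def parse_reimage_log(path):
    """Return (launch time, user, suffix) from a wmf-auto-reimage log path."""
    match = LOG_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognised log name: {path}")
    launched = datetime.strptime(match["stamp"], "%Y%m%d%H%M")
    return launched, match["user"], int(match["suffix"])

launched, user, suffix = parse_reimage_log(
    "/var/log/wmf-auto-reimage/202105110519_marostegui_32061.log"
)
print(launched, user, suffix)
```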

db1121 has been reimaged to Buster.
I am checking the tables now, which means commonswiki will show lag on wikireplicas.

db1121 is clean, replication restarted.

db1121 is being automatically repooled
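
Read together, the db1121 entries above suggest a rough per-host order: disable notifications, depool, reimage, check tables, restart replication, repool. The sketch below models that order as a checklist that stops at the first failing step, so a host is never repooled after a failed table check. The step names paraphrase this timeline, and every callable is a hypothetical placeholder rather than a real dbctl or reimage invocation.

```python
from typing import Callable, List, Tuple

# Each step pairs a label with a callable that returns True on success.
Step = Tuple[str, Callable[[str], bool]]

def run_upgrade(host: str, steps: List[Step]) -> List[str]:
    """Run steps in order, stopping at the first one that reports failure."""
    completed = []
    for name, action in steps:
        if not action(host):
            break
        completed.append(name)
    return completed

def ok(host: str) -> bool:
    return True  # placeholder: a real step would do the work and verify it

def failing(host: str) -> bool:
    return False  # placeholder for a step that reports a problem

STEPS: List[Step] = [
    ("disable notifications", ok),
    ("depool", ok),
    ("reimage to Buster", ok),
    ("check tables", ok),
    ("restart replication", ok),
    ("repool", ok),
]

done = run_upgrade("db1121", STEPS)
print(done)

# Simulate a table check that does not come back clean.
bad_steps = [(n, failing if n == "check tables" else a) for n, a in STEPS]
print(run_upgrade("db1121", bad_steps))
```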

Change 689523 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Switch db1112 to buster

https://gerrit.wikimedia.org/r/689523

Change 689523 merged by Marostegui:

[operations/puppet@production] install_server: Switch db1112 to buster

https://gerrit.wikimedia.org/r/689523

Mentioned in SAL (#wikimedia-operations) [2021-05-17T04:35:52Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1112 T280492', diff saved to https://phabricator.wikimedia.org/P15975 and previous config saved to /var/cache/conftool/dbconfig/20210517-043551-marostegui.json

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1112.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202105170439_marostegui_29261.log.

Completed auto-reimage of hosts:

['db1112.eqiad.wmnet']

and were ALL successful.

Change 692111 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1112: Disable notifications

https://gerrit.wikimedia.org/r/692111

Change 692111 merged by Marostegui:

[operations/puppet@production] db1112: Disable notifications

https://gerrit.wikimedia.org/r/692111

db1112 reimaged to Buster; checking tables now.


db1112 got all the tables checked, everything is clean. Replication started.

Change 692348 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db1106 to Buster

https://gerrit.wikimedia.org/r/692348

Change 692348 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db1106 to Buster

https://gerrit.wikimedia.org/r/692348

Change 681448 merged by Jcrespo:

[operations/puppet@production] mariadb: Remove s3 from db2098

https://gerrit.wikimedia.org/r/681448

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1106.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202105180512_marostegui_20717.log.

Change 692470 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/692470

Change 692470 merged by Marostegui:

[operations/puppet@production] db1106: Disable notifications

https://gerrit.wikimedia.org/r/692470

Completed auto-reimage of hosts:

['db1106.eqiad.wmnet']

and were ALL successful.

db1106 reimaged. Checking its tables.

db1106 came back clean, replication restarted

Mentioned in SAL (#wikimedia-operations) [2021-05-19T06:43:44Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1106 T280492', diff saved to https://phabricator.wikimedia.org/P16082 and previous config saved to /var/cache/conftool/dbconfig/20210519-064343-marostegui.json

Marostegui updated the task description.

All sanitarium masters are now running Buster+10.4