Upgrade m1 to Buster and Mariadb 10.4
Closed, ResolvedPublic

Description

I want to have m1 upgraded to Buster and MariaDB 10.4.

m1 has:

  • db1135 master
  • Provision db1097 as future master.
  • db1117 slave (across all miscs)
  • db2132
  • db2078 (already migrated)

The idea is to take one of the hosts from T253217 and migrate it into m1, as those hosts have smaller disks (but still enough for misc), and to keep the current master as a rollback plan just in case.
Once migrated, move the "old" m1 master to the list of poolable hosts and move it to sXX, as it has more disk space (4.4T vs 3.2T).

Event Timeline

Marostegui triaged this task as Medium priority.Jun 5 2020, 8:40 AM
Marostegui moved this task from Triage to Pending comment on the DBA board.

Change 603871 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1097 to m1

https://gerrit.wikimedia.org/r/603871

Change 603871 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1097 to m1

https://gerrit.wikimedia.org/r/603871

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1097.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006090722_marostegui_243699.log.

Completed auto-reimage of hosts:

['db1097.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2020-06-09T08:01:43Z] <marostegui> stop m1 on db1117 to clone db1097 (this will trigger an haproxy irc alert) - T254556

Change 603900 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1097: Enable notifications

https://gerrit.wikimedia.org/r/603900

Change 603900 merged by Marostegui:
[operations/puppet@production] db1097: Enable notifications

https://gerrit.wikimedia.org/r/603900

Change 603978 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Reimage db2132 with buster

https://gerrit.wikimedia.org/r/603978

Change 603978 merged by Marostegui:
[operations/puppet@production] install_server: Reimage db2132 with buster

https://gerrit.wikimedia.org/r/603978

Change 606558 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2132: Disable notifications

https://gerrit.wikimedia.org/r/606558

Change 606558 merged by Marostegui:
[operations/puppet@production] db2132: Disable notifications

https://gerrit.wikimedia.org/r/606558

Mentioned in SAL (#wikimedia-operations) [2020-06-19T06:19:59Z] <marostegui> Stop mysql on db2132 to reimage m1 codfw master - T254556

Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts:

['db2132.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006190620_marostegui_16910.log.

Completed auto-reimage of hosts:

['db2132.codfw.wmnet']

and were ALL successful.

Change 606634 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2132: Enable notifications

https://gerrit.wikimedia.org/r/606634

Change 606634 merged by Marostegui:
[operations/puppet@production] db2132: Enable notifications

https://gerrit.wikimedia.org/r/606634

Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts:

['db1117.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006220719_marostegui_6426.log.

Completed auto-reimage of hosts:

['db1117.eqiad.wmnet']

and were ALL successful.

Change 606950 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1117: Enable notifications

https://gerrit.wikimedia.org/r/606950

Change 606950 merged by Marostegui:
[operations/puppet@production] db1117: Enable notifications

https://gerrit.wikimedia.org/r/606950

@jcrespo @akosiaris @ayounsi I would like to switch over m1 to the new master, which runs Buster and MariaDB 10.4.
m1 holds:

  • bacula
  • librenms (which in previous switchovers has shown that no action is required)
  • etherpad (which we probably need to restart, as done in previous reimages)

The switchover shouldn't take more than a few seconds as we just need to run the script to change replication and reload the proxies.
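
As a rough illustration of the steps described above (and not the actual switchover script), the order of operations could be sketched in Python as follows. The function name, the step wording, and the read-only step are assumptions made for illustration; the host names in the usage note are taken from this task.

```python
from typing import List


def switchover_plan(old_master: str, new_master: str, proxies: List[str]) -> List[str]:
    """Return an ordered, human-readable list of switchover steps.

    Hypothetical outline only: it does not connect to any host.
    """
    steps = [
        # Assumed step: block writes so both hosts end up with identical data.
        f"set {old_master} read-only",
        f"wait for {new_master} to catch up with {old_master}",
        f"stop replication on {new_master} and make it writable",
        # The old master keeps replicating from the new one as a rollback option.
        f"point {old_master} at {new_master} as its replication source",
    ]
    # Proxies are reloaded last so that clients follow the new master.
    steps += [f"reload haproxy on {proxy}" for proxy in proxies]
    return steps
```

Calling `switchover_plan("db1135", "db1097", ["dbproxy1012", "dbproxy1014"])` would list six steps, ending with the two proxy reloads.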

@jcrespo @akosiaris - I would like to do this maybe Thursday 25th at 08:00 AM UTC. Is that ok? Let me know if you prefer any other day/time

Sure, fine by me.

Thank you, I have sent a calendar invite to you and to @jcrespo

Change 606953 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Promote db1097 to m1 master

https://gerrit.wikimedia.org/r/606953

Adding the User-notice tag, as Etherpad will be read-only for a few seconds on Thursday 25th at 08:00 AM UTC.
I will email wikitech-l shortly.

Change 607249 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy1012,1014: Place db1097 as standby host.

https://gerrit.wikimedia.org/r/607249

Change 607249 merged by Marostegui:
[operations/puppet@production] dbproxy1012,1014: Place db1097 as standby host.

https://gerrit.wikimedia.org/r/607249

Mentioned in SAL (#wikimedia-operations) [2020-06-23T09:32:17Z] <marostegui> Reload haproxy on dbproxy1012 and dbproxy1014 to test db1097 as secondary for 24h T254556

Johan subscribed.

Since next Tech News won't go out until Monday anyway (and I don't think we need to be too concerned about a few seconds of Etherpad read-only) – do re-instate if the update for some reason goes wrong and there are lingering problems the communities should know about.

Mentioned in SAL (#wikimedia-operations) [2020-06-25T07:08:50Z] <marostegui> Start pre switchover steps on m1 T254556

Change 606953 merged by Marostegui:
[operations/puppet@production] mariadb: Promote db1097 to m1 master

https://gerrit.wikimedia.org/r/606953

Mentioned in SAL (#wikimedia-operations) [2020-06-25T07:52:08Z] <jynus> stop bacula-director on backup1001 for db maintenance T254556

Mentioned in SAL (#wikimedia-operations) [2020-06-25T08:03:05Z] <marostegui> Failover m1 from db1135 to db1097 - T254556

Change 607725 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software/wmfmariadbpy@master] switchover.py: Change zarcillo instance

https://gerrit.wikimedia.org/r/607725

Change 607725 merged by Marostegui:
[operations/software/wmfmariadbpy@master] switchover.py: Change zarcillo instance

https://gerrit.wikimedia.org/r/607725

This is done.
I am going to leave db1135 replicating for 24h (so we can also see if basic 10.4 -> 10.1 replication works) and then I will move db1135 somewhere else.
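
For the 24h observation of 10.4 -> 10.1 replication mentioned above, a minimal sketch of a health check on db1135 could look like the following. It assumes one row of `SHOW SLAVE STATUS` has already been fetched into a column-to-value mapping; the function name and the 60-second lag threshold are hypothetical, not part of any existing tooling.

```python
from typing import Mapping, Optional


def replication_healthy(status: Mapping[str, Optional[str]], max_lag: int = 60) -> bool:
    """Check one row of SHOW SLAVE STATUS, given as a column -> value mapping."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")
    # A NULL lag (None here) means the replica is not currently replicating.
    if lag is None:
        return False
    # max_lag is an assumed threshold in seconds; adjust as needed.
    return io_ok and sql_ok and int(lag) <= max_lag
```

A replica with both threads reporting `Yes` and a lag within the threshold is treated as healthy; anything else, including a stopped SQL thread after a replication error, is not.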