
Upgrade x1 databases to Buster and MariaDB 10.4
Open, Medium, Public

Description

x1 should be an "easy" section to upgrade fully, as it doesn't replicate to labs hosts, so we could upgrade all the way up to the master.

  • db1120 (eqiad master)
  • db1095 (removed): x1 now moved to db1102 on buster (backup source)
  • db1103
  • db1137
  • db2096 (codfw master)
  • db2101 (backup source)
  • db2115
  • db2131
  • dbstore1005 (T254870)

Event Timeline

Restricted Application added a subscriber: Aklapper. · Tue, Jun 9, 12:42 PM
Marostegui triaged this task as Medium priority. · Tue, Jun 9, 12:43 PM
Marostegui moved this task from Triage to Next on the DBA board.
Marostegui added a subscriber: jcrespo.

@jcrespo would you be okay if I upgrade db1095 and db2101 to Buster? Those are backup sources.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2131.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006091353_marostegui_105111.log.

Completed auto-reimage of hosts:

['db2131.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2131.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006091421_marostegui_129669.log.

Completed auto-reimage of hosts:

['db2131.codfw.wmnet']

and were ALL successful.

Marostegui updated the task description. · Tue, Jun 9, 2:49 PM

Change 604036 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Enable notification on db2131

https://gerrit.wikimedia.org/r/604036

Change 604036 merged by Marostegui:
[operations/puppet@production] mariadb: Enable notification on db2131

https://gerrit.wikimedia.org/r/604036

Marostegui updated the task description. · Wed, Jun 10, 7:04 AM

> @jcrespo would you be okay if I upgrade db1095 and db2101 to Buster? Those are backup sources.

If you upgrade them, snapshots may not work, as dbprov hosts are not upgraded and it is not advisable to prepare them with older xtrabackup versions.

They also host s2 and s3 (db1095).

Aside from the version mismatch, as long as they are prepared with the same or newer version, everything should work.

Let me know how you want to proceed with this.
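
As an illustration of the rule above (prepare with the same or a newer version), here is a minimal Python sketch. The function names and the version strings are hypothetical, chosen only for the example; this is not how the backup tooling actually checks versions.

```python
def parse_version(text):
    """Turn a dotted version string such as '10.4.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def can_prepare(backup_version, prepare_version):
    """Return True when the preparing tool is the same version or newer
    than the one that took the backup (the rule stated in the comment)."""
    return parse_version(prepare_version) >= parse_version(backup_version)


# Hypothetical version numbers, for illustration only.
older = "10.1.44"
newer = "10.4.13"

# A snapshot taken on the older version and prepared on the newer one is fine.
print(can_prepare(older, newer))
# A snapshot taken on the newer version and prepared on the older one is the
# case that upgrading the backup source before dbprov would create.
print(can_prepare(newer, older))
```

The second case is why upgrading db1095 or db2101 first would be a problem while the dbprov hosts stay on the older version.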

> They also host s2 and s3 (db1095).

I can migrate x1 to db1140, which is already on buster, solving the extra sections issue. But I'm not sure how to handle the dbprov hosts.

I don't really know how to proceed :-( I was looking for ideas. Should we maybe upgrade more hosts in s2 and s3 so it is "worth" upgrading db1095 and dbprov?

Marostegui updated the task description. · Wed, Jun 10, 9:48 AM
Marostegui updated the task description. · Thu, Jun 11, 8:00 AM

This is the plan after a conversation on IRC:

<jynus> 1) I upgrade db2101 (x1) to 10.4
<jynus> and send snapshots to backup2002
<jynus> 2) I move db1102 (s4, s5) to db1145 (stretch)
<jynus> (I will keep dumps on the same dbprov)
<jynus> 3) I put x1 on db1102 (buster)
<jynus> (I then can remove it from db1095)
<jynus> 4) I backup x1 from db1102 to backup1002

<jynus> on Q1
<jynus> we will get dbprov[12]003
<jynus> and will be on buster directly and return snapshots from buster to it
<jynus> plus we will get an extra backup source also on buster for extra flexibility
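
For reference, steps 2 and 3 of the plan can be sketched as a change to a host-to-sections mapping. This is only an illustrative Python model, assuming db1095 holds s2, s3 and x1 and db1102 holds s4 and s5 as mentioned in this task; `apply_plan` is a hypothetical helper, not part of any real tooling.

```python
def apply_plan(sources):
    """Return a new host -> sections mapping after steps 2 and 3 of the plan.

    The input mapping is left untouched.
    """
    after = {host: set(sections) for host, sections in sources.items()}
    # Step 2: s4 and s5 move from db1102 to db1145 (stretch).
    after["db1145"] = after.pop("db1102")
    # Step 3: x1 goes onto db1102 (buster) and is removed from db1095.
    after["db1095"].discard("x1")
    after["db1102"] = {"x1"}
    return after


# eqiad backup sources before the plan, limited to what the task mentions.
before = {
    "db1095": {"s2", "s3", "x1"},
    "db1102": {"s4", "s5"},
}

after = apply_plan(before)
for host in sorted(after):
    print(host, sorted(after[host]))
```

Steps 1 and 4 (upgrading db2101 and sending snapshots to backup2002 and backup1002) do not change which host serves which section, so they are left out of the model.
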
jcrespo claimed this task. · Mon, Jun 22, 3:24 PM
jcrespo moved this task from Next to In progress on the DBA board.

Change 607228 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Reimage db2101 (x1 backup source) to buster

https://gerrit.wikimedia.org/r/607228

Change 607228 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Reimage db2101 (x1 backup source) to buster

https://gerrit.wikimedia.org/r/607228

Mentioned in SAL (#wikimedia-operations) [2020-06-23T09:46:39Z] <jynus> stopping and reimaging db2101 into buster T254871

Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts:

['db2101.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006230948_jynus_15257.log.

Completed auto-reimage of hosts:

['db2101.codfw.wmnet']

and were ALL successful.

Change 607264 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Reenable db2101 with snapshots to backup2002

https://gerrit.wikimedia.org/r/607264

jcrespo updated the task description. · Tue, Jun 23, 10:56 AM

Change 607264 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Reenable db2101 with snapshots to backup2002

https://gerrit.wikimedia.org/r/607264

Change 607267 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install_server: Reimage and wipe db1145 into stretch

https://gerrit.wikimedia.org/r/607267

Change 607267 merged by Jcrespo:
[operations/puppet@production] install_server: Reimage and wipe db1145 into stretch

https://gerrit.wikimedia.org/r/607267

Change 607288 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Setup db1145 as a mariadb backup source for s4, s5

https://gerrit.wikimedia.org/r/607288

Change 607288 merged by Jcrespo:
[operations/puppet@production] mariadb: Setup db1145 as a mariadb backup source for s4, s5

https://gerrit.wikimedia.org/r/607288

jcrespo added a comment. · Edited · Wed, Jun 24, 1:13 PM

See: T253217#6252401. After that, x1 will be removed from db1095 (which will stay on stretch).

Marostegui updated the task description. · Wed, Jun 24, 1:19 PM

Change 607510 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move x1 backup source from db1095 to db1102

https://gerrit.wikimedia.org/r/607510

Change 607515 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Remove x1 from db1095 and enable db1102 notif.

https://gerrit.wikimedia.org/r/607515

Change 607510 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move x1 backup source from db1095 to db1102

https://gerrit.wikimedia.org/r/607510

Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts:

['db1102.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006260844_jynus_24187.log.

Completed auto-reimage of hosts:

['db1102.eqiad.wmnet']

and were ALL successful.

Change 607515 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Remove x1 from db1095 and enable db1102 notif.

https://gerrit.wikimedia.org/r/607515

jcrespo updated the task description. · Fri, Jun 26, 10:11 AM
jcrespo reassigned this task from jcrespo to Marostegui. · Edited · Fri, Jun 26, 10:11 AM

db1095 instance (stretch) has been backed up and moved to db1102 (buster). Backups are now done there and sent to backup1002.

Edit: s/not/now/

Thank you! I will go ahead and finish the codfw master and then start looking at dates to fail over the x1 primary master.

Change 608304 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Reimage db2096 (codfw x1 master) to Buster

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608304

Change 608304 merged by Marostegui:
[operations/puppet@production] mariadb: Reimage db2096 (codfw x1 master) to Buster

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608304

Mentioned in SAL (#wikimedia-operations) [2020-06-29T12:20:39Z] <marostegui> Stop MySQL on db2096 (codfw x1 master) for reimage T254871

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2096.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006291223_marostegui_12612.log.

Completed auto-reimage of hosts:

['db2096.codfw.wmnet']

and were ALL successful.

Change 608327 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2096: Enable notifications

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608327

Change 608327 merged by Marostegui:
[operations/puppet@production] db2096: Enable notifications

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608327

Marostegui updated the task description. · Mon, Jun 29, 2:10 PM

Pending: schedule x1 master switchover