
Switchover m1 master (db1159 -> db1128)
Closed, Resolved (Public)

Description

db1159 needs to be reimaged to Bullseye.
Let's promote db1128 to master.

When: Thursday 27th at 10AM UTC
Impact: Read-only for a few seconds for the services below:

Services running on m1:

  • bacula
  • cas (and cas staging)
  • backups
  • etherpad
  • librenms
  • pki
  • rt

Switchover steps:

OLD MASTER: db1159

NEW MASTER: db1128

Check configuration differences between new and old master

  • $ pt-config-diff h=db1159.eqiad.wmnet,F=/root/.my.cnf h=db1128.eqiad.wmnet,F=/root/.my.cnf
  • Silence alerts on all hosts
  • Topology changes: move everything under db1128

db-switchover --timeout=1 --only-slave-move db1159.eqiad.wmnet db1128.eqiad.wmnet
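
Once the slave move is done, a quick sanity check that the topology looks as expected (a sketch; assumes report_host is set on the replicas so SHOW SLAVE HOSTS is populated):

# On db1128: the other replicas should now be attached here
mysql -h db1128.eqiad.wmnet -e "SHOW SLAVE HOSTS"
# On db1159: only db1128 should remain as a direct replica
mysql -h db1159.eqiad.wmnet -e "SHOW SLAVE HOSTS"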

puppet agent -tv && cat /etc/haproxy/conf.d/db-master.cfg
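
For the "Silence alerts on all hosts" step, a hedged sketch using the sre.hosts.downtime cookbook (the SAL entries further down show it being used for this switchover; the exact flags here are an assumption):

# Downtime the m1 hosts for an hour from a cumin host (host list taken from the SAL entry below)
cookbook sre.hosts.downtime --hours 1 --reason "Primary switchover m1 T299624" 'db[2078,2132].codfw.wmnet,db[1117,1128,1159].eqiad.wmnet'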

  • Start the failover

!log Failover m1 from db1159 to db1128 - T299624

root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --skip-slave-move db1159 db1128
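
For reference, a hedged sketch of roughly what the db-switchover step boils down to if done by hand (simplified; the tool also handles timeouts, replication coordinate checks and the topology/heartbeat updates):

# 1. Close writes on the old master
mysql -h db1159.eqiad.wmnet -e "SET GLOBAL read_only=1"
# 2. Wait for the new master to apply everything, then detach it
mysql -h db1128.eqiad.wmnet -e "SELECT MASTER_POS_WAIT('<file>', <pos>)"   # placeholders, not real coordinates
mysql -h db1128.eqiad.wmnet -e "STOP SLAVE; RESET SLAVE ALL"
# 3. Open writes on the new master
mysql -h db1128.eqiad.wmnet -e "SET GLOBAL read_only=0"
# 4. Make the old master replicate from the new one (credentials/coordinates elided)
mysql -h db1159.eqiad.wmnet -e "CHANGE MASTER TO MASTER_HOST='db1128.eqiad.wmnet', ...; START SLAVE"
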
  • Reload haproxies
dbproxy1012:   systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1014:   systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
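
The "show stat" output is raw CSV; a hedged one-liner to make the backend state readable (field positions assume the standard haproxy stats CSV layout, where status is field 18):

echo "show stat" | socat /run/haproxy/haproxy.sock stdio | cut -d, -f1,2,18 | column -s, -t
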
  • Kill connections on the old master (db1159)

pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysqld.sock
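
Afterwards, a quick check that nothing besides system and replication threads is still connected to the old master:

# On db1159
mysql -S /run/mysqld/mysqld.sock -e "SHOW PROCESSLIST"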

  • Restart puppet on old and new masters (for heartbeat): db1128 and db1159

puppet agent --enable && run-puppet-agent
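
To confirm pt-heartbeat is writing from the new master again, a hedged check (assumes the standard heartbeat.heartbeat schema; column names may differ slightly in our setup):

# The freshest row should carry db1128's server_id and a current timestamp
mysql -h db1128.eqiad.wmnet -e "SELECT server_id, ts FROM heartbeat.heartbeat ORDER BY ts DESC LIMIT 5"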

  • Check services affected (librenms, racktables, etherpad...)
  • Clean the orchestrator heartbeat to remove the old master's entry (see the sketch after this list).
  • Merge: https://gerrit.wikimedia.org/r/755960
  • Create floating ticket for db1159 to be moved to m2: T300243
  • Update/resolve phabricator ticket about failover
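
For the orchestrator heartbeat cleanup step above, a hedged sketch of what is usually meant: removing the old master's stale row from the heartbeat table so it does not linger after the switch (the server_id value is a placeholder; double-check against the runbook before deleting anything):

# Hypothetical cleanup of the stale heartbeat row written by db1159
mysql -h db1128.eqiad.wmnet -e "DELETE FROM heartbeat.heartbeat WHERE server_id = <db1159_server_id>"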

Event Timeline

Marostegui changed the task status from Open to Stalled. Jan 20 2022, 7:56 AM
Marostegui moved this task from Triage to Blocked on the DBA board.

Stalling this until db1128 is installed and populated with data.
Once this is done, I will add service owners to the task so we can arrange a date for everyone.

Marostegui triaged this task as Medium priority. Jan 20 2022, 7:56 AM

@jcrespo @MoritzMuehlenhoff @jbond @akosiaris @ayounsi I would like to do this master switchover on Thursday 27th at 10AM UTC. I expect just a few seconds of read-only time.
Would this date work for you all?

Ack, sounds good!

+1 as owner of the dbbackups and bacula9 databases (and possibly something else there that I'm not sure about).

With my backup hat on, I believe I already have the answer, but just to double-confirm: the end state will be the same secondary database (no need to change anything regarding backup sources, right?).

Correct Jaime, nothing will change on the backups front!

As this date works for everyone, I will do this on Thursday 27th at 10AM UTC.

Marostegui changed the task status from Stalled to Open. Jan 21 2022, 11:28 AM
Marostegui updated the task description.

Change 755960 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Manually switchover primary stats db db1159 -> db1128

https://gerrit.wikimedia.org/r/755960

^@Marostegui I just remembered that dbbackups points to db1159, and not the proxy, due to the current TLS certificate limitation and the worry about sensitive data being accessed cross-datacenter. It will have to be deployed after the switchover. I can do it, but I'm involving you in case I don't happen to be around.

This is an exception we discussed in the past, and one that should eventually be solved with a different TLS certificate workflow (there is nothing other than that preventing the use of the proxy).

No problem, I can take care of that; I will add it to the list of steps.

Change 757388 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1128: Enable notifications

https://gerrit.wikimedia.org/r/757388

Change 757388 merged by Marostegui:

[operations/puppet@production] db1128: Enable notifications

https://gerrit.wikimedia.org/r/757388

Change 757389 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Promote db1128 to m1 master

https://gerrit.wikimedia.org/r/757389

Mentioned in SAL (#wikimedia-operations) [2022-01-27T09:23:38Z] <root@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on db[2078,2132].codfw.wmnet,db[1117,1128,1159].eqiad.wmnet with reason: Primary switchover m1 T299624

Mentioned in SAL (#wikimedia-operations) [2022-01-27T09:23:43Z] <root@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2078,2132].codfw.wmnet,db[1117,1128,1159].eqiad.wmnet with reason: Primary switchover m1 T299624

Change 757389 merged by Marostegui:

[operations/puppet@production] mariadb: Promote db1128 to m1 master

https://gerrit.wikimedia.org/r/757389

Mentioned in SAL (#wikimedia-operations) [2022-01-27T09:57:38Z] <jynus> Stopped Bacula Director Daemon service at backup1001 T299624

Mentioned in SAL (#wikimedia-operations) [2022-01-27T10:00:02Z] <marostegui> Failover m1 from db1159 to db1128 - T299624

Change 755960 merged by Marostegui:

[operations/puppet@production] dbbackups: Manually switchover primary stats db db1159 -> db1128

https://gerrit.wikimedia.org/r/755960

Change 757619 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1159: Disable notifications

https://gerrit.wikimedia.org/r/757619

etherpad needed to be restarted.

Change 757619 merged by Marostegui:

[operations/puppet@production] db1159: Disable notifications

https://gerrit.wikimedia.org/r/757619

Mentioned in SAL (#wikimedia-operations) [2022-01-27T10:17:48Z] <jynus> Started Bacula Director Daemon service at backup1001 T299624

Marostegui updated the task description.
Marostegui updated the task description.

All done