
Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC
Closed, Resolved · Public

Description

db1159 has been cloned from db1117:3321.
db1080 needs to be decommissioned.

Let's give db1159 (running 10.4.18) a full week to make sure it is ok and then schedule a day to promote it to master.
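
A minimal sketch for that burn-in week (assumes client access to the host or running it locally against the socket; not part of the original plan): periodically confirm db1159 is replicating cleanly and not lagging before scheduling the promotion.

# Replication health check on the candidate master during the burn-in week
mysql -h db1159.eqiad.wmnet -e "SHOW SLAVE STATUS\G" \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_.*Errno'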

Databases running on m1 master:

bacula9
cas
cas_staging
dbbackups
etherpadlite
librenms
pki
racktables
rddmarc
rt

Pre steps:

  • Upgrade all m1 hosts to 10.4.18 (see the version-check sketch after this list)
    • db2132
    • db2078
    • db1117
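
A minimal pre-check sketch (FQDNs assumed from the host names above; multi-instance hosts such as db1117 may need an explicit port or socket, and credentials are assumed to come from the usual client config): confirm every m1 host reports 10.4.18 before the switchover.

# Check the running MariaDB version on each m1 host
for host in db2132.codfw.wmnet db2078.codfw.wmnet db1117.eqiad.wmnet \
            db1080.eqiad.wmnet db1159.eqiad.wmnet; do
    echo "== $host =="
    mysql -h "$host" -e "SELECT @@version"
done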

Switchover steps:

OLD MASTER: db1080

NEW MASTER: db1159

Check configuration differences between new and old master

  • $ pt-config-diff h=db1159.eqiad.wmnet,F=/root/.my.cnf h=db1080.eqiad.wmnet,F=/root/.my.cnf
  • Silence alerts on all hosts
  • Topology changes: move everything under db1159

db-switchover --timeout=1 --only-slave-move db1080.eqiad.wmnet db1159.eqiad.wmnet
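
A hedged verification step, not part of the original checklist (SHOW SLAVE HOSTS only lists replicas that set report_host): after the replica move, db1159 should show all former siblings underneath it, while db1080 should only have db1159 left.

# Compare the replica lists on the new and old master after --only-slave-move
mysql -h db1159.eqiad.wmnet -e "SHOW SLAVE HOSTS"
mysql -h db1080.eqiad.wmnet -e "SHOW SLAVE HOSTS"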

puppet agent -tv && cat /etc/haproxy/conf.d/db-master.cfg

  • Start the failover

!log Failover m1 from db1080 to db1159 - T276448

root@cumin1001:~/wmfmariadbpy/wmfmariadbpy# db-switchover --skip-slave-move db1080 db1159
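
A quick sanity check after the switchover (not in the original checklist; assumes client access from the cumin host): db1159 should now report read_only = 0 and db1080 read_only = 1.

# Confirm the read_only flag flipped as expected on both hosts
for host in db1159.eqiad.wmnet db1080.eqiad.wmnet; do
    echo "== $host =="
    mysql -h "$host" -e "SELECT @@hostname, @@read_only"
done
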
  • Reload haproxies
dbproxy1012:   systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
dbproxy1014:   systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
  • kill connections on the old master (db1080)

pt-kill --print --kill --victims all --match-all F=/dev/null,S=/run/mysqld/mysql.sock
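
A hedged follow-up, run locally on db1080: after pt-kill has cleared the application connections, only system and replication threads should remain.

# Count remaining connections per user on the old master
mysql -S /run/mysqld/mysql.sock -e \
  "SELECT user, COUNT(*) AS connections FROM information_schema.processlist GROUP BY user"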

  • Restart puppet on old and new masters (for heartbeat): db1080 and db1159

puppet agent --enable && puppet agent -tv
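
A hedged heartbeat check (heartbeat.heartbeat with ts and server_id follows the standard pt-heartbeat schema; anything beyond that is an assumption): the most recent rows should now be written by the new master, db1159.

# Verify pt-heartbeat is writing from the new master after the puppet run
mysql -h db1159.eqiad.wmnet -e \
  "SELECT server_id, ts FROM heartbeat.heartbeat ORDER BY ts DESC LIMIT 5"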

  • Check services affected (librenms, racktables, etherpad...)
  • Change the events for the query killer (see the sketch after this list): apply events_coredb_master.sql on the new master (db1159) and events_coredb_slave.sql on the new slave (db1080)
  • Clean up orchestrator heartbeat to remove the old master's entry.
  • Create decommissioning ticket for db1080: T280121
  • Update/resolve phabricator ticket about failover https://phabricator.wikimedia.org/T276448
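
A hedged check for the query-killer step above (the actual event names come from whatever events_coredb_master.sql / events_coredb_slave.sql define, so this only lists what is enabled): compare the events on db1159 and db1080 after the change.

# List scheduled events and their status on both hosts
for host in db1159.eqiad.wmnet db1080.eqiad.wmnet; do
    echo "== $host =="
    mysql -h "$host" -e \
      "SELECT event_schema, event_name, status FROM information_schema.events"
done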

Event Timeline

Marostegui triaged this task as Medium priority. Mar 4 2021, 12:36 PM
Marostegui moved this task from Triage to Blocked on the DBA board.

Change 668449 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbbackups: Update backup metadata host db1080->db1159

https://gerrit.wikimedia.org/r/668449

Marostegui added a subscriber: jcrespo.

@jcrespo I would like to do this on Wednesday 14th April - is this a good day or will it mess with the backups? I have no problem scheduling it on any other day

> @jcrespo I would like to do this on Wednesday 14th April - is this a good day or will it mess with the backups? I have no problem scheduling it on any other day

Once this is confirmed I will ping the rest of service owners

> @jcrespo I would like to do this on Wednesday 14th April - is this a good day or will it mess with the backups? I have no problem scheduling it on any other day

What time is this happening?

In the last months, the latest the backups have finished is 8 UTC (normally they finish by 5 UTC). As long as this happens after that, we will be ok.

> @jcrespo I would like to do this on Wednesday 14th April - is this a good day or will it mess with the backups? I have no problem scheduling it on any other day
>
> What time is this happening?
>
> In the last months, the latest the backups have finished is 8 UTC (normally they finish by 5 UTC). As long as this happens after that, we will be ok.

I can adapt to whatever works best for the backups

As long as it is not too early in the morning, the 14th will be ok. We may want to do it late in the morning so that etherpad and other owners are around? It should be ok as long as we merge the patch I prepared after the switchover.

What about 10 UTC? Would that work for the backups? I will ping the other owners if this works for you

> What about 10 UTC? Would that work for the backups? I will ping the other owners if this works for you

Sure.

Thank you Jaime.

@akosiaris would you be available tomorrow, 14th April, at around 10 AM UTC in case we need to restart etherpad?
@jbond @MoritzMuehlenhoff ok to restart mysql from cas and pki point of view tomorrow 14th April?
@ayounsi ok to restart mysql from librenms point of view tomorrow 14th April?

> Thank you Jaime.
>
> @akosiaris would you be available tomorrow, 14th April, at around 10 AM UTC in case we need to restart etherpad?

Yes

> @jbond @MoritzMuehlenhoff ok to restart mysql from cas and pki point of view tomorrow 14th April?

Sounds good

Marostegui renamed this task from Failover m1 master: db1080 -> db1159 to Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC. Apr 13 2021, 9:01 AM

Change 678801 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Promote db1159 to m1 master

https://gerrit.wikimedia.org/r/678801

Moving this to 10:30 AM UTC as there's a power maintenance scheduled in my building which is supposed to end at 10:00 AM UTC, but just in case...

As a pre-step, everything has been moved under the new host.

[Attached screenshot: Captura de pantalla 2021-04-14 a las 11.45.10.png (304×1 px, 56 KB)]

Change 678801 merged by Marostegui:

[operations/puppet@production] mariadb: Promote db1159 to m1 master

https://gerrit.wikimedia.org/r/678801

Mentioned in SAL (#wikimedia-operations) [2021-04-14T10:30:33Z] <marostegui> Failover m1 from db1080 to db1159 - T276448

Change 668449 merged by Jcrespo:

[operations/puppet@production] dbbackups: Update backup metadata host db1080->db1159

https://gerrit.wikimedia.org/r/668449

Everything looks good, we are running some final checks to ensure backup infra is working fine after the swap.
The RO time was around 10 seconds.

Backup metadata looking good:

root@db1159.eqiad.wmnet[dbbackups]> select * FROM backups order by id desc limit 1\G
*************************** 1. row ***************************
        id: 11052
      name: snapshot.s7.2021-04-14--10-50-22
    status: ongoing
    source: db2100.codfw.wmnet:3317
      host: dbprov2002.codfw.wmnet
      type: snapshot
   section: s7
start_date: 2021-04-14 12:29:30
  end_date: NULL
total_size: 1151727406294
1 row in set (0.032 sec)

(end_date is NULL because the file copy finished, but postprocessing, probably compression, hasn't fully finished yet)

So everything is looking great from my side!

Thanks!
Closing this!