Restart m2 database master (db1107)
Closed, Resolved, Public

Description

To enable the report_host variable on db1107, we need to restart the MariaDB daemon on that host.
We expect this to take around 1 minute.

Affected databases:

[x] debmonitor
[x] mwaddlink
[x] otrs
[x] recommendationapi
[x] sockpuppet
[x] xhgui

When: Wednesday 3rd Feb at 09:00AM UTC
Impact: All these databases will be read-only for around 1 minute.

Event Timeline

Marostegui moved this task from Triage to Ready on the DBA board.
Marostegui added a project: Research.
Marostegui added subscribers: leila, Krenair.

@akosiaris @hnowlan @MoritzMuehlenhoff @kostajh Performance-Team Research I would like to propose Wednesday 5th Feb at 09:00AM UTC to restart the above host.
Would that be ok?

No objections from us on Sockpuppet.

sounds good, thanks!

Fine by me, I'll keep an eye on OTRS and recommendation API.

Sounds good, I'll be around.

Thank you guys for the fast responses!
Going to schedule it for Wednesday 3rd Feb at 09:00AM UTC then - @dpifke if this doesn't work for xhgui let me know!

Marostegui updated the task description.

I had checked the 2020 calendar (I wonder why...): Wednesday is the 3rd of Feb, not the 5th :)

Maintenance window booked on the deployment calendar

Procedure:

Pre restart

  • Silence m2 hosts
  • buffer pool dump + disablement in advance to make the restart faster
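
As an illustration of the buffer pool step, here is a minimal sketch that prints the statements one might run beforehand. It assumes that "disablement" means turning off the automatic dump at shutdown, which the task does not spell out. The helper name is hypothetical; the two InnoDB variables are standard MariaDB ones.

```python
# Hypothetical helper, not part of the task: lists the SQL a DBA might run
# before the restart. Assumption: "disablement" refers to switching off
# innodb_buffer_pool_dump_at_shutdown after taking a manual dump.
PRE_RESTART_SQL = [
    # Dump the buffer pool contents now, while the server is still serving traffic.
    "SET GLOBAL innodb_buffer_pool_dump_now = ON;",
    # Skip the dump during shutdown so the stop phase has less work to do.
    "SET GLOBAL innodb_buffer_pool_dump_at_shutdown = OFF;",
]


def pre_restart_script(statements=PRE_RESTART_SQL):
    """Join the statements into a script that can be pasted into a mysql client."""
    return "\n".join(statements)


if __name__ == "__main__":
    print(pre_restart_script())
```

A dynamic SET GLOBAL does not survive the restart, so the configured value of the shutdown setting applies again once the server is back.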

Restart

  • !log m2 master restart - T272964
  • db1107: restart mysql
  • verify report_host is enabled
  • verify read_only is OFF
  • Once mysql is back: reload haproxy on dbproxy1013 (active) and dbproxy1015 (passive)
  • check everything is ok
  • close this task
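
The two "verify" steps above could be sketched as a small check over the server variables. This is only an illustration: the function name and the sample values are hypothetical, and the mapping is assumed to come from something like SHOW GLOBAL VARIABLES run on db1107 after the restart.

```python
def check_post_restart(variables):
    """Return a list of problems found in a {name: value} mapping of server variables.

    Assumption: the mapping holds the output of SHOW GLOBAL VARIABLES,
    with values as strings (for example "OFF" for read_only).
    """
    problems = []
    # report_host is empty when it has not been configured.
    if not variables.get("report_host"):
        problems.append("report_host is not set")
    # The master must accept writes again once it is back.
    if str(variables.get("read_only", "")).upper() not in ("OFF", "0"):
        problems.append("read_only is not OFF")
    return problems


if __name__ == "__main__":
    # Sample values are made up for the example.
    sample = {"report_host": "db1107", "read_only": "OFF"}
    print(check_post_restart(sample) or "all checks passed")
```

An empty list means both checks passed; anything else is worth looking at before reloading haproxy.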

This has been delayed a little bit as there's an unrelated incident going on.

This was done.
RO start: 09:58:05
RO stop: 09:58:48
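
For reference, the length of the read-only window can be worked out from the two timestamps above. This is a small sketch; the function name is hypothetical and both times are assumed to fall on the same day.

```python
from datetime import datetime


def read_only_seconds(start, stop, fmt="%H:%M:%S"):
    """Return the number of seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    print(read_only_seconds("09:58:05", "09:58:48"))  # prints 43
```

That gives 43 seconds of read-only time, within the expected window of around 1 minute.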

All the services recovered without human intervention.