
Upgrade and restart m2 primary database master (db1132)
Closed, Resolved · Public

Description

In order to continue with T239791 (DB: perform rolling restart of mariadb daemons to pick up CA changes), we need to restart MySQL on the m2 primary host (db1132).
Since MySQL has to be restarted anyway, we will also upgrade it to a newer version.

These are the databases currently in use on m2:

debmonitor
otrs
recommendationapi
reviewdb (gerrit)

The restart should be brief, but during that time those databases won't be available for writes. Reads should not be affected, as the proxy will fail over to the standby.
Our most recent restarts took between 30 and 60 seconds to complete, so we expect something similar for this host.
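To make the read/write behavior described above concrete, here is a toy Python model of the routing decision during the restart. It is an illustration only, not the actual dbproxy configuration, and the standby host name used with it is a hypothetical placeholder.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    READ = "read"
    WRITE = "write"


class ToyProxy:
    """Toy model (not the real dbproxy setup): writes go only to the primary,
    reads fall back to the standby while the primary is restarting."""

    def __init__(self, primary: str, standby: str) -> None:
        self.primary = primary
        self.standby = standby
        self.primary_up = True

    def route(self, kind: Kind) -> Optional[str]:
        if self.primary_up:
            return self.primary      # normal operation: everything hits the primary
        if kind is Kind.READ:
            return self.standby      # reads keep working via the standby
        return None                  # writes have nowhere to go until MySQL is back
```

For example, `ToyProxy("db1132", "standby-replica")` returns `None` for writes once `primary_up` is set to `False`, which is the write unavailability window this task plans for.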

Research: can you let us know when we can schedule this restart, and whether you need to do anything on your side?
@Krenair could you help us notify the OTRS side once we've picked a date/time?
I also plan to email wikitech-l.

When: Thursday 19th March 09:00 AM UTC

Thanks!

Event Timeline

Restricted Application added a subscriber: Scoopfinder. Feb 25 2020, 12:45 PM
Marostegui triaged this task as Medium priority. Feb 25 2020, 12:45 PM
Marostegui moved this task from Triage to Next on the DBA board.

@leila would you be able to discuss this ticket with your team to try to find some suitable dates?
If your service is resilient to a few seconds of downtime and there's nothing from your side to do, just let me know and I will pick the dates myself.

Thank you!

As far as OTRS goes I can be around and help with restarts/verifying behavior and all that jazz. Pick dates that suit you and lemme know.

Thanks! I will pick some dates as soon as Research lets us know whether they need to be around or to restart/check something.

leila added a subscriber: bmansurov. Mar 9 2020, 5:31 PM

@Marostegui please go ahead. We can tolerate a few seconds of potential downtime for recommendationapi. (@bmansurov FYI)

Thanks @leila!
@akosiaris does Tuesday 17th at 09:00 AM UTC work?

Fine by me.

Excellent, I'm going to send a calendar invite and block that time on the deployment page.

Marostegui updated the task description. Mar 10 2020, 7:36 AM
Marostegui moved this task from Next to In progress on the DBA board. Mar 10 2020, 8:50 AM

In the end, this will happen on Thursday 19th March at 09:00 AM UTC.
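Both dates discussed in this thread fall in March 2020, the year of the timeline entries. A quick weekday check with the Python standard library confirms that the 17th is a Tuesday and the 19th a Thursday; the fixed name list is used so the result does not depend on the system locale.

```python
import datetime

# Fixed English names; date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_name(year: int, month: int, day: int) -> str:
    """Return the English weekday name for a calendar date."""
    return WEEKDAYS[datetime.date(year, month, day).weekday()]


print(weekday_name(2020, 3, 17))  # Tuesday
print(weekday_name(2020, 3, 19))  # Thursday
```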

Marostegui updated the task description. Mar 12 2020, 2:03 PM
Marostegui updated the task description. Mar 16 2020, 7:01 AM

m2 eqiad proxies that will require reload:
dbproxy1015: active
dbproxy1013: passive

m2 codfw proxy requires no action.

Hosts to downtime:
db2133
db2078
db1132
db1117
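The hosts, proxies and client services named in this task can be collected into a single checklist. The sketch below is hypothetical: `MaintenancePlan` and the step wording are illustrative, not Wikimedia tooling. The order otherwise follows the SAL entries below, but placing the proxy reloads right after the restart (active proxy first) is an assumption the task does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MaintenancePlan:
    """Hypothetical checklist for restarting an m2-style primary."""
    primary: str
    proxies_to_reload: Dict[str, str]   # proxy host -> "active" / "passive"
    hosts_to_downtime: List[str]
    # host or cluster -> client services to restart afterwards
    services_to_restart: Dict[str, List[str]] = field(default_factory=dict)

    def checklist(self) -> List[str]:
        steps = [f"Upgrade packages on {self.primary} without restarting"]
        steps += [f"Downtime {host}" for host in self.hosts_to_downtime]
        steps.append(f"Restart MySQL on {self.primary}")
        # Assumption: reload proxies after the restart, active one first
        # (False sorts before True; sorted() keeps ties in insertion order).
        for proxy, role in sorted(self.proxies_to_reload.items(),
                                  key=lambda item: item[1] != "active"):
            steps.append(f"Reload {proxy} ({role})")
        # Client services, in the order they were restarted in the SAL.
        for host, services in self.services_to_restart.items():
            steps.append(f"Restart {', '.join(services)} on {host}")
        return steps


plan = MaintenancePlan(
    primary="db1132",
    proxies_to_reload={"dbproxy1015": "active", "dbproxy1013": "passive"},
    hosts_to_downtime=["db2133", "db2078", "db1132", "db1117"],
    services_to_restart={
        "scb": ["recommendation-api"],
        "mendelevium": ["otrs-daemon", "apache"],
        "gerrit1001": ["gerrit"],
    },
)

for number, step in enumerate(plan.checklist(), start=1):
    print(f"{number}. {step}")
```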

Mentioned in SAL (#wikimedia-operations) [2020-03-19T06:33:09Z] <marostegui> Upgrade db1132 without restarting T246098

Mentioned in SAL (#wikimedia-operations) [2020-03-19T09:00:26Z] <marostegui> Restart m2 primary database master - T246098

Mentioned in SAL (#wikimedia-operations) [2020-03-19T09:01:36Z] <akosiaris> restart recommendation-api on scb T246098

Mentioned in SAL (#wikimedia-operations) [2020-03-19T09:02:13Z] <akosiaris> restart otrs-daemon, apache on mendelevium T246098

Mentioned in SAL (#wikimedia-operations) [2020-03-19T09:03:00Z] <akosiaris> restart gerrit on gerrit1001 T246098

Mentioned in SAL (#wikimedia-operations) [2020-03-19T09:26:08Z] <marostegui> m2 maintenance window done T246098

Marostegui closed this task as Resolved. Mar 19 2020, 9:26 AM
Marostegui added subscribers: Volans, jcrespo.

This was done.
MySQL downtime was 60 seconds:

Start: 9:00:29
End: 9:01:29
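The downtime figure above, and the length of the whole maintenance window from the SAL entries (09:00:26 to 09:26:08), can be reproduced with Python's `datetime`. This is a minimal sketch assuming both times in each pair are on the same day and in the same time zone.

```python
from datetime import datetime


def seconds_between(start: str, end: str, fmt: str = "%H:%M:%S") -> int:
    """Elapsed whole seconds between two same-day wall-clock times."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


mysql_downtime = seconds_between("9:00:29", "9:01:29")  # 60 seconds
window = seconds_between("09:00:26", "09:26:08")        # 1542 seconds (25 min 42 s)
print(mysql_downtime, window)
```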

Thanks so much to everyone who was around to support this maintenance: @akosiaris @Volans @jcrespo!

I have updated the documentation with the procedure/people to contact for any m2 db maintenance: https://wikitech.wikimedia.org/w/index.php?title=MariaDB%2Fmisc&type=revision&diff=1860614&oldid=1853986