
Upgrade and restart m1 master (db1135)
Closed, Resolved, Public

Description

In order to continue with T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes, we need to restart MySQL on the m1 master (db1135).
As we have to restart MySQL anyway, we will also upgrade it to a newer version.

These are the databases m1 currently stores:

bacula
bacula9
etherpadlite
librenms
racktables
rddmarc
rt

The restart should only take a few seconds, but during that time, those databases won't be available.
The last time we operated on m1 was to promote a new host to master (T231403), and we had to kill some connections on several services. That should not be required this time, as the server is not changing; it is just being restarted.

@Trizek-WMF I have CC'ed you here as last time you posted a message on technews for etherpad (T231403#5464235) - let's wait for a concrete day/time.

I think what needs more coordination is the backups part, given the backup job schedule. @akosiaris @jcrespo, any preference on when this work can be done, or rather, which days should we avoid? :-)

@ayounsi, what about your side? Any day we should avoid?

Thanks!

Event Timeline

Marostegui triaged this task as Medium priority. Feb 4 2020, 2:23 PM
Marostegui moved this task from Triage to Pending comment on the DBA board.

Anytime works for LibreNMS.

Thank you! <3

You just need to:

  1. comment with a date and a short explanation
  2. tag this task with User-notice

The Tech News team will then add this item to Tech News.

You can also directly edit the future Tech News issue, again with the date, one short sentence and the link to this task. The deadline is the Thursday of the week before the change happens.
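
The deadline rule above can be expressed as a small date calculation. This is a minimal sketch, assuming that "the week before" means the preceding Monday-to-Sunday calendar week; `tech_news_deadline` is a hypothetical helper, not part of any Tech News tooling.

```python
from datetime import date, timedelta

def tech_news_deadline(change_day: date) -> date:
    """Return the Thursday of the week before `change_day` (hypothetical helper).

    Assumes weeks run Monday to Sunday, matching date.weekday() (Monday == 0).
    """
    # Monday of the week in which the change happens
    monday_of_change_week = change_day - timedelta(days=change_day.weekday())
    # Thursday of the previous week is 4 days before that Monday
    return monday_of_change_week - timedelta(days=4)

print(tech_news_deadline(date(2020, 2, 20)))  # 2020-02-13
```

For a change on Thursday 20 February 2020, this yields Thursday 13 February 2020.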

For reference, the similar maintenance performed at T244209 resulted in 74 seconds of downtime.

Sorry, I thought I had answered, but apparently I did not hit submit.

Any time during the UTC day outside the first week of the month is OK for Bacula, preferably not on a Wednesday. It can be done outside of that window, but that is the preferred time.
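
The preferred Bacula window can be encoded as a simple predicate. This is a minimal sketch, assuming "the first week of the month" means days 1 to 7; `in_preferred_bacula_window` is a hypothetical name, and, as stated above, neither rule is a hard constraint.

```python
from datetime import date

WEDNESDAY = 2  # date.weekday(): Monday == 0 ... Sunday == 6

def in_preferred_bacula_window(day: date) -> bool:
    """Hypothetical helper encoding the stated preference for Bacula.

    Assumes "the first week of the month" means days 1-7. This only flags
    preferred days; other days are still possible.
    """
    outside_first_week = day.day > 7
    not_wednesday = day.weekday() != WEDNESDAY
    return outside_first_week and not_wednesday

print(in_preferred_bacula_window(date(2020, 2, 20)))  # True (Thursday the 20th)
```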

Let's aim for Thursday 20th at 09:00AM UTC?

@jcrespo @akosiaris any tentative date?

Anytime is good for etherpad!

Cool to me, send some invites this way! :-D

Will do - will also notify on wikitech-l
@Trizek-WMF we are going to do this maintenance on Thursday 20th at 09:00 AM UTC; can you post it on Tech News?
We believe etherpad might be unavailable for around 1 minute.

I'm just coming back today from a few days off.

Your maintenance operation hasn't been published on Tech News since you haven't followed my guidance. The deadline was last Thursday. I'm sorry.

I hope that the email to wikitech-l will be enough.

Sorry about that - I'm running 100 mph and having lots of things to do at the same time.

We all run at high speeds. Don't worry though, I don't think that missing Tech News will be a blocker. :)

Window reserved on the Deployments page.

Mentioned in SAL (#wikimedia-operations) [2020-02-20T08:35:57Z] <marostegui> Upgrade mysql on db1135 without restart T244238

Mentioned in SAL (#wikimedia-operations) [2020-02-20T08:40:25Z] <jynus> disable puppet and stop bacula service T244238

Mentioned in SAL (#wikimedia-operations) [2020-02-20T09:00:27Z] <marostegui> Restart m1 database master db1135 (etherpad will not be available for around 1 minute) - T244238

Mentioned in SAL (#wikimedia-operations) [2020-02-20T09:02:31Z] <akosiaris> restart etherpad-lite on etherpad1002 T244238

This was done successfully.
Downtime was from 09:00:28 to 09:01:14
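
The downtime figure above can be derived from the two timestamps. This is a minimal sketch, assuming both times are UTC on the same day; `downtime_seconds` is a hypothetical helper.

```python
from datetime import datetime

def downtime_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps (hypothetical helper)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

print(downtime_seconds("09:00:28", "09:01:14"))  # 46
```

That is 46 seconds, compared with the 74 seconds reported for the similar maintenance at T244209.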

Closing this, thanks @jcrespo and @akosiaris for being around to take care of the related services that live in m1.