
Restart m5 master (db1128)
Closed, Resolved · Public

Description

To enable the report_host flag (T266483) on the m5 master (db1128), we need to restart MySQL on that host.
This host currently has the following active databases:

labsdbaccounts
labswiki
striker
test_labsdbaccounts
testreduce
testreduce_vd

When: Thursday 28th January at 09:00 UTC

Impact: The above databases (including wikitech) will be unavailable for a few minutes (no reads and no writes).
This is just a daemon restart, so it shouldn't take long, maybe a couple of minutes.

@Andrew @aborrero @Bstorm @ssastry would you be ok with this date?

Event Timeline

Marostegui triaged this task as Medium priority. Jan 19 2021, 3:37 PM
Marostegui moved this task from Triage to Ready on the DBA board.
Marostegui updated the task description.

I think this works for us, thanks for the heads up.

Do you think the downtime should be announced to stakeholders? Wikitech being down seems like something some folks may want to know in advance.

Ideally, sending an email to wikitech-l plus the User-notice tag should be enough, but I wanted to wait for the date/time confirmation. Let's see if this time also works for @ssastry.

Works for me. Tangentially, we are currently in the process of possibly stopping all use of the testreduce database for our tests, and it is possible we might get it done by then as well.

Oh sweet - in that case, please create a ticket so we can drop those databases :)

One small detail: I am unsure whether labswiki uses the proxy or has its failover service configured, since it is handled by MediaWiki, so for wikitech it may be a hard down (not only a read-only status). @Marostegui, please confirm (I may have outdated info).

Correction: you already mentioned that in the task description, so please ignore this.

@jcrespo there is no active proxy for m5. As I stated in the task, there will be no reads and no writes.

Tech News might normally be a bit overkill for Wikitech being down for a couple of minutes (I'd recommend wikitech-l and a few short posts to a couple of technical wiki pages, in that case), but since there's already an item about Commons, we can just merge the two into one item and have this one tag along. It'll go out on Monday 25 January.

@Marostegui that timing sounds fine to me, especially if someone other than me (@Johan?) announces the downtime in advance.

I sent the email to wikitech-l and ops@ yesterday.
Regarding the banner, I will leave that up to @Johan to decide.

Thanks

To be sure we're talking about the same thing: by "Wikitech" we're just referring to wikitech.wikimedia.org/wiki/ here and nothing else, right?

I think the mailing lists and Tech News count as having announced it in advance. If it were a wiki seeing a lot of traffic, or if we were expecting more than a couple of minutes of downtime, that might change things, but I think this should be enough for now.

agreed!

Added to the deployments calendar

Procedure:

Pre-restart

  • Silence m5 hosts
  • Dump the buffer pool and disable the dump in advance, to make the restart faster
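
As a rough sketch of the buffer pool step, the snippet below lists the SQL statements it could correspond to. This rests on an assumption about what the step means: trigger an InnoDB buffer pool dump now, then switch off the dump that would otherwise run at shutdown. The variable names are standard InnoDB server variables; `PRE_RESTART_SQL` and `render` are hypothetical names used only for illustration, and the exact commands run on db1128 are not recorded in this task.

```python
# Assumed reading of "buffer pool dump + disablement" (not confirmed by the task):
# 1. dump the InnoDB buffer pool page list right away,
# 2. disable the dump at shutdown so stopping the daemon has less work to do.
PRE_RESTART_SQL = [
    "SET GLOBAL innodb_buffer_pool_dump_now = ON;",
    "SET GLOBAL innodb_buffer_pool_dump_at_shutdown = OFF;",
]


def render(statements):
    """Join the statements into a script that can be pasted into a SQL client."""
    return "\n".join(statements)


print(render(PRE_RESTART_SQL))
```

Keeping the statements in a list makes it easy to review them on the task before the maintenance window.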

Restart

  • !log m5 master restart, wikitech will be unavailable - T272388
  • Set db1128 to read-only (RO)
  • db1128: restart MySQL
  • Verify report_host is enabled
  • Verify read_only is OFF
  • Once MySQL is back: reload haproxy on dbproxy1021 (non-active) and dbproxy1017 (non-active)
  • Check everything is OK
  • Close this task
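
The two verification steps above could be sketched as a small check. It assumes the values of `report_host` and `read_only` have already been fetched (for example with `SELECT @@report_host, @@read_only`) and placed in a dictionary; the function name `post_restart_problems` and the dictionary shape are hypothetical, chosen only for this illustration.

```python
def post_restart_problems(variables, expected_report_host):
    """Return a list of problems found after the restart; empty means OK.

    `variables` is assumed to map server variable names to the values
    returned by a query such as: SELECT @@report_host, @@read_only
    """
    problems = []

    # report_host should be set to the host's own name after the restart.
    actual_host = variables.get("report_host")
    if actual_host != expected_report_host:
        problems.append(
            f"report_host is {actual_host!r}, expected {expected_report_host!r}"
        )

    # read_only may come back as 0/1 or OFF/ON depending on how it is queried.
    if str(variables.get("read_only")).upper() not in ("0", "OFF"):
        problems.append("read_only is not OFF")

    return problems


# Example with the value reported later in this task.
print(post_restart_problems(
    {"report_host": "db1128.eqiad.wmnet", "read_only": 0},
    "db1128.eqiad.wmnet",
))
```

Returning a list of problems rather than a boolean keeps the output readable when more than one check fails.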

This was done.
Downtime start: 09:00:23
Downtime stop: 09:00:51
Total: 28 seconds

+--------------------+
| @@report_host      |
+--------------------+
| db1128.eqiad.wmnet |
+--------------------+
1 row in set (0.001 sec)
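
For reference, the 28-second total reported above follows directly from the two timestamps. A minimal sketch of the arithmetic, assuming both times fall on the same day (`downtime_seconds` is a hypothetical helper name):

```python
from datetime import datetime


def downtime_seconds(start, stop, fmt="%H:%M:%S"):
    """Return the whole seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


print(downtime_seconds("09:00:23", "09:00:51"))  # 28
```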