
Restart x1 database master (db1103)
Closed, Resolved · Public

Description

Due to T279281 (Upgrade 10.4.13 hosts to a higher version), we need to restart the x1 primary database master.
x1 currently has:

  • Databases: flowdb, cognate_wiktionary and wikishared
  • Tables on the x1 wikis are:
+-------------------+
| Tables_in_enwiki  |
+-------------------+
| aft_feedback      |
| echo_email_batch  |
| echo_event        |
| echo_notification |
| echo_target_page  |
+-------------------+
bounce_records
cx_corpora
cx_lists
cx_suggestions
cx_translations
cx_translators
echo_unread_wikis
reading_list
reading_list_entry
reading_list_project
urlshortcodes
wikimedia_editor_tasks_counts
wikimedia_editor_tasks_edit_streak
wikimedia_editor_tasks_keys
wikimedia_editor_tasks_targets_passed

The last times we did x1 maintenance (T273758, T226358 and T250701), we used a set of tags, so I am adding the same tags and subscribing the same people.

Impact: x1 will be read-only for around 1 minute. Writes will not go through, but reads will remain unaffected.
When: Wednesday 5th May at 06:00 AM UTC

Event Timeline

Marostegui moved this task from Triage to Ready on the DBA board.
Tgr subscribed.

GrowthExperiments also uses x1 (not for enwiki, but for a number of other wikis). As far as I'm aware, it won't be meaningfully impacted by a short read-only period.

Making sure I read the task correctly:

Only English Wikipedia and Growth features (trivially) will be affected.
Echo, CX and a few other things will not allow writes.

Does this mean that during this minute, edits will fail to cause notifications at English Wikipedia? That is, editors will never be notified for the things that happened during this restart?

@Johan All Wikipedias will be affected, as x1 holds Echo for all projects.

@Marostegui Thanks! And notifications for this minute will be lost, not just delayed?

I am not fully sure about that, as I don't know whether they get re-tried or not.

IIRC, notifications work on a one-time attempt. I'll ask around.

This read-only period is a big one, affecting multiple wikis and cross-wiki services. We should treat it as such. @Marostegui, do you plan to create a task to inform communities about it?

From what I can see, the last time we did this it was just announced in Tech News: https://phabricator.wikimedia.org/T273758#6832250

@Trizek-WMF if you want me to create a task for that I can do that too, whatever works best for you.

Last time, we set up a banner. :)

It is not a big deal to create this banner. I prefer to have our communities informed, to avoid complaints from unhappy people who weren't warned about the read-only period.

Mentioned in SAL (#wikimedia-operations) [2021-05-04T11:56:35Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-05-04T11:58:42Z] <marostegui> Upgrade mysql and kernel on db1120 T281212

Mentioned in SAL (#wikimedia-operations) [2021-05-04T12:49:38Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-05-04T12:50:19Z] <marostegui> Upgrade mysql and kernel on db1137 T281212

All slaves have been upgraded to 10.4.18, so the master is ready for the operation tomorrow.
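The precondition stated above (every replica already on the target version before the primary is touched) can be expressed as a simple check. This is a minimal sketch, not the DBA team's tooling: the function names and the `replicas` mapping are hypothetical, and only db1120 and db1137 appear in the log above, so the dict is illustrative rather than the full x1 replica list.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a version string such as '10.4.18' into (10, 4, 18) for comparison."""
    return tuple(int(part) for part in v.split("."))


def ready_for_primary_restart(replica_versions: dict[str, str], target: str) -> bool:
    """Hypothetical helper: True only if every replica runs at least `target`."""
    want = parse_version(target)
    return all(parse_version(v) >= want for v in replica_versions.values())


# Illustrative data only; not the complete set of x1 replicas.
replicas = {"db1120": "10.4.18", "db1137": "10.4.18"}
print(ready_for_primary_restart(replicas, "10.4.18"))  # True
```

Comparing integer tuples rather than raw strings avoids the lexical trap where "10.4.9" would sort after "10.4.18".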

All hosts silenced.
The master's binaries have been upgraded; now waiting to perform the restart at 06:00 AM UTC.

Mentioned in SAL (#wikimedia-operations) [2021-05-05T06:00:03Z] <marostegui> Restart mysqld on x1 database primary master (db1103) T281212

This was done.
RO starts: 06:00:15
RO stops: 06:00:46

Total RO time: 31 seconds
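The total read-only time is simply the difference between the two timestamps above. A minimal sketch of that arithmetic, assuming both times are UTC on the maintenance day (5 May 2021); the function name is hypothetical:

```python
from datetime import datetime, timezone


def read_only_seconds(start: str, stop: str, day: str = "2021-05-05") -> int:
    """Length in seconds of a read-only window given HH:MM:SS start/stop times in UTC."""
    fmt = "%Y-%m-%d %H:%M:%S"
    t0 = datetime.strptime(f"{day} {start}", fmt).replace(tzinfo=timezone.utc)
    t1 = datetime.strptime(f"{day} {stop}", fmt).replace(tzinfo=timezone.utc)
    return int((t1 - t0).total_seconds())


print(read_only_seconds("06:00:15", "06:00:46"))  # 31
```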