
MariaDB Pre-DC switchover tasks
Closed, Resolved (Public)

Description

To be done one week before the switchover:

  • Create calendar events: stop DB maintenance (6 days before the switch), enable circular replication (3 days before the switch), disable circular replication (1 day after the switch), resume DB maintenance (after circular replication is disabled). Done by @Marostegui
  • Ensure the datapoints on https://zarcillo.wikimedia.org/ui/weights are fresh
  • Compare weights in s* and x3 (they should all be 300, except one replica per section in each DC, which should be 100)
  • Compare weights in es* (100 on replicas and 0 on the master for writable sections; 100 on all hosts in RO sections)
  • Compare weights in x1 (they should all be 300, except one host at 100 in each DC)
  • Check that all pc* and ms* sections are pooled
  • Ensure that 1 master and 1 candidate master are declared per section and that the usual topology constraints hold
  • Check that monitoring notifications are enabled on all relevant hosts
  • Clean orchestrator output
  • Write a DNS patch for the new masters and check that DNS, Puppet, dbctl and the real replica topology (e.g. orchestrator/dbtree) agree, using https://switchmaster.toolforge.org/dc-switch
  • [skipped] Run compare table between eqiad and codfw masters to ensure they all have the same data (see missing checks at T375507)
  • Run a diff of the live configuration between eqiad and codfw masters (binlog retention, GTID enabled/disabled, server version, etc.)
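
The last item above amounts to comparing a handful of server variables on the two masters. A minimal Python sketch of that comparison follows. It assumes the variables have already been fetched from each host (for example with `SHOW GLOBAL VARIABLES`) into plain dictionaries. The function name `diff_variables`, the list of watched variables and the sample values are illustrative assumptions, not the comparator tool linked from this task.

```python
# Hypothetical helper: it compares already-fetched variables and does not
# connect to any database host.
WATCHED = ("version", "binlog_format", "expire_logs_days", "gtid_strict_mode")


def diff_variables(left, right, names=WATCHED):
    """Return {name: (left_value, right_value)} for each watched variable that differs.

    A variable missing on one side is reported with None for that side.
    """
    diffs = {}
    for name in names:
        left_value = left.get(name)
        right_value = right.get(name)
        if left_value != right_value:
            diffs[name] = (left_value, right_value)
    return diffs


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    eqiad_master = {"version": "A", "binlog_format": "ROW", "expire_logs_days": "30"}
    codfw_master = {"version": "A", "binlog_format": "ROW", "expire_logs_days": "14"}
    for name, (left, right) in diff_variables(eqiad_master, codfw_master).items():
        print(f"{name}: eqiad={left!r} codfw={right!r}")
```

Some variables are expected to differ between hosts, so restricting the comparison to an explicit list keeps the output short enough to review by hand.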

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Add simple variable comparator tool | repos/sre/wmfmariadbpy!22 | fceratto | T416705 | main

Event Timeline

Marostegui triaged this task as Medium priority.
Marostegui updated the task description.
Marostegui moved this task from Triage to Blocked on the DBA board.
Marostegui edited subscribers, added: FCeratto-WMF; removed: Blake, MLechvien-WMF, Clement_Goubert and 3 others.
Marostegui added a subscriber: Blake.

Mentioned in SAL (#wikimedia-operations) [2026-03-03T16:12:31Z] <fceratto@cumin1003> dbctl commit (dc=all): 'Setting db1188 weight to 300 T416705', diff saved to https://phabricator.wikimedia.org/P89723 and previous config saved to /var/cache/conftool/dbconfig/20260303-161230-fceratto.json

Mentioned in SAL (#wikimedia-operations) [2026-03-03T16:13:24Z] <fceratto@cumin1003> dbctl commit (dc=all): 'Setting db1169 weight to 300 T416705', diff saved to https://phabricator.wikimedia.org/P89724 and previous config saved to /var/cache/conftool/dbconfig/20260303-161323-fceratto.json

Mentioned in SAL (#wikimedia-operations) [2026-03-03T16:18:47Z] <fceratto@cumin1003> dbctl commit (dc=all): 'Setting db1188 weight to 100 T416705', diff saved to https://phabricator.wikimedia.org/P89726 and previous config saved to /var/cache/conftool/dbconfig/20260303-161846-fceratto.json

Mentioned in SAL (#wikimedia-operations) [2026-03-03T16:28:37Z] <fceratto@cumin1003> dbctl commit (dc=all): 'Setting x1 codfw weights to 300 T416705', diff saved to https://phabricator.wikimedia.org/P89728 and previous config saved to /var/cache/conftool/dbconfig/20260303-162836-fceratto.json
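
The dbctl commits above bring weights in line with the rules from the task description. As a rough illustration, those rules can be written as a check over a list of instances. This is a sketch under assumptions: the `Instance` record and the `check_weights` function are hypothetical, the data would have to be exported from dbctl or zarcillo by some other means, and masters of s*, x1 and x3 are skipped because the description does not state their expected weight.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one pooled database host."""
    section: str      # e.g. "s1", "x1", "x3" or an es* section
    dc: str           # e.g. "eqiad" or "codfw"
    host: str
    is_master: bool
    weight: int


def check_weights(instances, read_only_sections=frozenset()):
    """Return a list of human-readable problems; an empty list means all rules hold."""
    problems = []
    groups = defaultdict(list)
    for inst in instances:
        groups[(inst.section, inst.dc)].append(inst)

    for (section, dc), members in sorted(groups.items()):
        if section.startswith("es"):
            # es*: 0 on the master of a writable section, 100 everywhere else.
            for inst in members:
                writable_master = inst.is_master and section not in read_only_sections
                expected = 0 if writable_master else 100
                if inst.weight != expected:
                    problems.append(
                        f"{section}/{dc} {inst.host}: weight {inst.weight}, expected {expected}"
                    )
        elif section.startswith("s") or section in ("x1", "x3"):
            # s*, x1, x3: replicas at 300, with exactly one replica at 100 per DC.
            replicas = [inst for inst in members if not inst.is_master]
            low = [inst for inst in replicas if inst.weight == 100]
            if len(low) != 1:
                problems.append(
                    f"{section}/{dc}: {len(low)} replicas at weight 100, expected exactly 1"
                )
            for inst in replicas:
                if inst.weight not in (100, 300):
                    problems.append(
                        f"{section}/{dc} {inst.host}: weight {inst.weight}, expected 100 or 300"
                    )
    return problems
```

Sections not covered by a weight rule in the description, such as pc* and ms*, are ignored by this sketch.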

FCeratto-WMF changed the task status from Open to In Progress. (Mar 19 2026, 9:48 AM)
FCeratto-WMF moved this task from Blocked to In progress on the DBA board.

Change #1255655 had a related patch set uploaded (by Federico Ceratto; author: Federico Ceratto):

[operations/dns@master] wmnet: update CNAME records for DB masters to eqiad

https://gerrit.wikimedia.org/r/1255655

Change #1255669 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/dns@master] wmnet: update CNAME records for DB masters for dc switchover

https://gerrit.wikimedia.org/r/1255669

Repeating what I told Federico: since we have been active/active for many years, any major data inconsistency would have surfaced to users and then to us. So I don't think a data comparison check before DC switchovers is worth doing. We should still try to build a system that does it all year round, in a more structured and automated way, but I don't think it's high priority.

FCeratto-WMF updated the task description.

Change #1255669 merged by Blake:

[operations/dns@master] wmnet: update CNAME records for DB masters for dc switchover

https://gerrit.wikimedia.org/r/1255669
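
The task description asks that DNS, Puppet, dbctl and the live replica topology all name the same master for each section when a DNS change like this one is prepared. A minimal sketch of that cross-check follows, assuming each source has already been reduced to a section-to-master mapping. The function name and the input shape are hypothetical and say nothing about how the switchmaster tool works internally.

```python
def find_disagreements(sources):
    """Report sections where the sources do not all name the same master.

    `sources` maps a source name (e.g. "dns", "puppet", "dbctl", "topology")
    to a {section: master_host} dictionary. The result maps each disputed
    section to {source_name: master_host_or_None}.
    """
    sections = set()
    for mapping in sources.values():
        sections.update(mapping)

    disagreements = {}
    for section in sorted(sections):
        view = {name: mapping.get(section) for name, mapping in sources.items()}
        # More than one distinct value means a mismatch or a missing entry.
        if len(set(view.values())) > 1:
            disagreements[section] = view
    return disagreements
```

A section that is missing from one source is reported as well, since a missing entry is as much a mismatch as a different hostname.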

Change #1255655 abandoned by Federico Ceratto:

[operations/dns@master] wmnet: update CNAME records for DB masters to eqiad

Reason:

Superseded by another CR

https://gerrit.wikimedia.org/r/1255655