
Reboot pc1012
Closed, Resolved (Public)

Description

Steps:

  • Stop replication on pc1014.
  • Merge CR to change pc2 primary to be pc1014: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/786955
  • Deploy: scap sync-file wmf-config/ProductionServices.php "Set pc1014 as pc2 primary T306983"
  • Downtime pc2: sudo cookbook sre.hosts.downtime --hours 1 -r "Rebooting pc1012 T306983" 'A:db-section-pc2'
  • Reboot pc1012: SKIP_DBCTL=1 SKIP_START_SLAVE=1 ~kormat/bin/reboot-host T303174 pc1012.eqiad.wmnet
  • Revert previous CR
  • Deploy: scap sync-file wmf-config/ProductionServices.php "Set pc1012 as pc2 primary T306983"
  • Start replication on pc1014.

Note that pc2012 will keep trying to replicate from pc1012 during this time.
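
The ordering of the steps above can be sketched as a dry-run script that only prints the plan and executes nothing. The `step` and `print_plan` helpers are hypothetical names introduced here for illustration, and `STOP SLAVE;` / `START SLAVE;` assume replication is controlled through the usual MariaDB statements; the task does not say how replication was actually stopped. The remaining commands are copied verbatim from the list.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the runbook steps in order instead of running them.
set -eu

TASK="T306983"
CONFIG_FILE="wmf-config/ProductionServices.php"

# Hypothetical helper: print one numbered step and where it would run.
step() {
  local n="$1" where="$2" cmd="$3"
  printf '%d. [%s] %s\n' "$n" "$where" "$cmd"
}

print_plan() {
  # Assumption: replication is stopped with the MariaDB statement below.
  step 1 "pc1014" "STOP SLAVE;"
  step 2 "gerrit" "merge change 786955"
  step 3 "deploy host" "scap sync-file ${CONFIG_FILE} \"Set pc1014 as pc2 primary ${TASK}\""
  step 4 "cumin host" "sudo cookbook sre.hosts.downtime --hours 1 -r \"Rebooting pc1012 ${TASK}\" 'A:db-section-pc2'"
  # Copied verbatim from the task description, including its task ID.
  step 5 "cumin host" "SKIP_DBCTL=1 SKIP_START_SLAVE=1 ~kormat/bin/reboot-host T303174 pc1012.eqiad.wmnet"
  step 6 "gerrit" "revert change 786955"
  step 7 "deploy host" "scap sync-file ${CONFIG_FILE} \"Set pc1012 as pc2 primary ${TASK}\""
  # Assumption: replication is resumed with the MariaDB statement below.
  step 8 "pc1014" "START SLAVE;"
}

print_plan
```

Printing the plan first makes the symmetry easy to check: replication on pc1014 is stopped before the promotion and only restarted after pc1012 is primary again.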

Related Objects

Status: Resolved
Assigned: Kormat

Event Timeline

Change 786955 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/mediawiki-config@master] ProductionServices: Promote pc1014 to primary of pc2.

https://gerrit.wikimedia.org/r/786955

I would stop pc1014 from replicating from pc1012 before promoting it. I am not sure how MW copes with pcX masters that have running replication threads, in terms of errors or expectations.
The rest looks good.

Once you've moved pc1014 somewhere else, please remember to configure GTID for it (right now it is not using GTID).
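
As a rough sketch of what configuring GTID could involve, assuming the host runs MariaDB and that `slave_pos` is the desired mode (neither is stated in this task), the statements would look roughly like the ones printed below. The `gtid_sql` function is a hypothetical helper that only prints the SQL; it does not connect to any database.

```shell
#!/usr/bin/env bash
# Sketch: print the MariaDB statements for switching a replica to GTID.
set -eu

# Hypothetical helper; prints SQL only and runs nothing against a server.
gtid_sql() {
  cat <<'SQL'
STOP SLAVE;
CHANGE MASTER TO MASTER_USE_GTID = slave_pos;
START SLAVE;
SHOW SLAVE STATUS\G
SQL
}

gtid_sql
```

The final `SHOW SLAVE STATUS\G` is there so the `Using_Gtid` field can be checked after the change.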

Change 786955 merged by jenkins-bot:

[operations/mediawiki-config@master] ProductionServices: Promote pc1014 to primary of pc2.

https://gerrit.wikimedia.org/r/786955

Mentioned in SAL (#wikimedia-operations) [2022-04-28T10:45:51Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on pc2012.codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Rebooting pc1012 T306983

Mentioned in SAL (#wikimedia-operations) [2022-04-28T10:45:56Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Rebooting pc1012 T306983

Mentioned in SAL (#wikimedia-operations) [2022-04-28T10:46:08Z] <kormat@deploy1002> Synchronized wmf-config/ProductionServices.php: Set pc1014 as pc2 primary T306983 (duration: 00m 52s)

Kormat triaged this task as Medium priority. Apr 28 2022, 10:46 AM
Kormat updated the task description.

Mentioned in SAL (#wikimedia-operations) [2022-04-28T10:55:14Z] <kormat@deploy1002> Synchronized wmf-config/ProductionServices.php: Set pc1012 as pc2 primary T306983 (duration: 00m 57s)