
Remove need to manually switch swiftrepl timer after datacenter switchover
Closed, Resolved, Public

Description

After a switchover, you need to manually adjust hiera so that the swiftrepl job only runs in the active datacenter, as explained at https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Media_storage/Swift

Instead, we can have the script itself check which datacenter is active and run only if it is in that datacenter.
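A minimal sketch of that gating logic, assuming the primary datacenter name has already been looked up elsewhere (for example from the WMFMasterDatacenter setting mentioned below) and that hosts follow the host.site.wmnet naming seen in this task. The function names and the wrapped command are hypothetical, not the code from the merged patch.

```python
import socket
import subprocess
from typing import Sequence

# Message matching the log output later in this task.
SKIP_MESSAGE = "Skipping execution, not the primary datacenter!"


def site_from_fqdn(fqdn: str) -> str:
    """Return the site label from an FQDN such as 'ms-fe2005.codfw.wmnet'.

    Assumption: hosts are named <host>.<site>.wmnet.
    """
    parts = fqdn.split(".")
    if len(parts) < 3:
        raise ValueError(f"cannot derive site from {fqdn!r}")
    return parts[1]


def should_run(local_site: str, primary_site: str) -> bool:
    """Run only when this host is in the primary (active) datacenter."""
    return local_site.strip().lower() == primary_site.strip().lower()


def main(primary_site: str, command: Sequence[str]) -> int:
    """Hypothetical wrapper: run `command` only in the primary datacenter."""
    local_site = site_from_fqdn(socket.getfqdn())
    if not should_run(local_site, primary_site):
        print(SKIP_MESSAGE)
        # Exit 0 so the systemd unit is not marked as failed on the passive site.
        return 0
    return subprocess.call(list(command))
```

Exiting successfully on the passive site keeps the same timer deployed in both datacenters, so nothing in hiera has to change after a switchover; whichever site is primary at run time does the work.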

11:38:14 <legoktm> the other thing I wanted to figure out was how to automate https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Media_storage/Swift
11:40:08 <godog> legoktm: yeah I think if we can teach puppet (or maybe puppet knows already?) which mw site is primary then it should be straightforward
11:41:05 <godog> or sth along those lines, whether it is puppet that flip things or swiftrepl itself that knows how to DTRT
11:41:15 <godog> you get the idea
11:41:50 <legoktm> there's a WMFMasterDatacenter etcd setting, I'm not sure puppet can read it, but we could just have a wrapper around swiftrepl that checks it before running
11:42:34 <godog> yeah the wrapper SGTM
...
11:46:46 <rzl> (drive-by) fwiw we do have mediawiki::state('primary_dc') in puppet but it's not great to use, the wrapper sounds better to me too
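A rough sketch of how such a wrapper might read the WMFMasterDatacenter value over the etcd v2 keys API. The URL and key path below are hypothetical placeholders, and the assumption that the stored value is itself a JSON object with a "val" field is just that, an assumption; check the real conftool layout before relying on it.

```python
import json
import urllib.request

# Hypothetical endpoint and key path; the real etcd host and conftool layout may differ.
ETCD_KEY_URL = (
    "https://etcd.example.org:2379"
    "/v2/keys/conftool/v1/mediawiki-config/common/WMFMasterDatacenter"
)


def parse_primary_dc(body: str) -> str:
    """Extract the datacenter name from an etcd v2 GET response body.

    etcd v2 answers with {"action": "get", "node": {"key": ..., "value": ...}}.
    Assumption: the stored value is a JSON object such as {"val": "eqiad"}.
    """
    node_value = json.loads(body)["node"]["value"]
    return json.loads(node_value)["val"]


def fetch_primary_dc(url: str = ETCD_KEY_URL, timeout: float = 5.0) -> str:
    """Fetch and parse the key; any network or parse error propagates."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_primary_dc(resp.read().decode("utf-8"))
```

If the lookup fails, letting the exception abort the run (so replication is skipped on both sites) is probably safer than guessing and risking replication in both directions at once.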

Event Timeline

Change 701052 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] swift: Only run swiftrepl-mw in the active datacenter

https://gerrit.wikimedia.org/r/701052

Change 701052 merged by Legoktm:

[operations/puppet@production] swift: Only run swiftrepl-mw in the active datacenter

https://gerrit.wikimedia.org/r/701052

On ms-fe2005.codfw.wmnet the swiftrepl-mw.service unit was masked, so I unmasked it:

legoktm@ms-fe2005:~$ sudo systemctl unmask swiftrepl-mw.service

(ping T285425: Puppet does not undo manual "systemctl mask $unit")
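Because Puppet does not undo a manual mask (T285425), a masked unit like this can stay off unnoticed. A minimal sketch of a check a monitoring script could run, relying only on "systemctl is-enabled" printing the unit state ("masked" or "masked-runtime" for masked units); the function names are hypothetical.

```python
import subprocess

# States reported by `systemctl is-enabled` for masked units.
MASKED_STATES = {"masked", "masked-runtime"}


def is_masked_state(state: str) -> bool:
    """True if a `systemctl is-enabled` state string means the unit is masked."""
    return state.strip() in MASKED_STATES


def unit_is_masked(unit: str) -> bool:
    """Ask systemd whether `unit` is masked on this host."""
    # is-enabled exits non-zero for masked units, so do not raise on failure.
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return is_masked_state(result.stdout)
```

For example, unit_is_masked("swiftrepl-mw.service") would have returned True on ms-fe2005 before the unmask above.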

I then manually started the service and the output looked like:

Jun 25 18:10:04 ms-fe2005 systemd[1]: Started Ensure mediawiki containers are synchronized across sites.
Jun 25 18:10:05 ms-fe2005 swiftrepl-mw[26550]: Skipping execution, not the primary datacenter!

I'll update the wiki page with a note that the Swift section is now automated, and remove the section entirely once the job has switched datacenters on its own after a real switchover.