[scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server
Closed, Resolved · Public · 2 Story Points

Description

How are we going to handle sync of mediawiki-staging between tin and mira? Wouldn't we want any sort of git change on one to be reflected on the other?

Yes, we need /srv/mediawiki-staging to match between tin and mira at the end of each deploy. One way to fix this is in scap itself, by adding a new phase to each sync that updates a warm spare deployment server in the opposite DC, including full git object data. Today scap (and sync-*) prepares multiple servers to be rsync proxies for the MW hosts, but these copies are missing the git data files that any of those hosts would need to take over as the new staging server. If the cross-DC master sync were made the first step in the sync process (sync masters, sync proxies, sync MWs), we could also remove some cross-DC communication by syncing the proxies with the master in their own DC.
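To make the proposed ordering concrete, here is a minimal sketch of how the three phases (sync masters, sync proxies, sync MWs) could be planned. It is not scap's actual code: `Host`, `plan_sync`, the DC labels `dc1`/`dc2`, and the proxy and MW host names are all hypothetical, and the plan only lists who would pull from whom rather than running rsync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    dc: str  # datacenter label; "dc1"/"dc2" below are made-up values


def plan_sync(active_master, spare_masters, proxies, mw_hosts):
    """Return an ordered list of (phase, source, target) tuples.

    Phase order follows the task description: masters first, then
    proxies, then MediaWiki hosts.
    """
    plan = []
    masters_by_dc = {active_master.dc: active_master}

    # Phase 1: the active master pushes the full staging tree
    # (git data included) to every warm spare.
    for spare in spare_masters:
        plan.append(("masters", active_master.name, spare.name))
        masters_by_dc.setdefault(spare.dc, spare)

    # Phase 2: each proxy syncs from the master in its own DC when one
    # exists, which is what avoids the extra cross-DC traffic.
    proxies_by_dc = {}
    for proxy in proxies:
        source = masters_by_dc.get(proxy.dc, active_master)
        plan.append(("proxies", source.name, proxy.name))
        proxies_by_dc.setdefault(proxy.dc, []).append(proxy)

    # Phase 3: MW hosts sync from a proxy in their DC (round-robin),
    # falling back to the active master if the DC has no proxy.
    for i, mw in enumerate(mw_hosts):
        local = proxies_by_dc.get(mw.dc)
        source = local[i % len(local)] if local else active_master
        plan.append(("mw", source.name, mw.name))

    return plan


if __name__ == "__main__":
    tin, mira = Host("tin", "dc1"), Host("mira", "dc2")
    steps = plan_sync(
        tin,
        [mira],
        [Host("proxy1", "dc1"), Host("proxy2", "dc2")],
        [Host("mw1", "dc1"), Host("mw2", "dc2")],
    )
    for phase, source, target in steps:
        print(f"{phase}: {source} -> {target}")
```

With mira synced first, the only cross-DC transfer in this plan is the master-to-master copy; everything in the second DC then pulls from mira or from a proxy that did.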

bd808 created this task.Jul 6 2015, 12:47 AM
bd808 added a project: Deployment-Systems.
bd808 added subscribers: bd808, Krenair.
Restricted Application added a subscriber: Aklapper. · Jul 6 2015, 12:47 AM
mmodell triaged this task as "Normal" priority.Jul 6 2015, 6:30 PM
mmodell moved this task from To Triage to Backlog on the Deployment-Systems board.
mmodell added a subscriber: mmodell.

I can imagine a few scenarios where lack of locking/coordination among deployers could cause this to go horribly wrong.

For one thing, a sync from mira could clobber any in-progress work being done on tin (and vice versa).

We currently avoid conflicting deployment work by following a schedule, manually checking for other logged-in users on tin, and coordinating via IRC. None of these methods is ideal, and the check for other logged-in users is easily defeated once there are two deployment servers.

Shouldn't we create some sort of flag / mutex that must be obtained on both servers before beginning deployment work, and is then released at the end of a sync?
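A rough sketch of the "obtain on both servers, release at the end of a sync" idea follows. It is only an in-memory model: `ServerLock`, `LockHeld`, and `deploy_lock` are hypothetical names, and a real version would have to store the flag somewhere both servers can reach. What it illustrates is the protocol itself: take the locks in a fixed order, and give back anything already taken if one of them turns out to be held.

```python
from contextlib import contextmanager


class LockHeld(Exception):
    """Raised when another deployer already holds a server's lock."""


class ServerLock:
    """In-memory stand-in for a lock flag kept on one deployment server."""

    def __init__(self, server):
        self.server = server
        self.owner = None

    def acquire(self, owner):
        if self.owner is not None and self.owner != owner:
            raise LockHeld(f"{self.server} is locked by {self.owner}")
        self.owner = owner

    def release(self, owner):
        # Only the current owner may clear the flag.
        if self.owner == owner:
            self.owner = None


@contextmanager
def deploy_lock(locks, owner):
    """Hold every server lock for the duration of the deploy, or none."""
    acquired = []
    try:
        # A fixed (alphabetical) order keeps two deployers from each
        # grabbing one server and then waiting on the other.
        for lock in sorted(locks, key=lambda item: item.server):
            lock.acquire(owner)
            acquired.append(lock)
        yield
    finally:
        # Runs on normal exit and on failure, so a partial acquisition
        # is rolled back rather than left behind.
        for lock in reversed(acquired):
            lock.release(owner)


if __name__ == "__main__":
    tin, mira = ServerLock("tin"), ServerLock("mira")
    with deploy_lock([tin, mira], "alice"):
        try:
            with deploy_lock([tin, mira], "bob"):
                pass
        except LockHeld as err:
            print("bob must wait:", err)
```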

bd808 added a comment.Jul 8 2015, 4:49 PM

Shouldn't we create some sort of flag / mutex that must be obtained on both servers before beginning deployment work, and is then released at the end of a sync?

I've been thinking about this a bit and tend to agree. With the introduction of etcd for other cluster state management (e.g. varnish and pybal) we might be able to create a reliable cross-master mutex. I'm not sure that it is a critical blocker, however. One of the hard bits with our current setup is that even on a single server we don't have anything to stop multiple people from messing about in /srv/mediawiki-staging/ at the same time. As you pointed out, we try to solve that with social contracts today, and I think that, at least for the initial implementation, these same conventions will work for adding mira. I think we should look at mira more as a warm spare than anything else. We might choose to exercise it from time to time to make sure everything is still working, but I wouldn't expect people to randomly deploy from either server.

Yes, I agree that it's not a blocker but just something we should consider. Even a mutex with some manual process behind it would be better than what we have now.
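For the single-server case, "a mutex with some manual process behind it" could be as small as an advisory flag file. The sketch below is one possible shape, not anything scap is known to ship: `take_flag`, `read_flag`, `drop_flag`, and `FlagHeld` are hypothetical names, and the flag path is whatever the deployers agree on. The manual part is that nothing enforces the flag; a deployer checks it before touching the staging directory and removes it when done.

```python
import json
import os
import time


class FlagHeld(Exception):
    """Raised when the deploy flag file already exists."""


def read_flag(path):
    """Return the flag contents, or None if absent or not yet written."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (FileNotFoundError, ValueError):
        return None


def take_flag(path, owner, reason):
    """Create the flag file atomically, recording who holds it and why."""
    try:
        # O_EXCL makes creation fail if the file already exists, so two
        # deployers cannot both believe they took the flag.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        holder = read_flag(path) or {}
        raise FlagHeld(
            "deploy flag held by {} ({})".format(
                holder.get("owner", "unknown"), holder.get("reason", "no reason given")
            )
        )
    with os.fdopen(fd, "w") as handle:
        json.dump({"owner": owner, "reason": reason, "since": time.time()}, handle)


def drop_flag(path):
    """Release the flag at the end of a sync (or by hand if it went stale)."""
    os.remove(path)
```

Recording the owner and reason in the file means that a deployer who finds the flag set knows whom to ask on IRC before clearing it.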

Change 224313 had a related patch set uploaded (by BryanDavis):
[WIP] Sync /srv/mediawiki-staging to co-masters

https://gerrit.wikimedia.org/r/224313

bd808 claimed this task.Jul 12 2015, 8:34 PM
bd808 removed bd808 as the assignee of this task.Oct 7 2015, 10:30 PM
mmodell set Security to None.Oct 12 2015, 4:13 PM
mmodell edited a custom field.

Change 224829 had a related patch set uploaded (by Chad):
scap: Add co-master configuration

https://gerrit.wikimedia.org/r/224829

Change 224313 merged by jenkins-bot:
Sync /srv/mediawiki-staging to co-masters

https://gerrit.wikimedia.org/r/224313

Change 247965 had a related patch set uploaded (by BryanDavis):
Provide scap control server FQDN to proxy sync commands

https://gerrit.wikimedia.org/r/247965

Change 247965 merged by jenkins-bot:
Provide scap control server FQDN to proxy sync commands

https://gerrit.wikimedia.org/r/247965

Change 249684 had a related patch set uploaded (by BryanDavis):
Make mediawiki-config clone be owned by mwdeploy

https://gerrit.wikimedia.org/r/249684

Change 249684 merged by Muehlenhoff:
Make mediawiki-config clone be owned by mwdeploy

https://gerrit.wikimedia.org/r/249684

Change 224829 merged by Alexandros Kosiaris:
scap: Add co-master configuration

https://gerrit.wikimedia.org/r/224829

demon closed this task as "Resolved".Nov 5 2015, 8:17 PM
demon claimed this task.
demon moved this task from Services MVP to Done on the Scap board.
bd808 added a comment.Nov 5 2015, 8:59 PM

Thanks for carrying this across the finish line @demon. :)

mmodell reopened this task as "Open".Nov 16 2015, 7:16 PM

This is still not done.

There is a patch against operations/puppet and a related differential revision for scap: D48

demon moved this task from Done to Services MVP on the Scap board.Nov 24 2015, 6:06 PM

This should finally be resolved.

Dzahn added a subscriber: Dzahn.Nov 25 2015, 7:08 PM

yes:) let me know if we can close, then we also close T95436 above that :)

demon closed this task as "Resolved".Dec 11 2015, 7:27 PM

I think we're done here folks.