Tracking task to capture discussion around defining a transport mechanism for the Future Deployment Tool™, as discussed during the deployment cabal meeting
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | dduvall | T101023 EPIC: Future Deployment Tooling
Resolved | | • Pchelolo | T102667 Create or improve the RESTBase deploy method
Resolved | | • mmodell | T102687 Get ops feedback regarding the use of SSH for deployment system control channel
Resolved | | • mobrovac | T103344 RESTBase deployment process
Event Timeline
fwiw: @bd808 thinks we can wrangle the MediaWiki deployments to use git rather than rsync
This would involve essentially creating a new repo to represent mediawiki-staging; after all the prep work is done for a deployment, we would just force-add everything that changed in /srv/mediawiki-staging and commit to this staging repo. The staging repo would then have a single commit per deployment with all of the changes since the previous deployment.
The downside of doing git transport is that git is still diff-based and not really great at moving binary blobs around, so it won't be of much help for things like the l10n CDB files or HHVM .hhbc caches that we will need to enable RepoAuthoritative. It should, however, take a lot of I/O load off of the fanout servers by removing the stat calls rsync needs to guess whether a file has changed.
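In rough terms, the staging-repo step could look like this (a sketch; the full pipeline is outlined further down in this task):

```
# Hypothetical per-deployment commit in the staging repo
cd /srv/mediawiki-staging
git add --all --force                              # force-add everything that changed
git commit -m "deploy $(date -u +%Y%m%dT%H%M%SZ)"  # one commit per deployment
```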
Notes from Meeting 6/8/15
- conversation between Mukunda and Bryan: possibility of using git in scap deploy by building a local repo from core + extensions
- Middle-step git repository for mediawiki
- binaries/blobs?
- might be possible with git-annex, better than git fat
- git-torrent? might be worth further research https://github.com/cjb/GitTorrent
- how do we handle submodules?
- building local repo for deployment might ... ? (sorry, missed it)
- Current services deployment uses git via Apache
- Sometimes subjectively slow, may be room for improvement
- Fan out
- Preseeding, proxies needed -- could this be a post-merge jenkins job?
- Deploy master node
- Currently always tin
- Could start deploy from other proxies (pre-seed target)
- Agnostically built in terms of proxies vs pre-seed target
- Might be good time to experiment with git based transport
- Determine whether it's going to be over ssh/https/torrent/etc.
- Run tests outside of prod to determine if feasible for MW
- nginx/varnish or something similar in each datacenter as a fanout proxy
- less clobbering
- New work_dir after fetch with filecheckout
- Solutions for moving git blobs
- Git Large File Storage https://git-lfs.github.com/ (see the sketch after this list)
* Pros:
** Scalable: GitHub uses it
** Supported: seems to be developed and supported by some of the core git team
- Git Annex
- GitTorrent
- Git Fat
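As a sketch of the Git LFS option above (the tracked patterns are assumptions based on the blob types mentioned earlier, not an agreed list):

```
# One-time setup in the staging repo
git lfs install
git lfs track "*.cdb"    # l10n CDB files (example pattern)
git lfs track "*.hhbc"   # HHVM bytecode caches (example pattern)
git add .gitattributes
git commit -m "Track binary caches via Git LFS"
```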
- Criteria:
* Reliability (should check out the actual file, not just the text of its SHA)
* Flexibility of transport mechanism
* Scalability and reproducibility (scales up and down)
* Resource consumption
** CPU
** Limit network I/O
** Limit disk I/O
* Speed
* Could possibly simulate cross-datacenter limitations with tc (see the sketch after this list)
* eqiad - codfw RTT ~40ms, inside eqiad RTT is ~0.5ms
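The tc idea could look roughly like this with netem (the interface name here is an assumption; 40ms matches the eqiad-codfw RTT noted above):

```
# Add ~40ms of artificial latency to approximate cross-datacenter RTT
sudo tc qdisc add dev eth0 root netem delay 40ms
# ... run the transport benchmarks ...
# Remove the shaping afterwards
sudo tc qdisc del dev eth0 root netem
```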
- Definitions of success
- Atomic failure modes
- Optional fanout proxies based on git-http(s)
- Blobs pushed around with git-annex or another solution (GitTorrent, Git Large File Storage; figure out criteria for choosing a solution)
- Continuous feedback to user
- Verify integrity of repo
- Not insanely slow (no slower than the current system)
ACTION: Use the staging environment to test deployment blob mechanisms
Here's an outline of one possible way to add git transport to the current scap system:
- Prep /srv/mediawiki-staging manually (just like today)
- Rsync from /srv/mediawiki-staging to /srv/mediawiki (just like today)
- Store state in local git repo on tin
- cd /srv/mediawiki
- git add --all #NEW!
- git commit -m "$SCAP_MESSAGE"
- TAG="scap_$(date +%Y-%m-%dT%H%M%S)" # no $ on the assignment; colons dropped since they aren't allowed in git ref names
- git tag $TAG -m "$SCAP_MESSAGE"
- Update fanout servers:
- ssh $FANOUT
- [[ -d /srv/mediawiki ]] || ( mkdir -p /srv/mediawiki && cd /srv/mediawiki && git init && git remote add origin http://<deploy server>/mediawiki )
- cd /srv/mediawiki
- git fetch origin tag $TAG
- git reset --hard $TAG
- Update MW servers
- ssh $MW_SERVER
- UPSTREAM=pick closest fanout server like we do today for rsync
- [[ -d /srv/mediawiki ]] || ( mkdir -p /srv/mediawiki && cd /srv/mediawiki && git init )
- cd /srv/mediawiki
- git remote add origin http://$UPSTREAM/mediawiki
- git fetch origin tag $TAG
- git reset --hard $TAG
- Run final per-host steps (just like today)
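To cover the "verify integrity of repo" criterion from the meeting notes, a quick post-reset check on each target might look like this (a sketch using standard git commands):

```
cd /srv/mediawiki
git fsck --no-dangling              # check object store integrity
git describe --tags --exact-match   # confirm HEAD is exactly the deployed tag
```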
@bd808: Just for the sake of argument, what if we created a tar.gz archive on tin, then transferred it with rsync? Or even used something like https://github.com/jmacd/xdelta? From my tiny bit of testing, I think that might actually save a lot of disk I/O AND network bandwidth. But it's totally not git-friendly...
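For the xdelta variant, a sketch of what that could look like with the xdelta3 tool from that repo (filenames are placeholders; an uncompressed tar keeps the binary deltas small):

```
# On tin: snapshot the tree and encode a delta against the previous deploy's tar
tar -C /srv/mediawiki-staging -cf mw-new.tar .
xdelta3 -e -s mw-old.tar mw-new.tar mw-delta.vcdiff
# On a target: rebuild the new tar from the old one plus the (small) delta
xdelta3 -d -s mw-old.tar mw-delta.vcdiff mw-new.tar
```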
Also, fwiw, I really think BitTorrent is probably the way to go for the transport. Even if we don't do it now, I think it's inevitable that we'll have to do something like it one day not too far down the road. With multiple data centers we could already benefit from BitTorrent quite a lot.
It might be worth testing rsync of a tar archive. That will get rid of a ton of stat calls but it will then add blockwise comparison of the old and new tar archive. Rsync is not well known for being the best transfer mechanism for binary content but an uncompressed tar is mostly a linear text file ordered according to inode traversal of the input directory. The question to be tested is whether the blockwise diff algorithm over a largish tarfile is faster or slower than stat of the ~275K files that syncing /srv/mediawiki-staging hits.
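That question could be benchmarked directly; a rough sketch (hostname and paths are placeholders, and the previous tar must already exist on the target for rsync's delta algorithm to help):

```
# Baseline: tree sync, dominated by per-file stat calls
time rsync -a /srv/mediawiki-staging/ target:/srv/mediawiki/
# Candidate: one uncompressed tar, transferred via rsync's blockwise delta
tar -C /srv/mediawiki-staging -cf /tmp/mw.tar .
time rsync --inplace /tmp/mw.tar target:/tmp/mw.tar
```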
I tend to agree that finding a performant transport layer for binary blobs that works well across multiple data centers is the best future for deployment tools. This will make shipping things like l10n files, hhbc caches and "containers" easier.
Rsync as a transport was a logical step when the prior shared NFS mounts were decommissioned. Git is another incremental step that really just optimizes the rsync behavior. It does have the nice side effect of making each target's state relative to the master server more discoverable, and it also enables quick revert to a prior known state.
Something like the system described to us by Facebook last year, which ships squashfs blobs via BitTorrent, is basically a "container" solution. It could probably be generalized into a core product shared between MW and other services for shipping code/config/data, with variation in what command & control actions happen on the target servers after the payload is delivered.
@bd808: Building on the squashfs idea: We now have a weekly branch that is deployed everywhere after just 3 days, so imagine:
Tuesday we push a squashfs with the branch state; any hotfixes/swat deployments/etc. would then modify a slimmed-down squashfs that contains just the files that changed since the branch was created. At any given time the application servers would have a unionfs (and the static servers a different unionfs...) built from two files: a base, created when the branch started on Tuesday, plus an overlay containing everything changed since Tuesday. When we sync, we would only need to sync the overlay file, and it would be largely the same as the previous version of the overlay file.
Rollbacks would just require saving the old versions of the base and overlay for some period of time; swapping them back in would be simple.
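A rough sketch of how that could work with squashfs images and an overlay mount (overlayfs stands in here for "unionfs"; all paths and filenames are made up):

```
# Build the Tuesday base image and a slim overlay of everything changed since
mksquashfs /srv/mediawiki-staging base.squashfs
mksquashfs /srv/changed-since-branch overlay.squashfs
# Mount both read-only and union them, overlay on top of base
mkdir -p /mnt/base /mnt/overlay /srv/mediawiki
mount -t squashfs -o loop base.squashfs /mnt/base
mount -t squashfs -o loop overlay.squashfs /mnt/overlay
mount -t overlay overlay -o lowerdir=/mnt/overlay:/mnt/base /srv/mediawiki
# Rollback: keep the previous base/overlay files around and remount them
```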
This is now the meta tracking bug for deployment tooling thoughts.
Meeting notes 6/15/15
Flexible components of the system
- Transport mechanism
- Version to deploy
- meaningful tags
- list different deploy versions
- Signaling restart (see the sketch after this list)
- service command
- HUP
- Testing at the end
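For the restart-signaling component, the two options above might look like this on a target (the service name and pidfile path are assumptions):

```
# Option 1: full restart via the service command
sudo service restbase restart
# Option 2: send a HUP for an in-place reload
kill -HUP "$(cat /var/run/restbase.pid)"
```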
Versioning
- Services use semantic versioning, but not for deployments.
- There is a task for making mediawiki follow semantic versioning as well.
- It would be nice to use a standard versioning scheme, and some naming conventions for deployment tags, rather than long numeric deployment numbers like we have in trebuchet.
- for Phabricator I use a date-based deployment tag like release/2015-06-10/1, where the /1 is a revision number; for hotfixes you just increment it
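In practice that scheme looks like this (tag names taken from the comment above; the listing command is standard git):

```
git tag release/2015-06-10/1 -m "weekly cut"   # initial deploy tag
git tag release/2015-06-10/2 -m "hotfix"       # hotfix: increment the revision
git tag --list 'release/2015-06-10/*'          # enumerate that day's deploys
```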
Control mechanism using ssh
- SSH for each host
- Public key deploy
- Sudoers roles (see the sketch after this list)
- troubleshooting deploys requiring escalation
- service use needs read/write (possibly)
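A sketch of what the sudoers role could look like (the user, service name, and path are assumptions, not an agreed design):

```
# /etc/sudoers.d/deploy (hypothetical): allow the deploy user to restart the
# service without a password, and nothing else
deploy ALL = (root) NOPASSWD: /usr/sbin/service restbase restart
```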
interface ideas
- Tmux: lotsa feedback
- Ability to abort at any point
- Watching logs/backend
- start from alternative interface, attach if problems
- locking mechanism per repo (possibly global, not necessarily; see the sketch after this list)
- single point of updates, multiple consumers (e.g. redis consumed by web page and by commandline)
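A minimal sketch of the per-repo lock using flock(1) (the lock path and REPO variable are assumptions):

```
# Take an exclusive, non-blocking lock for this repo's deploys
exec 9>"/var/lock/scap-deploy-${REPO}.lock"
if ! flock -n 9; then
    echo "another deploy of ${REPO} is already in progress" >&2
    exit 1
fi
# ... deploy steps run while fd 9 holds the lock ...
```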
The initial discussion/planning for this was completed long ago and the services deploy MVP is already tracked in T109535: EPIC: Scap3 should implement the services team requirements. Let's close this out.