
Remove apache dependency from scap3 deployment host
Closed, Declined, Public

Description

Instead of fetching configs and code from a statically configured apache server on tin, we want to try running a server instance for each deployment session, rooted in the deployment repo and potentially supporting git's smart http protocol.

Things to take into consideration:

  • firewall rules would likely block the randomly assigned port on the deployment host
    • @dduvall suggested we use ssh tunnels to side-step this problem, which would have the added benefit of encrypting the connection so that we can avoid https certificate complexities.
  • an extra server framework is an added dependency
    • I implemented a proof of concept patch (D20) using twisted, which is a non-trivial dependency.
    • @dduvall pointed out that we could benefit from supporting the smart git protocol over http; there is a python implementation at https://pypi.python.org/pypi/turnip which needs to be evaluated.
    • The server doesn't have to be written in python, and it may be better if it runs in a separate process from the main scap deployment control flow.
  • Decoupling the deployment from /srv/deployment on tin, and from apache's static config, significantly expands the usefulness of the tool in general, as it allows for ad-hoc deployments, e.g. from a developer's workstation directly to a labs instance or a Vagrant VM. This would be a big improvement to the development and testing workflow and would remove a somewhat complex configuration burden that developers shouldn't have to deal with.
  • We should also consider scalability (the fan-out deployment scenario)
    • Deployment targets could act as proxies by running their own instance of the http service.
    • Eventually we could plug in BitTorrent.
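A minimal sketch of the per-session server idea, assuming Python 3.7+ and the standard library only (no twisted). It serves a deploy repo on an OS-assigned port bound to localhost, and builds an ssh reverse-tunnel command so a target can reach that port without a firewall change. The function names are hypothetical, and the plain file server stands in for the smart http protocol, which it does not implement.

```python
import functools
import http.server
import threading


def start_session_server(repo_root):
    """Serve repo_root over plain http for one deployment session.

    Binds to 127.0.0.1 only; targets are expected to reach it through an
    ssh tunnel rather than through the firewall.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=repo_root
    )
    # Port 0 asks the kernel for any free ephemeral port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, server.server_address[1]


def reverse_tunnel_args(target, remote_port, local_port):
    """Build an ssh command exposing the local server on the target host.

    With -R, connections to localhost:remote_port on the target are carried
    back over the encrypted ssh connection to localhost:local_port here.
    -N means no remote command is run; the connection only forwards.
    """
    return ["ssh", "-N", "-R", f"{remote_port}:localhost:{local_port}", target]
```

At the end of a session the caller would stop it with `server.shutdown()` followed by `server.server_close()`. Running the server in a thread keeps the sketch short; the description's suggestion of a separate process would work equally well.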


Event Timeline

mmodell claimed this task.
mmodell raised the priority of this task to Needs Triage.
mmodell updated the task description.
mmodell added a project: Scap.
mmodell lowered the priority of this task from Medium to Low. Nov 9 2015, 6:23 PM

@thcipriani and I discussed this on the deployment triage call today.

We resolved to explore the idea of storing the deployment state entirely in the git repo, probably in a branch. This can eliminate the apache dependency and simultaneously resolve T113072: Make puppet provider for scap3, allowing us to get the initial state correct on newly provisioned servers.

Simply assuming the scap/ directory is contained in the deploy repository should be enough. Currently, the services' deploy repositories have the following outline:

deploy
  |-- src/
  |-- node_modules/
  +-- package.json -> src/package.json

So, including scap/ at the same (root) level would make the config and everything else inside it automatically available on the targets. As a bonus, any change to the scap/ directory would produce a new commit, meaning that Scap3 would need to track only one hash rather than two (one for the repo, the other for the config).
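A minimal sketch of the target side under this layout, assuming scap/scap.cfg is an INI-style file with a `[global]` section (the section name and the helper name are assumptions). Because the config ships in the same checkout as src/, whatever single commit the target checked out determines both code and config.

```python
import configparser
import os


def load_repo_scap_config(checkout_dir):
    """Read scap/scap.cfg from the root of a checked-out deploy repo.

    Hypothetical helper: it assumes the file is INI-style and that the
    relevant settings live in a [global] section.
    """
    path = os.path.join(checkout_dir, "scap", "scap.cfg")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no scap config at {path}")
    parser = configparser.ConfigParser()
    parser.read(path)
    # Return an empty dict rather than failing if [global] is absent.
    return dict(parser["global"]) if parser.has_section("global") else {}
```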

So the idea would be: instead of using the .git/DEPLOY_HEAD file on tin as the canonical record of the latest deployment for a particular repo, we would simply make a commit to an orphan branch that is pushed upstream as part of a deploy. The file committed to that branch could be used by deploy-local as configuration when fetching from the deploy_host, as well as by the future scap puppet provider.

Creating an orphan branch in which to track deployment state would have the following benefits:

  1. Less intrusive than committing to a repo's deploy branch: it imposes no directory structure that may cause collisions.
  2. Keeps the commit history of the deploy branch clean: automatic changes from deploy tooling wouldn't end up in the deploy repository's history.
  3. Keeps the commit history of the orphan branch clean: it should be easy to look through a history of deployments by checking out the orphan branch, rather than having to sort through unrelated development history.
  4. Keeps the tooling agnostic about how a repo includes the scap directory (as a subtree, as a submodule, or as a .gitignore'd directory); scap/scap.cfg just has to exist.

An orphan branch is also fairly VCS-agnostic (unlike git-tag or git-notes).
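One way the orphan-branch commit could be made, sketched with git plumbing so the deploy checkout's work tree and current branch are never touched. The branch name `scap/deploy-state`, the file name `DEPLOY_HEAD` inside the branch, and the function names are hypothetical choices for illustration. The first commit has no parent, so the branch shares no history with the deploy branch; each later deployment chains onto the previous record.

```python
import subprocess


def git(repo, *args, stdin=None):
    """Run a git command in repo and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        input=stdin, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def record_deployment(repo, deployed_sha, branch="scap/deploy-state"):
    """Commit a DEPLOY_HEAD file to an orphan branch via plumbing commands."""
    ref = f"refs/heads/{branch}"
    # Store the file contents as a blob object.
    blob = git(repo, "hash-object", "-w", "--stdin", stdin=f"{deployed_sha}\n")
    # mktree reads ls-tree formatted lines: "<mode> <type> <object>\t<path>".
    tree = git(repo, "mktree", stdin=f"100644 blob {blob}\tDEPLOY_HEAD\n")
    # Look up the previous deployment record, if the branch already exists.
    previous = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", "--quiet", ref],
        capture_output=True, text=True,
    )
    args = ["commit-tree", tree, "-m", f"Deploy {deployed_sha}"]
    if previous.returncode == 0:
        args += ["-p", previous.stdout.strip()]
    commit = git(repo, *args)
    git(repo, "update-ref", ref, commit)
    # A deploy would then publish it, e.g. `git push origin scap/deploy-state`.
    return commit
```

Reading the state back is just `git show scap/deploy-state:DEPLOY_HEAD`, and `git log scap/deploy-state` lists past deployments without any unrelated development history mixed in.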

That was my thinking in the discussion we had this morning.

mmodell raised the priority of this task from Low to High. Nov 30 2015, 5:20 PM

From meeting notes 2015-11-30:

One thing of note is that, for production, it is important to use tin as the git_server from which information is fetched, and we should enforce that in scap code: tin has much less surface area than allowing any and every host to act as an upstream.

For smaller instances, and to support reuse in labs projects, having a configurable upstream to which we could push a branch with canonical configuration information would get rid of the dependency on apache.
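A sketch of how the two policies could coexist in scap's config handling, assuming an environment name is available and hosts are compared as exact strings (both assumptions; `resolve_git_server` is a hypothetical name). Production always fetches from the deployment host and rejects any other configured upstream; elsewhere a configured upstream wins.

```python
def resolve_git_server(environment, deploy_host, configured=None):
    """Choose the upstream that targets fetch from.

    In production only deploy_host (e.g. tin) is accepted; any other
    configured value is an error. Elsewhere (labs, a developer's machine)
    a configured upstream is used, falling back to deploy_host.
    """
    if environment == "production":
        if configured not in (None, deploy_host):
            raise ValueError(
                f"production targets must fetch from {deploy_host}, "
                f"not {configured}"
            )
        return deploy_host
    return configured or deploy_host
```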

Looks like this is a wontfix due to ops imposing requirements that we only deploy from git repos hosted on the deployment server.

OK, after discussing this with the team: it's still something we want to support, it just won't be the default mode of operation.

I didn't really mean to close this - I was confused.

mmodell lowered the priority of this task from High to Medium. Jan 5 2016, 8:33 PM

I'm thinking we could start a SimpleHTTPServer rooted at /srv/deployment.

@thcipriani: Is there any way to override the http port that deploy-local uses to connect back to the deployment host?

@mmodell you could include it as part of the git_server argument, e.g., git_server = tin.eqiad.wmflabs:8080.
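For the one-off case, the Python 3 equivalent of SimpleHTTPServer can be started with `python3 -m http.server 8080 --directory /srv/deployment`. On the target side, a `host[:port]` value like the one above could be split as below. This is a sketch assuming plain http and a default of port 80; the helper name is hypothetical and not taken from deploy-local.

```python
from urllib.parse import urlsplit


def fetch_base_url(git_server, default_port=80):
    """Turn a git_server value such as "tin.eqiad.wmflabs:8080" into a base URL.

    Prefixing "//" makes urlsplit treat the value as a network location,
    so .hostname and .port are parsed for us.
    """
    parts = urlsplit(f"//{git_server}")
    port = parts.port or default_port
    return f"http://{parts.hostname}:{port}"
```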

What changed to point you at evaluating SimpleHTTPServer? Initially I remember some discussion about how that wasn't the right tool for this particular task.

@thcipriani: We were talking about completely replacing the apache server on tin / mira with a python http implementation. For that task, SimpleHTTPServer would almost certainly not scale. On the other hand, now that we are sure we need to keep the apache server in order to support the puppet provider, it seems a lot easier to just use a simple http server for labs and for one-off deployments from a developer's machine. The only immediate benefit of replacing apache is really a bit less configuration complexity. In reality, though, our apache configuration is already built in puppet, so that's only a theoretical savings. If apache were unreliable or otherwise deficient, then it would be worth the effort.

The more performant python http servers are rather heavyweight dependencies to add, and implementing a decent configurable setup with them would take a non-trivial amount of code. I'm not against implementing something better, but I think a simple server will work for the most basic use case, and we can build something better if there is demand for it in the future.

mmodell raised the priority of this task from Medium to High. Mar 6 2016, 3:30 PM

One thing to investigate further is the use of git daemon.
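If git daemon were evaluated, a session could launch it roughly as sketched below, assuming repos live under /srv/deployment (function names are hypothetical). `--base-path` maps `git://host/<repo>` to `<base_path>/<repo>`, and `--export-all` serves repos that lack a `git-daemon-export-ok` marker file. Note that git daemon's protocol is unauthenticated and unencrypted (by default it only allows fetching), so the ssh tunnel idea from the description would still apply.

```python
def git_daemon_command(base_path, port=9418):
    """Build a git daemon invocation serving every repo under base_path."""
    return [
        "git", "daemon",
        f"--base-path={base_path}",  # git://host/<repo> -> <base_path>/<repo>
        "--export-all",              # no git-daemon-export-ok file required
        "--reuseaddr",               # allow quick restarts on the same port
        f"--port={port}",            # 9418 is git's default daemon port
    ]


def git_daemon_url(host, repo, port=9418):
    """URL a target would fetch from, relative to the daemon's base path."""
    return f"git://{host}:{port}/{repo}"
```

The list could be passed to `subprocess.Popen` for the lifetime of a deployment session.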

mmodell lowered the priority of this task from High to Medium.
mmodell added a subscriber: mmodell.

I don't think this is really the direction we were planning to go in anymore.