Scap3 should support virtualenv for deployment of python packages
Open, Medium, Public

Description

Using virtualenv is vital for deploying Python packages.

Event Timeline

The general approach to deploying services in WMF prod is to have a repo with the source code as well as its dependencies present. In other words, building dependencies on production hosts is not expected to happen. As an example, for node.js services we have two repositories: a source one containing the code, and a deploy repository which contains the former as a submodule and any extra dependencies needed by it. Concretely, see RESTBase's source repo and its deploy repo.

TL;DR: create another repo containing the virtualenv and ORES. That way not only can Scap stay language-agnostic, but more importantly no build step will be performed on production hosts. That said, if the needed dependencies have their respective deb packages, then those can be installed on the hosts, but this needs to be specified in ops/puppet, not in the scap configuration.

Thank you for your detailed comment.
We have two repos: one is ores itself, and the other is ores-wikimedia-config, which includes the deployment configuration. We do have those dependencies included in a repo called wheels (in research/ores/wheels), but we need to install the files inside the virtualenv using *.whl files.

So what do you suggest in this case?

Hi @mobrovac. We've been working with SRE on this from day 1 to make sure that we have a good strategy for deploying in prod. We're breaking new ground here with this type of python service. We looked at Deb packages, a compressed virtualenv, and finally binary python wheels. I've had @yuvipanda confirm that wheels are the best option (which I strongly agree with). It gives us the greatest flexibility for installing dependencies and requires no arbitrary code execution. I'm sorry you didn't know the history here, but it's important that you do before you make demands about how we should change our deployment process.

I am interested in deploying a venv as well. My use case is the Zuul server. Building the deb package and fulfilling Python module dependencies has been a pain.

I could use a workflow where I get to build the venv based on a list of requirements, push that to a git repo (such as zuul/deploy.git), then use scap to push it to the production hosts running Zuul (gallium, scandium currently).

I don't see why you would need specialized support for this; you just do as @hashar said and push the virtualenv into a git repo and deploy that with scap3.

Scap already has the ability to execute arbitrary commands after each stage of the deploy process using check commands. It should be possible to initialize the environment from those.

Specifically, using https://github.com/wikimedia/research-ores-wheels + https://doc.wikimedia.org/mw-tools-scap/scap3/quickstart/setup.html#additional-checks may work for this use-case.

Something like:

checks:
  setup_virtualenv:
    type: command
    stage: promote
    command: [pip commands]

Would execute after the code + submodules are checked out, but before any service restart.
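For instance, a filled-in version of that stanza might look like this (all paths below are hypothetical, with the wheels directory mirroring the research/ores/wheels repo mentioned earlier):

```yaml
checks:
  setup_virtualenv:
    type: command
    stage: promote
    # Hypothetical paths: the venv is created beside the checkout and pip
    # is restricted to the vendored wheels, so nothing is fetched or built.
    command: >
      python3 -m venv /srv/deployment/ores/venv
      && /srv/deployment/ores/venv/bin/pip install
      --no-index --find-links /srv/deployment/ores/wheels
      -r /srv/deployment/ores/requirements.txt
```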

Does that cover the use-case?

let me give it a try.

One thing: it's probably better to rename "check" to "commands" or something else, and to change the docs accordingly. I got the impression that these are only used to do pre-deploy checks (CI, etc.)

Yeah, I've noticed that the semantics aren't especially helpful as the use-case for "checks" has expanded.

I put it in there. The whole thing is very untested. Also, it would be good to support restarting several services, similar to how dsh_targets supports several service groups. Is the current limitation intentional? If you plan to support it, I'll make the patch for you right away.

Hey Amir, please definitely file another task for scap3 to support multiple services ;-}

> Hi @mobrovac. We've been working with SRE on this from day 1 to make sure that we have a good strategy for deploying in prod. We're breaking new ground here with this type of python service. We looked at Deb packages, a compressed virtualenv, and finally binary python wheels. I've had @yuvipanda confirm that wheels are the best option (which I strongly agree with). It gives us the greatest flexibility for installing dependencies and requires no arbitrary code execution. I'm sorry you didn't know the history here, but it's important that you do before you make demands about how we should change our deployment process.

If you re-read my comment, you'll realise that I wasn't demanding anything, merely pointing out the way things are currently done and that if you follow that approach no special treatment would be needed. On that note, yes, I am personally against having a specialised deployment method for this. I think we can find a way to properly deploy python services in general.

> I don't see why you would need specialized support for this, you just do like @hashar said and push the virtual env. into a git repo and deploy that with scap3.

+1. This is exactly what I'm advocating. The way you build the virtual env is the same, so saving that result in a git repo and checking that out on the target nodes should work just as well as executing pip install commands on each deploy on each target node.

To expand a bit on this, you could have your virtualenv built in a container which has all of the same packages and their respective versions as WMF production. This would ensure that all is working fine even if you have binary dependencies.

@mobrovac the point of using wheels vs standard pip is exactly not needing to build anything on the production hosts; it's akin to using tar or dpkg, in some way. You get a binary archive, and you deploy it to a special directory.

I am completely neutral between deploying wheels and assembling the virtualenv in place on the production boxes, and building it via some CI job (which would be better and safer than the deployer doing that in a container on a dev machine; those should be used for local testing if needed). I guess Heroku doing the former could be a hint it's not some batshit-crazy idea.

Then again, other people in ops have invested their time into this and I'd like to hear their opinion.

As for scap3, I suppose it can issue commands on the remote hosts post-deploy, and if it can't, we have to rethink a few things.

So, wheels!

Here were the options I considered and researched before deciding on using wheels:

  1. Debian packages. This is what we initially tried, and we spent some time packaging all the dependencies. However, it quickly became painfully obvious that this wasn't going to scale, and I'm unwilling to do the required amount of (to me, fairly unrewarding) grunt work that is Debian packaging and its maintenance. Going from 'a requirements.txt file was updated' to 'here, it will get installed by puppet' was just too much work, so this was discarded. There were also additional issues, such as always requiring work from a root for every deployment that changed any library version (not sustainable at all in the long run) and the fact that some packages were a total PITA to package (thank you, gfortran / cython!)
  2. Shipping a virtualenv. We could've done this in a git repository. However, this has the problem that virtualenvs aren't path independent at all. There's the --relocatable option that you can use, but this is a pretty ugly hack, and you have to run it every time you update the virtualenv. Due to the underlying hackiness of this approach, it would probably cause issues when you are least expecting it to (and has in the past), so relying on a hack for distribution doesn't seem like the nicest approach.
  3. Ship the virtualenv in something like Pex (https://github.com/pantsbuild/pex). This is what Twitter does, and it is a fairly attractive option. However, it is also fairly heavyweight: any change in the code or dependencies requires rebuilding a fairly big binary and redistributing it, which kinda sucks given the way we deploy things.
  4. Use binary wheels. Binary wheels are a PEP standard (https://www.python.org/dev/peps/pep-0427/) that have been around for a few years now. These allow fully reproducible builds that do not require internet access, and are automatically produced (unlike debs). The built artifact is extremely similar in concept to a .deb debian package, except python specific and *automated*. This also does not have any of the problems presented by shipping virtualenvs around.
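The wheel workflow in option 4 can be sketched end to end. The paths and the empty requirements file below are placeholders so the sketch stays self-contained; in practice requirements.txt would pin every dependency:

```shell
# Illustrative sketch of the two-step wheel workflow; work in a temp dir.
cd "$(mktemp -d)"
# requirements.txt pins every dependency; left empty here so the sketch
# runs offline without fetching anything.
touch requirements.txt
mkdir -p wheels

# Step 1 (build host, network allowed): collect all dependencies as wheels.
python3 -m pip wheel --wheel-dir=wheels/ -r requirements.txt

# Step 2 (production target): install strictly from the local wheel dir.
# --no-index forbids contacting PyPI, so no network access or compilation
# happens on the production host.
python3 -m pip install --no-index --find-links=wheels/ -r requirements.txt
```

In the ORES setup described above, the wheels/ directory would be the checkout of the research/ores/wheels repo rather than a freshly built cache.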

To summarize, wheels were chosen because:

  1. They are a python standard that more and more libraries and communities are adopting (http://pythonwheels.com/)
  2. They are built exactly for this kind of use case (deploying a known, reproducible environment without access to the wider internet)
  3. They are an automated solution that requires no manual packaging work.

I also want to point out that saying 'we have a standard way of deploying services and that is vendoring' is not particularly relevant, since:

  1. Those are all nodejs services, where you have very limited options to deploy without access to the internet (There is no wheel equivalent)
  2. While PHP does vendoring too, it is done in a totally different way and with a different naming convention than node's.
  3. What works for nodejs will certainly not be the exact same thing that works for other languages. Imagine if Ops tells the services team that we've a standard way of deploying dependencies (debian packages!) and you must use them before you can deploy :D I don't think that's fair, and I think the same standard applies here.

This is the first python based service we're deploying, and after a lot of work we've come to the current conclusion that wheels are the best shot we have now. This will evolve as time progresses, and we can change if required.

Until then, I hope we can go back to figuring out how to do this with scap3 rather than a meta discussion about using wheels or not :)

+1 for wheels, it seems like the right tool.

I'm not sure what the difficulty is here. Scap3 can indeed run remote commands on the target hosts and I can't think of any reason it wouldn't work with wheels.

I confirm that I've been able to build the virtualenv and install packages using wheels on the target hosts. It would be better for scap3 to support this natively, but since I did it with checks, no strong feelings.

mmodell moved this task from Needs triage to Services improvements on the Scap board.