
Rethink beta scap deployment
Open, Stalled, Low · Public

Description

tl;dr: the scap release process is manual and fiddly and it has recently failed quite a bit, so it's probably time to rethink it.

Currently, scap is deployed to the deployment-prep cluster via a debian package built from the master branch of scap. Once scap devs are happy with that version, master is merged into the release branch where we cut a new debian package for production.

Having two branches with debian folders (master and release) has caused packaging confusion (T183046). Not (manually) bumping the version in master at the same time we upload a new package to production from the release branch causes deployment-prep puppet breakage (T184118). Having the beta package built post-merge has let some scap bugs escape and break other folks' workflows in deployment-prep (T184176). We should try to automate and simplify as much of this as possible in light of recent breakage.
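As a rough illustration of the kind of automated check that might catch the unbumped-version case from T184118, here is a minimal Python sketch. It relies only on the standard `debian/changelog` header format (`package (version) distribution; urgency=...`). The function names are hypothetical, and the exact condition that breaks puppet is not stated here, so flagging "master has the same version as release" is an assumption rather than the real failure criterion.

```python
import re

# First line of a debian/changelog entry: "package (version) dist; urgency=..."
_HEADER = re.compile(r"^(?P<package>\S+) \((?P<version>[^)]+)\)")


def changelog_version(changelog_text):
    """Return the version string from the topmost changelog entry."""
    lines = changelog_text.strip().splitlines()
    if not lines:
        raise ValueError("empty changelog")
    match = _HEADER.match(lines[0])
    if match is None:
        raise ValueError("unrecognized changelog header: %r" % lines[0])
    return match.group("version")


def master_needs_bump(master_changelog, release_changelog):
    """Hypothetical check: True when master still carries release's version.

    Assumption: the breakage happens when master was not bumped after a
    release upload. This does not implement Debian version ordering.
    """
    return changelog_version(master_changelog) == changelog_version(release_changelog)
```

A CI job could read `debian/changelog` from both branches and fail early when `master_needs_bump` returns `True`, instead of relying on someone remembering the manual bump.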

Event Timeline

Restricted Application added a subscriber: Aklapper. Jan 10 2018, 5:11 PM

Looking at the phabricator-jessie-commits job on T184118#3897095, I had to refresh my memory on what was happening.

Current process

As I understand it

  1. A commit is pushed to a repo that is tagged in differential with ci-meta-jessie
  2. Harbormaster Plan 9 (heh) is triggered
  3. Phab makes a POST request to Jenkins triggering the https://integration.wikimedia.org/ci/job/phabricator-jessie-commits/ job with the following params
    • PHID=${target.phid}
    • CLONE_URI=${repository.clone.uri}
    • CHECKOUT_REVISION=${repository.clone.ref}
    • CALLSIGN=${repository.callsign}
    • OFFLINE_NODE_WHEN_COMPLETE=1
  4. Once that job is complete, the job https://integration.wikimedia.org/ci/job/phabricator-jessie-debs/ is triggered, which builds the package for beta and uploads it to a repository in beta.
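For anyone else refreshing their memory, step 3 can be sketched roughly as below. This only builds the request URL from the parameters listed above. `buildWithParameters` is Jenkins' standard endpoint for parameterized jobs, but whether Harbormaster targets exactly that endpoint (and how it authenticates) is an assumption. The function name and the example values are hypothetical.

```python
from urllib.parse import urlencode

JENKINS_JOB = "https://integration.wikimedia.org/ci/job/phabricator-jessie-commits/"


def build_trigger_url(phid, clone_uri, checkout_revision, callsign):
    """Return the URL a parameterized trigger of the commits job would hit."""
    params = {
        "PHID": phid,                            # ${target.phid}
        "CLONE_URI": clone_uri,                  # ${repository.clone.uri}
        "CHECKOUT_REVISION": checkout_revision,  # ${repository.clone.ref}
        "CALLSIGN": callsign,                    # ${repository.callsign}
        "OFFLINE_NODE_WHEN_COMPLETE": "1",
    }
    # Assumption: the job is triggered through Jenkins' buildWithParameters
    # endpoint; the real Harbormaster plan may be configured differently.
    return JENKINS_JOB + "buildWithParameters?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder values, not real identifiers.
    print(build_trigger_url(
        "PHID-EXAMPLE",
        "https://example.org/source/example.git",
        "0123abcd",
        "EXAMPLE",
    ))
```

The actual trigger is a POST made by Phab; the sketch only shows how the five parameters would end up in the request.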
thcipriani triaged this task as Medium priority. Jan 12 2018, 5:22 PM

Things I like

  • being able to run master in beta
  • cutting production releases from a branch that isn't as in-flux as the master branch

Things I'd like to change

  • Get rid of the debian folder in the master branch (since it causes confusion and has manual upkeep)
  • Would be nice to have a pre-merge e2e test of basic scap functionality (likely in beta)

If we ditch the debian folder in master we need to figure out a new way to deploy to beta. Ideas off the top of my head: scap deployed via git, scap deployed via scap, merge master into release and push out a new release deb to beta.

Of those 3 ideas, the first two might allow an easy path forward for an e2e test in beta, but maybe we can figure out something else for that...

We still have to address the problem with deploying scap to beta: commits to master are disruptive to other developers who use beta to test their deployments. They need a stable scap while we need a way to test the bleeding-edge scap. I don't know how to address both needs without a drastic departure from the current setup.

And that's why my puppetmaster manifests go to a local puppetmaster instead of the main deployment-prep one. Having everyone spin up their own puppetmaster isn't the answer; there's probably a clever approach for this entire class of problems.

  1. Removal of the /debian in master.
    • I think it makes the most sense to have the ci job merge master into release and then build packages from that.
  2. Pre-merge end to end tests would be wonderful; however, doing this in deployment-prep without it being disruptive to other developers is tricky.
    • We could do something with docker or a dedicated target instance that tracks master while keeping the release version of scap installed everywhere else
      • This would require either a dedicated scap master for our testing, or a separate scap install in a private path which we use to run our tests.
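To make point 1 concrete, here is a minimal sketch of what a "merge master into release, then build" job could run. The branch names come from this task. The `git` and `dpkg-buildpackage` flags are standard, but the choice of `dpkg-buildpackage` (rather than whatever tooling the real job uses), the function names, and the dry-run default are all assumptions.

```python
import subprocess


def merge_and_build_commands(source="master", target="release"):
    """Return the commands for merging `source` into `target` and building."""
    return [
        ["git", "checkout", target],
        # --no-edit accepts the default merge commit message (non-interactive).
        ["git", "merge", "--no-edit", source],
        # Binary-only build without signing; assumes debian/ lives on `target`.
        ["dpkg-buildpackage", "-us", "-uc", "-b"],
    ]


def run(commands, cwd=".", dry_run=True):
    """Print the commands, or execute them in order and stop on first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run(merge_and_build_commands())
```

With this shape only the release branch needs a debian folder, which is the point of removing it from master.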

@ArielGlenn: indeed, I'm open to clever suggestions ;)

Can we just use docker-compose to define a whole scap micro-cluster and then run that in CI?
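A very rough sketch of what that could look like, written as Python that emits the Compose definition. JSON is a subset of YAML, so the output should be usable as a Compose file. The service names, the `scap-test:local` image, and the number of targets are all hypothetical, and a real setup would also need ssh keys, networking, and the scap configuration itself.

```python
import json


def micro_cluster(targets=2, image="scap-test:local"):
    """Build a Compose-style definition: one scap master plus N targets.

    All names here are placeholders; `image` is a hypothetical local image
    with scap installed from the commit under test.
    """
    services = {"scap-master": {"image": image, "hostname": "scap-master"}}
    for index in range(1, targets + 1):
        name = "scap-target-%d" % index
        services[name] = {"image": image, "hostname": name}
    return {"services": services}


def render(cluster):
    """Serialize the definition as JSON text."""
    return json.dumps(cluster, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(micro_cluster(targets=3)))
```

CI could write this out per build, bring the cluster up, run a basic deploy from the master container against the targets, and tear it down, without touching deployment-prep.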

demon added a comment. Jan 12 2018, 6:08 PM

We still have to address the problem with deploying scap to beta: commits to master are disruptive to other developers who use beta to test their deployments.

We could stop breaking master ;-)

@demon: there is a whole class of bugs that we can't test until code hits master, currently....

demon added a comment. Jan 12 2018, 6:11 PM

We don't disagree, but my point is that if we're merging stuff to master that is risky, we (the ones doing the merge) should be prepared to either fix things or roll back quickly. Basically: adopt the "master must always be runnable" adage we use for MediaWiki & friends.

@demon: That complicates development (and raises my stress level) while failing to fully address the issues with CI not actually catching bugs and release branch versions conflicting with master versions.

Also, the "master must always be runnable" idea sounds great but it transfers responsibility from CI to a manual process that developers are responsible for doing properly and consistently. I'd rather have an automated process than a procedure outlined in 12 steps on a wiki page somewhere.

demon added a comment. Jan 12 2018, 6:19 PM

I think developers should be responsible for the code they merge and not rely on CI to catch everything. This doesn't preclude good CI, but I don't think "be ready to revert your code if it breaks things" is really a big ask--and it's something I'd hope we're doing anyway...

That's entirely beside the point, however. The problem is that, currently, the route to finding out whether the code is broken is through commits to master, which also break other people's workflow. This is obviously bad.

Maintaining a local VM for testing is reasonable; however, it's a non-trivial amount of work, and IMO it would be more efficient to have a shared testing environment that we can all use rather than each of us maintaining a local test VM. I am with @thcipriani regarding e2e testing in deployment-prep. We especially need that for scap releases prior to deploying them to production.

mmodell claimed this task. Jan 18 2018, 1:03 PM
mmodell moved this task from Backlog to In-progress on the Release-Engineering-Team (Kanban) board.
mmodell moved this task from Needs triage to Debt on the Scap board. Feb 1 2018, 12:21 AM
mmodell changed the task status from Open to Stalled. Mar 29 2018, 6:49 PM
mmodell lowered the priority of this task from Medium to Low.
mmodell removed mmodell as the assignee of this task. Apr 23 2018, 4:29 PM