Rethink beta scap deployment
Closed, Resolved · Public

Assigned To

Authored By

	thcipriani
	Jan 10 2018, 5:11 PM

Description

tl;dr: the scap release process is manual and fiddly and it has recently failed quite a bit, so it's probably time to rethink it.

Currently, scap is deployed to the deployment-prep cluster via a debian package built from the master branch of scap. Once scap devs are happy with that version, master is merged into the release branch where we cut a new debian package for production.

Having two branches with debian folders (master and release) has caused packaging confusion (T183046). Not (manually) bumping the version in master at the same time we upload a new package to production from the release branch causes deployment-prep puppet breakage (T184118). Having the beta package built post-merge has let some scap bugs escape and break other folks' workflows in deployment-prep (T184176). We should try to automate and simplify as much of this as possible in light of recent breakage.

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		dancy	T184628 Rethink beta scap deployment
		Resolved		jnuche	T184118 scap package installed by CI breaks apt and thus puppet

Event Timeline

thcipriani created this task. · Jan 10 2018, 5:11 PM

Restricted Application added a subscriber: Aklapper. · Jan 10 2018, 5:11 PM

thcipriani mentioned this in T184118: scap package installed by CI breaks apt and thus puppet. · Jan 12 2018, 5:18 PM

thcipriani added a subtask: T184118: scap package installed by CI breaks apt and thus puppet.

Looking at the phabricator-jessie-commits job on T184118#3897095 I had to refresh my memory on what was happening.

Current process

As I understand it

A commit is pushed to a repo that is tagged in differential with ci-meta-jessie
Harbormaster Plan 9 (heh) is triggered
Phab makes a POST request to Jenkins, triggering the https://integration.wikimedia.org/ci/job/phabricator-jessie-commits/ job with the following params:
- PHID=${target.phid}
- CLONE_URI=${repository.clone.uri}
- CHECKOUT_REVISION=${repository.clone.ref}
- CALLSIGN=${repository.callsign}
- OFFLINE_NODE_WHEN_COMPLETE=1
Once that job is complete, the https://integration.wikimedia.org/ci/job/phabricator-jessie-debs/ job is triggered, which builds the package for beta and uploads it to a repository in beta.
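
The trigger step above can be sketched roughly as follows. This is only an illustration: it assumes the POST goes to Jenkins' standard buildWithParameters endpoint for the job, the helper name build_trigger_request and the sample values are hypothetical, and authentication is left out. The request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

JOB_URL = "https://integration.wikimedia.org/ci/job/phabricator-jessie-commits/"


def build_trigger_request(phid, clone_uri, revision, callsign):
    """Build (but do not send) a POST that would trigger the Jenkins job.

    Assumption: the job is triggered through Jenkins' buildWithParameters
    endpoint. The parameter names are the ones listed in the task.
    """
    params = {
        "PHID": phid,
        "CLONE_URI": clone_uri,
        "CHECKOUT_REVISION": revision,
        "CALLSIGN": callsign,
        "OFFLINE_NODE_WHEN_COMPLETE": "1",
    }
    return Request(
        JOB_URL + "buildWithParameters",
        data=urlencode(params).encode("ascii"),
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    req = build_trigger_request(
        phid="PHID-HMBT-example",
        clone_uri="https://example.org/scap.git",
        revision="master",
        callsign="EXAMPLE",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```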

thcipriani triaged this task as Medium priority. · Jan 12 2018, 5:22 PM

Things I like

being able to run master in beta
cutting production releases from a branch that isn't as in-flux as the master branch

Things I'd like to change

Get rid of the debian folder in the master branch (since it causes confusion and has manual upkeep)
Would be nice to have a pre-merge e2e test of basic scap functionality (likely in beta)

If we ditch the debian folder in master we need to figure out a new way to deploy to beta. Ideas off the top of my head: scap deployed via git, scap deployed via scap, merge master into release and push out a new release deb to beta.

Of those 3 ideas, the first two might allow an easy path forward for an e2e test in beta, but maybe we can figure out something else for that...

We still have to address the problem with deploying scap to beta: commits to master are disruptive to other developers who use beta to test their deployments. They need a stable scap while we need a way to test the bleeding-edge scap. I don't know how to address both needs without a drastic departure from the current setup.

And that's why my puppetmaster manifests go to a local puppetmaster instead of the main deployment-prep one. Having everyone spin up their own puppetmaster isn't the answer; there's probably a clever approach for this entire class of problems.

Removal of the /debian in master.
- I think it makes the most sense to have the CI job merge master into release and then build packages from that.
Pre-merge end-to-end tests would be wonderful; however, doing this in deployment-prep without it being disruptive to other developers is tricky.
- We could do something with docker or a dedicated target instance that tracks master while keeping the release version of scap installed everywhere else
  - This would require either a dedicated scap master for our testing, or a separate scap install in a private path which we use to run our tests.
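
As a rough sketch of the "merge master into release, then build" idea above: the function below lists the commands such a CI job might run. It assumes the debian folder lives only on the release branch and that packages are built unsigned with dpkg-buildpackage; the function names are hypothetical and this is not an existing job definition.

```python
import subprocess


def merge_and_build_commands(source="master", target="release"):
    """Return, in order, the commands a CI job might run."""
    return [
        ["git", "checkout", target],
        # Bring the latest development work into the packaging branch.
        ["git", "merge", "--no-edit", source],
        # Build unsigned binary packages from the merged tree.
        ["dpkg-buildpackage", "-us", "-uc", "-b"],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(merge_and_build_commands())
```

Running it as-is only prints the three commands; passing dry_run=False would execute them in the current working directory and stop at the first failure.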

@ArielGlenn: indeed, I'm open to clever suggestions ;)

Can we just use docker-compose to define a whole scap micro-cluster and then run that in CI?

In T184628#3897327, @mmodell wrote:

We still have to address the problem with deploying scap to beta: commits to master are disruptive to other developers who use beta to test their deployments.

We could stop breaking master ;-)

@demon: there is a whole class of bugs that we currently can't test until code hits master....

We don't disagree, but my point is that if we're merging stuff to master that is risky, we (the ones doing the merge) should be prepared to either fix things or roll back quickly. Basically: adopt the "master must always be runnable" adage we use for MediaWiki & friends.

@demon: That complicates development (and raises my stress level) while failing to fully address the issues with CI not actually catching bugs and release branch versions conflicting with master versions.

Also, the "master must always be runnable" idea sounds great but it transfers responsibility from CI to a manual process that developers are responsible for doing properly and consistently. I'd rather have an automated process than a procedure outlined in 12 steps on a wiki page somewhere.

I think developers should be responsible for the code they merge and not rely on CI to catch everything. This doesn't preclude good CI, but I don't think "be ready to revert your code if it breaks things" is really a big ask--and it's something I'd hope we're doing anyway...

That's entirely beside the point, however. The problem is that, currently, the route to finding out whether the code is broken is through commits to master, which also break other people's workflows. This is obviously bad.

Maintaining a local VM for testing is reasonable; however, it's a non-trivial amount of work, and IMO it would be more efficient to have a shared testing environment that we can all use rather than each of us maintaining a local test VM. I am with @thcipriani regarding e2e testing in deployment-prep. We especially need that for scap releases prior to deploying them to production.

mmodell claimed this task. · Jan 18 2018, 1:03 PM

mmodell moved this task from Backlog to In-progress on the Release-Engineering-Team (Kanban) board.

mmodell moved this task from Needs triage to Debt on the Scap board. · Feb 1 2018, 12:21 AM

mmodell moved this task from In-progress to Backlog on the Release-Engineering-Team (Kanban) board. · Feb 26 2018, 5:33 PM

mmodell changed the task status from Open to Stalled. · Mar 29 2018, 6:49 PM

mmodell lowered the priority of this task from Medium to Low.

mmodell removed mmodell as the assignee of this task. · Apr 23 2018, 4:29 PM

greg edited projects, added Release-Engineering-Team (Backlog); removed Release-Engineering-Team (Kanban). · Jul 13 2018, 10:13 PM

Phabricator_maintenance edited projects, added Release-Engineering-Team-TODO; removed Release-Engineering-Team (Backlog). · Jun 12 2019, 11:53 PM

Phabricator_maintenance moved this task from Should be empty (use Release-Engineering-Team) to Later / Need volunteer on the Release-Engineering-Team-TODO board. · Jun 12 2019, 11:55 PM

greg added a project: Release-Engineering-Team. · Jun 21 2019, 10:35 PM

greg edited projects, added Release-Engineering-Team (Deployment services); removed Release-Engineering-Team. · Jun 24 2019, 9:20 PM

thcipriani removed a project: Release-Engineering-Team (Deployment services). · Apr 20 2021, 1:10 AM

thcipriani edited projects, added Release-Engineering-Team (thcipriani-workboard-fiddling); removed Release-Engineering-Team-TODO. · Apr 20 2021, 3:42 AM

thcipriani moved this task from thcipriani-workboard-fiddling to Seen (ARCHIVE) on the Release-Engineering-Team board. · Apr 20 2021, 3:58 AM

thcipriani edited projects, added Release-Engineering-Team; removed Release-Engineering-Team (thcipriani-workboard-fiddling).

thcipriani edited projects, added Release-Engineering-Team (Seen); removed Release-Engineering-Team. · Apr 20 2021, 3:23 PM

Many changes have happened since this ticket was created.

Today, any time a change is merged into scap, a CI job runs which builds and publishes a new deb package to an apt repo that is accessible to beta nodes. Deploying one of these autocreated debs is a matter of running release-scripts/update-scap-in-beta. It will figure out the latest deb, prompt the user for confirmation, and update the scap deb on relevant target hosts.
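
The "figure out the latest deb, prompt for confirmation" flow can be illustrated with a small sketch. This is not the actual update-scap-in-beta script: the function names are hypothetical, and it assumes simple version strings of the form MAJOR.MINOR.PATCH-REVISION. Real Debian version ordering is more involved (dpkg --compare-versions implements it).

```python
def version_key(version):
    """Turn '4.10.0-2' into ((4, 10, 0), 2) so versions compare numerically.

    Simplification: only dotted numeric upstream versions with an
    optional numeric revision are handled.
    """
    upstream, _, revision = version.partition("-")
    return tuple(int(part) for part in upstream.split(".")), int(revision or 0)


def pick_latest(versions):
    """Return the highest version under the simplified ordering above."""
    return max(versions, key=version_key)


def confirm(prompt, ask=input):
    """Ask a yes/no question; anything other than y/yes counts as no."""
    return ask(prompt + " [y/N] ").strip().lower() in ("y", "yes")


if __name__ == "__main__":
    available = ["4.9.0-1", "4.10.0-1", "4.10.0-2"]  # made-up versions
    latest = pick_latest(available)
    if confirm("Install scap " + latest + " on beta targets?"):
        print("would install", latest)
    else:
        print("aborted")
```

Comparing numeric tuples rather than raw strings matters here: as plain strings, "4.9.0-1" would sort after "4.10.0-1".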

jnuche closed subtask T184118: scap package installed by CI breaks apt and thus puppet as Resolved. · Aug 3 2022, 12:20 PM
