Experiment with continuous deployment using Blubberoid
Open, Stalled, Medium, Public

Assigned To

None

Authored By

	• LarsWirzenius
	Jan 18 2019, 12:09 PM

Description

The Release Engineering team wants to experiment with continuous deployment of services, automatically from Gerrit (after a +2 code review) to a container in Kubernetes. We've chosen Blubberoid as the service to experiment with. This task is the umbrella task for that work; parts of it will be added here as sub-tasks.

The acceptance criterion is: after a RelEng team member with +2 rights in Gerrit votes +2 on a change for Blubberoid, the change is built, tested, and deployed to Kubernetes fully automatically, within five minutes.
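To make the five-minute criterion measurable, a check could compare the time of the +2 vote with the time the new image finished rolling out. This is a minimal sketch, assuming both timestamps are available; where they would come from (for example Gerrit event data and Kubernetes rollout status) is an assumption, not something this task specifies.

```python
from datetime import datetime, timedelta
from typing import Optional

# Deadline from the acceptance criterion: five minutes after the +2 vote.
DEADLINE = timedelta(minutes=5)


def meets_acceptance(voted_at: datetime, deployed_at: Optional[datetime]) -> bool:
    """Return True if the change was deployed within DEADLINE of the +2 vote.

    deployed_at is None when the deployment never completed, e.g. because
    the build or tests failed; that counts as not meeting the criterion.
    """
    if deployed_at is None:
        return False
    elapsed = deployed_at - voted_at
    return timedelta(0) <= elapsed <= DEADLINE


if __name__ == "__main__":
    vote = datetime(2019, 1, 18, 12, 0, 0)
    print(meets_acceptance(vote, vote + timedelta(minutes=4)))  # True
    print(meets_acceptance(vote, vote + timedelta(minutes=6)))  # False
```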

Details

	Subject	Repo	Branch	Lines +/-
	[WIP] Apply global helmfile after pull	operations/puppet	production	+19 -1
	Helmfile for continuous deployment	operations/deployment-charts	master	+10 -0


Related Objects

		Status	Subtype	Assigned	Task
		Stalled		None	T214158 Experiment with continuous deployment using Blubberoid
		Declined		None	T217147 Add k8s credentials for Blubberoid continuous deployment

Event Timeline

• LarsWirzenius created this task. Jan 18 2019, 12:09 PM

Restricted Application added a project: Release-Engineering-Team (Kanban). Jan 18 2019, 12:09 PM

Restricted Application added a subscriber: Aklapper.

I think we should start by looking at the steps that follow a developer pushing a new change to Gerrit. The push should trigger Zuul to run a Jenkins job that builds Blubberoid, runs its unit tests, and builds, publishes, and tags the Docker image(s). Then things wait until a reviewer votes +2 for code review. That vote triggers Zuul to run another Jenkins job, which deploys the previously tagged Docker image to Kubernetes and changes the K8s or load balancer configuration so that traffic goes to containers running the new image instead of the previous one.

We'll also want a manually triggered rollback that somehow takes the new image out of use and switches back to the previous image.

I'm unsure as to the details, but does this approach seem sensible overall? Which parts do we have already that we can reuse?
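The proposed trigger logic can be sketched as a mapping from Gerrit events to jobs. This is a minimal sketch, assuming Gerrit stream-events payloads ("patchset-created", and "comment-added" with an "approvals" list); the job names are hypothetical, and in practice this logic would live in Zuul's pipeline configuration rather than in code.

```python
from typing import Any, Dict, List

# Hypothetical job names; the real Zuul/Jenkins jobs are not named in this task.
BUILD_JOB = "blubberoid-build-test-publish"
DEPLOY_JOB = "blubberoid-deploy"


def jobs_for_event(event: Dict[str, Any]) -> List[str]:
    """Map a Gerrit stream event to the jobs the proposed pipeline would run."""
    kind = event.get("type")
    if kind == "patchset-created":
        # New patch set: build, run unit tests, build/publish/tag the image.
        return [BUILD_JOB]
    if kind == "comment-added":
        for approval in event.get("approvals", []):
            # Gerrit reports approval values as strings such as "2" or "-1".
            # A real trigger would also need to ignore re-reported votes
            # that did not change.
            if approval.get("type") == "Code-Review" and approval.get("value") == "2":
                # +2 code review: deploy the image tagged by the build job.
                return [DEPLOY_JOB]
    # Other events, or comments without a +2, trigger nothing.
    return []
```

Rollback is deliberately absent from the mapping: as proposed, it is triggered manually rather than by a Gerrit event.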

greg added a project: Release Pipeline. Jan 18 2019, 6:59 PM

Paladox subscribed. Jan 18 2019, 7:01 PM

Reedy renamed this task from Experiment with continuous deployment using Blubberoid to Experiment with continuous deployment using Blubberoid. Jan 18 2019, 7:33 PM

This is a first rough draft of an outline of a continuous deployment (CDep) pipeline for the Blubberoid service, as a starting point for discussion. The goal is to fully automate everything from a +2 code review vote until the change runs in production. Note that this is for Blubberoid ONLY, and for deploying it to Kubernetes ONLY.

PUSH: A developer pushes a patch (or set of patches) to Gerrit as refs/for/master. Gerrit creates an entry in its database to track the change. Potential reviewers are added and notified.

This should work already and require no changes.

BUILD: Zuul notices a new patch set and triggers a job that builds it and runs any unit tests and other tests that can be run from the build tree. (FIXME: does this need to be specified in more detail? Should a Docker image be built as well at this stage?) The job runs on a Jenkins worker. If the job fails, Zuul reports the failure to Gerrit, which records a -1 Verified vote on the change. In that case the pipeline fails and stops, but it restarts if the developer pushes a new patch set.

Except for the details of the jobs that get triggered, this should already work.

REVISE: Time passes. Reviewers may request changes. The developer may push new patch sets, each of which starts the pipeline again from the beginning.

As far as I understand, this already works.

APPROVE: A suitably authorized reviewer votes +2 for code review. Zuul notices the +2 vote and triggers a job that merges the change to master, rebuilds, retests, builds Docker images, and publishes the images to the image registry, suitably tagged.

As I understand it, this should already work, except perhaps for the details of the job.
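Tagging each image with something derived from the merged commit would let DEPLOY name the exact image and make the previous one easy to find for a rollback. The scheme below is purely hypothetical (the task only says the images are tagged suitably): it combines a UTC build timestamp, so tags sort chronologically, with the commit's abbreviated SHA-1.

```python
from datetime import datetime, timezone


def image_tag(commit_sha: str, built_at: datetime) -> str:
    """Return a hypothetical image tag such as '2019-01-18-120900-0123456789ab'.

    built_at must be timezone-aware so the timestamp is unambiguous.
    """
    if built_at.tzinfo is None:
        raise ValueError("built_at must be timezone-aware")
    if len(commit_sha) != 40:
        raise ValueError("expected a full 40-character Git commit SHA-1")
    stamp = built_at.astimezone(timezone.utc).strftime("%Y-%m-%d-%H%M%S")
    # Docker tags may contain letters, digits, '_', '.', and '-', so this is valid.
    return f"{stamp}-{commit_sha[:12]}"
```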

DEPLOY: Zuul also triggers a job (or the same job continues) to deploy the newly built Docker image to production Kubernetes and migrate traffic to containers running the new image.

This is new. We need a Jenkins worker (or a whole new Jenkins instance?) that is allowed to deploy to production. What permissions does it need?

(I've labelled the steps so we can refer to them easily.)
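The labelled steps can be read as a small state machine. This is a minimal sketch; the event names and the extra STOPPED and DONE states are hypothetical, added to model "the pipeline fails and stops" and a completed deployment.

```python
from enum import Enum


class Stage(Enum):
    PUSH = "push"
    BUILD = "build"
    REVISE = "revise"
    APPROVE = "approve"
    DEPLOY = "deploy"
    STOPPED = "stopped"  # a build or gate failed; waiting for a new patch set
    DONE = "done"        # the new image is serving production traffic


# (current stage, event) -> next stage. Event names are hypothetical.
TRANSITIONS = {
    (Stage.PUSH, "patch_set_uploaded"): Stage.BUILD,
    (Stage.BUILD, "build_failed"): Stage.STOPPED,   # -1 Verified
    (Stage.BUILD, "build_passed"): Stage.REVISE,    # waiting for review
    (Stage.REVISE, "patch_set_uploaded"): Stage.BUILD,
    (Stage.REVISE, "plus_two"): Stage.APPROVE,
    (Stage.STOPPED, "patch_set_uploaded"): Stage.BUILD,
    (Stage.APPROVE, "gate_failed"): Stage.STOPPED,
    (Stage.APPROVE, "image_published"): Stage.DEPLOY,
    (Stage.DEPLOY, "deployed"): Stage.DONE,
}


def next_stage(stage: Stage, event: str) -> Stage:
    """Return the stage that follows `stage` when `event` happens."""
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in stage {stage.name}") from None
```

In this model a +2 on a change whose build failed is rejected, so only a passing patch set can reach APPROVE.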

Unless I am missing something, I don't think we'll need to change any K8s configuration for load balancers, etc.

If we use Helm to upgrade the release, as we appear to do in current deployments, there won't be a separate previous deployment to manage. In the event of a rollback, we can use Helm to roll back to the previous revision. That requires knowing which revision to roll back to, which helm history tells us.
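Finding the rollback target from helm history could be automated. This is a minimal sketch, assuming the JSON printed by "helm history <release> --output json", with "revision" and "status" fields as in Helm 3 (older Helm versions may use upper-case status values, so the comparison is case-insensitive). It picks the most recent earlier revision that was deployed successfully and later superseded, and builds the "helm rollback <release> <revision>" command without running it; the release name in the test is hypothetical.

```python
import json
from typing import List


def rollback_target(history_json: str) -> int:
    """Pick the revision to roll back to from `helm history -o json` output."""
    history = sorted(json.loads(history_json), key=lambda r: int(r["revision"]))
    deployed = [int(r["revision"]) for r in history
                if r["status"].lower() == "deployed"]
    if not deployed:
        raise ValueError("release has no currently deployed revision")
    current = deployed[-1]
    # Earlier revisions that ran successfully and were then replaced by an upgrade.
    candidates = [int(r["revision"]) for r in history
                  if int(r["revision"]) < current
                  and r["status"].lower() == "superseded"]
    if not candidates:
        raise ValueError("no earlier successful revision to roll back to")
    return candidates[-1]


def rollback_command(release: str, history_json: str) -> List[str]:
    """Build (but do not run) the helm command for a rollback."""
    return ["helm", "rollback", release, str(rollback_target(history_json))]
```

Helm records a rollback as a new revision, so running this again right after a rollback would select the revision that was just rolled away from; an operator-facing tool would need to guard against that.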

My understanding is that the missing piece is a token that allows Jenkins to push a deployment to K8s.

greg triaged this task as Medium priority. May 9 2019, 11:49 PM

greg edited projects, added Release-Engineering-Team-TODO (201907); removed Release-Engineering-Team (Kanban). Jul 1 2019, 9:25 PM

greg moved this task from INBOX to Ready on the Release-Engineering-Team-TODO (201907) board. Jul 1 2019, 9:27 PM

greg edited projects, added Release-Engineering-Team-TODO; removed Release-Engineering-Team-TODO (201907). Jul 6 2019, 4:34 AM

greg moved this task from Should be empty (use Release-Engineering-Team) to Next on the Release-Engineering-Team-TODO board. Jul 6 2019, 5:40 AM

greg added a project: Release-Engineering-Team (Pipeline). Aug 1 2019, 11:20 PM

jeena mentioned this in T207535: Rendering of \oinit very dense. Sep 2 2020, 6:25 PM

Physikerwelt awarded a token. Sep 2 2020, 6:32 PM

MSantos subscribed. Sep 24 2020, 2:28 PM

jeena mentioned this in T266694: Continuous delivery for kubernetes services. Oct 28 2020, 5:58 PM

jeena claimed this task. Jan 27 2021, 12:45 AM

Change 634354 had a related patch set uploaded (by Jeena Huneidi; owner: Jeena Huneidi):
[operations/deployment-charts@master] Helmfile for continuous deployment

https://gerrit.wikimedia.org/r/634354

gerritbot added a project: Patch-For-Review. Jan 27 2021, 12:51 AM

Change 658750 had a related patch set uploaded (by Jeena Huneidi; owner: Jeena Huneidi):
[operations/puppet@production] [WIP] Apply global helmfile after pull

https://gerrit.wikimedia.org/r/658750

Change 658750 abandoned by Jeena Huneidi:
[operations/puppet@production] [WIP] Apply global helmfile after pull

Reason:
After discussion in the pipeline repo, it's indeed not secure enough to trust a +2 from Gerrit. It would also be best to have an automatic rollback strategy before implementing continuous delivery.

https://gerrit.wikimedia.org/r/658750
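One possible shape for the automatic rollback strategy mentioned in the abandon reason is a post-deployment health check. This is a minimal sketch, assuming the deploy job can sample (errors, requests) counts for the new release at regular intervals; the metric source and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class HealthPolicy:
    # Hypothetical thresholds; real values would come from the service's SLOs.
    max_error_ratio: float = 0.01  # a sample is bad if more than 1% of requests failed
    consecutive_bad: int = 3       # bad samples in a row before rolling back


def should_roll_back(samples: Sequence[Tuple[int, int]], policy: HealthPolicy) -> bool:
    """Decide from (errors, requests) samples taken after a deploy."""
    bad_streak = 0
    for errors, requests in samples:
        # Samples with no requests count as healthy in this sketch.
        ratio = errors / requests if requests else 0.0
        bad_streak = bad_streak + 1 if ratio > policy.max_error_ratio else 0
        if bad_streak >= policy.consecutive_bad:
            return True
    return False
```

Requiring several consecutive bad samples avoids rolling back on a single noisy interval, at the cost of reacting more slowly to a genuinely broken release.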

thcipriani mentioned this in T274901: Stop using puppet + git pull for auto deployment of schema repos. Feb 22 2021, 3:57 PM