
Gerrit/Argo CI proof of concept
Closed, Resolved · Public


Set up a working instance of Argo Events/Workflow/UI to demonstrate/evaluate its feasibility as a CI replacement. It should at the very least:

  1. Respond to Gerrit patch-created and ref-updated events (Argo Events).
  2. Trigger CI workloads based on configuration stored in the project repo (Argo Workflow).
  3. Store build artifacts (Minio for now).
  4. Surface build output/history via a user facing UI (Argo UI).
  5. Surface inner processes and logging via an admin interface (k8s API).
  6. Report back to Gerrit upon completion of workloads (mechanism TBD; possibly Argo Events, possibly directly from the Workflow process).
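As a rough sketch of how items 1 and 2 might wire together, an Argo Events Sensor could subscribe to Gerrit webhook events and spawn a Workflow per event. The resource below is illustrative only: field names approximate the Argo Events schema of this era, and the gateway name and namespace are hypothetical.

```yaml
# Illustrative sketch only; check field names against the Argo Events docs
# for the version in use. Gateway name and namespace are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: gerrit-patchset-created
  namespace: argo-events
spec:
  dependencies:
    # events delivered by a webhook gateway that Gerrit's webhooks
    # plugin is configured to POST to
    - name: gerrit-webhook-gateway:patchset-created
  triggers:
    - template:
        name: ci-workflow
        # each matching event spawns a Workflow resource, i.e. a CI run
        resource:
          apiVersion: argoproj.io/v1alpha1
          kind: Workflow
```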

Summary of PoC findings


Argo comprises a few different projects with well-defined concerns that work together to provide a fully functional CI system: Argo Workflow for task definition and execution, Argo Events for handling events from external systems and from Argo itself, and Argo UI for surfacing workflow history, progress, and artifacts. Like other "cloud-native" systems (e.g. Tekton), it works by installing a number of Kubernetes Custom Resource Definitions (CRDs) and controllers into a k8s cluster, the controllers responding to CRD changes.

Three different systems were set up to complete the PoC, all installed to a single-node Google Kubernetes Engine cluster:

  1. Argo (core) for workflow execution
  2. Argo Events for Gerrit integration
  3. Minio for S3-compatible artifact storage


Argo would make for a flexible, powerful, and reliable CI system for WMF projects, provided that:

  • It continues to be well maintained.
  • WMF continues to invest in Kubernetes as a platform.
  • Adequate k8s capacity is available, either from SRE internally or from an external provider.
  • Release Engineering invests in Go.

User experience

Watch the asciinema tty-cast to see how a user might interact with Argo on the command line. (See Argo documentation for web UI examples.)


Pros
  • Argo is well designed. Its subsystems are well scoped, flexible, and have relatively small footprints.
  • Installation/configuration is fairly straightforward.
  • Workflow steps are defined as container templates, which means they can execute anything that can be supplied as a container image.
  • Workflows support either serial or DAG execution of steps, potentially allowing us to implement current pipelinelib functionality on Argo.
  • Argo allows us to delegate CI job definition to repo owners in a few different ways, either exposing Workflow definitions directly, or through some higher level abstraction—like pipelinelib’s .pipeline/config.yaml.
  • Argo is a "k-native" system. All the underlying process scheduling is done by Kubernetes which makes for fewer moving parts in Argo itself.
  • Being k-native also means inheriting the standard k8s toolchain (kubectl), which could benefit both users and admins who are already familiar with k8s or are learning it. Having system interfaces that are more broadly understood outside of the team seems like a good thing.
  • Being k-native, it is designed to be deployed via Kubernetes, a platform we’re already investing in and have been directly targeting in Deployment Pipeline work.
  • Integration with Gerrit is possible in a few different ways, via webhooks, Kafka, or an Argo Events Gateway plugin. There would be pros/cons to each but having options is generally a good thing, and any of these options seem sufficient.
  • Commenting back to Gerrit is already possible using Argo Events and moreover the latter is a flexible system, allowing us to easily set up system event handlers for reporting, notifications, etc.
  • Artifact support is pluggable and transparent—Argo UI provides a proxied link for downloading artifact files. For this PoC I used Minio but other systems are supported.
  • Argo seems well supported and maintained. I had a positive experience with upstream, both in their Slack channel for support and on GitHub for contribution—4 patches to argo-events and 1 to argo (workflow).
  • The extent of Argo’s user base is nowhere near that of GitLab but seems decent—34 companies, many of them high profile.
  • The documentation is pretty good.
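To make the container-template and DAG points above concrete, here is a minimal Workflow sketch. The image name and make targets are hypothetical; the resource and field structure follow the Argo Workflow spec.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: blubber-ci-
spec:
  entrypoint: ci
  templates:
    # DAG template: test runs only after lint succeeds
    - name: ci
      dag:
        tasks:
          - name: lint
            template: run-make
            arguments:
              parameters: [{name: target, value: lint}]
          - name: test
            dependencies: [lint]
            template: run-make
            arguments:
              parameters: [{name: target, value: test}]
    # container template: any image/command combination will do
    - name: run-make
      inputs:
        parameters:
          - name: target
      container:
        image: docker-registry.wikimedia.org/releng/make:latest  # hypothetical
        command: [make, "{{inputs.parameters.target}}"]
```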


Cons
  • These k-native things are very new.
  • Although my experience interacting with upstream was very positive, the project lacks a diversity of core maintainers—its main two contributors are employees of Intuit.
  • Similarly, argo-events has just one or two core contributors—different than argo itself.
  • Git support in both argo and argo-events lacked the ability to fetch and check out specific refs, which originally made checking out Gerrit patchsets impossible. There were also some odd inefficiencies around git clones identified in the code. I wrote a handful of PRs that implemented the missing features and optimized clones. I’m not sure this is a con exactly, as contributing upstream is an important part of an open-source software user relationship. However, it does raise the question: what else might we have to implement ourselves in order to achieve a fully usable CI system?
  • Argo Events is an esoteric system. Its gateway/sensor/circuit concepts and resources took a lot of reading before I could get started setting up the Gerrit integration.
  • Argo, like GitLab, lacks Zuul’s dependent-pipeline logic and merger. We’ll have to implement some sort of controller for these features if we go with Argo.
  • Documentation could be better.
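On the specific-refs point above: Gerrit patchset refs follow a fixed scheme, which is what the upstream checkout patches needed to support. A small illustrative helper (the function name is mine, not part of any tool):

```shell
# Gerrit stores patchset N of change C under refs/changes/<CC>/<C>/<N>,
# where <CC> is the change number modulo 100, zero-padded to two digits.
gerrit_ref() {
  change=$1
  patchset=$2
  printf 'refs/changes/%02d/%d/%d\n' $((change % 100)) "$change" "$patchset"
}

gerrit_ref 512345 3   # -> refs/changes/45/512345/3
gerrit_ref 7 1        # -> refs/changes/07/7/1
```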

Event Timeline

dduvall triaged this task as Medium priority.Jul 29 2019, 4:34 PM
dduvall moved this task from Backlog to CI on the Release Pipeline board.
dduvall moved this task from INBOX to Doing on the Release-Engineering-Team-TODO (201908) board.

I'm still working on a thorough summary of this PoC setup, but here's a quickstart on how users might interact with it—on the command line.

Transcript (without CLI output of course):

# i'm currently in a blubber test repo cloned from
git remote -v

# this gerrit repo is configured to submit patchset-created events to argo
# via the webhooks plugin
git fetch origin refs/meta/config:refs/meta/config
git show refs/meta/config:webhooks.config

# currently argo is set up to execute tasks as defined by the repo
# in .pipeline/argo-workflow.yaml ...
# this is a convention set up for this proof of concept
# there are many other possibilities
vim -R .pipeline/argo-workflow.yaml

# alright. time to submit a patchset and see what happens
# i'll be using both `argo` and `kubectl` commands to see what's going on
# there is a web ui but this is a console :)
# let's go!
git add .pipeline/argo-workflow.yaml
git commit -m 'testing argo'
git push origin HEAD:refs/for/master

# ci workflow should already be running in a project-dedicated namespace
# to which users could potentially have full access
# again, this is flexible but made sense for the initial poc

# ok. now we can use the `argo` command to check on progress and history
argo --namespace blubber list
argo -n blubber watch {id}
argo -n blubber logs {id}

# an argo workflow is a k8s custom resource
# and like any resource it can be interrogated with kubectl
kubectl -n blubber get workflow {id} -o yaml | vim -R -c 'set ft=yaml' -

# workflows spawn k8s pods to do the work
kubectl -n blubber logs {id} -f --all-containers

# i'll check the status of the workflow one more time using `argo get`
argo -n blubber get {id}

# it finished!
# argo should have already commented on the change in gerrit
# i'll look using `gruf`, a gerrit cli
git log -1
gruf -t comments query change:{id}

# indeed! argo has commented with a link to the argo ui
# and a direct link to the artifacts created by the workflow
# the reporting was actually done via another workflow
# this time kicked off in a protected argo-events namespace
kubectl -n argo-events get workflows
kubectl -n argo-events get workflows {id} -o yaml | vim -R -c 'set ft=yaml' -

# that's all! again, there are many possible setups using argo
# it's quite flexible and a little weird. :)
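For reference, the webhooks.config viewed at the top of the transcript uses the Gerrit webhooks plugin's git-config format. A minimal version might look like this (the remote name and URL are hypothetical):

```ini
[remote "argo"]
  url = http://argo-events.example.org/gerrit
  event = patchset-created
  event = ref-updated
```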