Set up a working instance of Argo Events/Workflow/UI to demonstrate/evaluate its feasibility as a CI replacement. It should at the very least:
1. Respond to Gerrit `patch-created` and `ref-updated` events (Argo Events).
2. Trigger CI workloads based on configuration stored in the project repo (Argo Workflow).
3. Store build artifacts (Minio for now).
4. Surface build output/history via a user-facing UI (Argo UI).
5. Surface inner processes and logging via an admin interface (k8s API).
6. Report back to Gerrit upon completion of workloads (???, possibly Argo Events, possibly straight from Workflow process).
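Requirements 1 and 2 map onto Argo Events resources: a gateway receives the Gerrit event and a sensor creates a `Workflow` in response. As a rough sketch only (the gateway/sensor schema has changed between argo-events releases, so resource names and fields here are illustrative rather than drop-in):

```yaml
# Illustrative only: a sensor that reacts to a Gerrit patch-created
# event delivered by a webhook gateway and creates a Workflow object.
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: gerrit-patch-created
spec:
  dependencies:
    - name: patch-created
      gatewayName: gerrit-webhook   # gateway receiving Gerrit hook payloads
      eventName: patch-created
  triggers:
    - template:
        name: run-ci
        k8s:
          group: argoproj.io
          version: v1alpha1
          resource: workflows
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: ci-
              spec:
                entrypoint: test    # would come from the project repo
                templates: []       # elided
```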
== Summary of PoC findings ==
=== Overview ===
Argo comprises a few different projects with well defined concerns that work together to provide a fully functional CI system: Argo Workflow for task definition/execution, Argo Events for handling events from external systems and from Argo itself, and Argo UI for surfacing workflow history, progress, and artifacts. Similar to other systems (e.g. Tekton), it works by defining a number of [[https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/|Kubernetes Custom Resource Definitions (CRDs)]] and installing controllers to a k8s cluster that respond to changes in those resources.
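To make the CRD idea concrete, here is a minimal `Workflow` resource of the kind the controller watches for (names and image are illustrative, not from the PoC):

```yaml
# A minimal Argo Workflow custom resource. Submitting it (e.g. with
# `kubectl create -f` or `argo submit`) creates a Workflow object that
# the workflow-controller picks up and executes as a pod.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-       # controller appends a random suffix
spec:
  entrypoint: say-hello      # which template to run first
  templates:
    - name: say-hello
      container:
        image: alpine:3.10   # steps are just containers
        command: [echo]
        args: ["hello from argo"]
```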
Three different systems were set up to complete the PoC, all installed to a single-node Google Kubernetes Engine cluster:
# [[https://github.com/argoproj/argo|Argo]] (core) for workflow execution
# [[https://github.com/argoproj/argo-events|Argo Events]] for Gerrit integration
# [[https://github.com/minio/minio|Minio]] for S3-compatible artifact storage
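Since all three components are administered through standard k8s tooling, day-to-day inspection looks something like the following (namespace names here are hypothetical, and the commands assume `kubectl` is pointed at the cluster):

```shell
# Inspect the CI system's own machinery via kubectl: workflow objects,
# the workflow-controller's logs, and the argo-events resources.
kubectl -n argo get workflows
kubectl -n argo logs deployment/workflow-controller
kubectl -n argo-events get sensors,gateways
```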
=== Recommendation ===
Argo would make for a flexible, powerful, and reliable CI system for WMF projects, provided that:
* It continues to be well maintained.
* WMF continues to invest in Kubernetes as a platform.
* Adequate k8s capacity is available, either from SRE internally or from an external provider.
* Release Engineering invests in Go.
=== User experience ===
Watch the [[https://asciinema.org/a/267414|asciinema tty-cast]] to see how a user might interact with Argo on the command line. (See Argo documentation for web UI examples.)
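The command-line flow shown in the cast boils down to a few `argo` subcommands (the workflow name below is hypothetical, and exact arguments vary a bit between releases):

```shell
# Submit a workflow and follow its progress in the terminal.
argo submit my-workflow.yaml --watch

# List recent workflows and their phases (Running/Succeeded/Failed).
argo list

# Inspect a finished workflow and fetch logs from its steps.
argo get hello-abc12
argo logs hello-abc12
```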
=== Pros ===
* Argo is well designed. Its subsystems are well scoped, flexible, and have relatively small footprints.
* Installation/configuration is fairly straightforward.
* Workflow steps are defined as container templates, which means they can do just about anything that can be packaged as a container image.
* Workflows support either serial or DAG execution of steps, potentially allowing us to implement current pipelinelib functionality on Argo.
* Argo allows us to delegate CI job definition to repo owners in a few different ways: either exposing `Workflow` definitions directly, or through some higher-level abstraction such as pipelinelib's `.pipeline/config.yaml`.
* Argo is a Kubernetes-native system. All the underlying process scheduling is done by Kubernetes, which makes for fewer moving parts in Argo itself.
* Being k-native also means inheriting the standard k8s toolchain (`kubectl`), which could benefit both users and admins who are already familiar with or are learning k8s. Having system interfaces that are more broadly understood outside of the team seems like a good thing.
* Being k-native it is designed to be deployed via Kubernetes, a platform we’re already investing in and have been directly targeting in Deployment Pipeline work.
* Integration with Gerrit is possible in a few different ways: via webhooks, Kafka, or an Argo Events Gateway plugin. There would be pros/cons to each, but having options is generally a good thing, and any of these options seems sufficient.
* Commenting back to Gerrit is already possible using Argo Events; moreover, Argo Events is a flexible system, allowing us to easily set up system event handlers for reporting, notifications, etc.
* Artifact support is pluggable and transparent—Argo UI provides a proxied link for downloading artifact files. For this PoC I used Minio but other systems are supported.
* Argo seems well supported and maintained. I had a positive experience with upstream, both in their Slack channel for support and via GitHub for contribution (4 patches to argo-events and 1 to argo itself).
* The extent of Argo’s user base is nowhere near that of GitLab but seems decent—[[https://github.com/argoproj/argo#who-uses-argo|34 companies]], many of them high profile.
* The documentation is pretty good.
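To illustrate two of the points above (DAG execution and transparent artifact support), here is a sketch of a workflow that builds once and then tests and lints in parallel, saving a build artifact; the image, commands, and paths are hypothetical, not taken from the PoC:

```yaml
# Sketch of a CI Workflow using DAG execution and an output artifact.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ci-dag-
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          - name: build
            template: build
          - name: test
            template: test
            dependencies: [build]   # test and lint both wait on build,
          - name: lint              # then run in parallel
            template: lint
            dependencies: [build]
    - name: build
      container:
        image: ci-image:latest     # hypothetical build image
        command: [make, build]
      outputs:
        artifacts:
          - name: binary
            path: /workspace/out   # uploaded to the configured store (Minio)
    - name: test
      container:
        image: ci-image:latest
        command: [make, test]
    - name: lint
      container:
        image: ci-image:latest
        command: [make, lint]
```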
=== Cons ===
* Kubernetes-native systems such as Argo are all very new.
* Although my experience interacting with upstream was very positive, the project lacks a diversity of core maintainers—its main two contributors are employees of Intuit.
* Similarly, argo-events has just one or two core contributors, and they are different people than those working on argo itself.
* Git support in both argo and argo-events lacked the ability to fetch and check out specific refs, which originally made checking out Gerrit patchsets impossible. There were also some inefficiencies around git clones in the code. I wrote a handful of PRs that implemented the missing features and optimized clones. I'm not sure this is a con exactly, as contributing upstream is an important part of the relationship with open-source software. However, it does raise the question: what else might we have to implement ourselves to achieve a fully usable CI system?
* Argo Events is an esoteric system. Its gateway/sensor/circuit concepts and resources took a lot of reading before I could get started on the Gerrit integration.
* Argo, like GitLab, is lacking Zuul's dependent pipeline logic and merger. We'll have to implement some sort of controller for these features if we go with Argo.
* Documentation could be better.
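For context on the git limitation mentioned above: checking out a Gerrit patchset means fetching a specific ref rather than a branch, which the built-in git support could not do. The change and patchset numbers below are made up for illustration:

```shell
# Gerrit publishes each patchset under a ref of the form
# refs/changes/<last two digits of change>/<change number>/<patchset>.
# Fetch the ref explicitly, then check out FETCH_HEAD.
git fetch origin refs/changes/45/12345/2
git checkout FETCH_HEAD
```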