
Evaluate Zuul
Closed, Resolved · Public


How hard would it be to implement a simple CI task that gets Blubber source code from Gerrit, builds it with Go, and runs its unit tests?

(Repeating the standard task we're using for quick vetting of CI systems here, but probably this one actually looks a bit different in practice.)

Event Timeline

brennen triaged this task as Medium priority.Mar 12 2019, 6:00 PM
brennen created this task.
brennen renamed this task from Investigate Zuul v3 to Evaluate Zuul v3.Mar 12 2019, 6:09 PM

I started with the official Quick-Start Installation and Tutorial, which is built around docker-compose.

I spun up a Debian Stretch VM on DigitalOcean in case anyone wants to reuse this environment for further evaluation, and used SSH forwarding to access the various services as if they were on localhost. I used a script from local-charts to quickly install Docker and friends.

The quick-start docker-compose.yaml spins up containers which run:

  • Gerrit
  • MariaDB
  • zuul-web
  • zuul-executor
  • zookeeper
  • nodepool launcher
  • an Apache log server
  • a single Ubuntu 18.04 node with Python and rsync (rastasheep/ubuntu-sshd) for a Nodepool static pool

The tutorial then walks you through creating YAML configuration files for pipelines ("check" and "gate" in this case), a base job, and the application of jobs to projects. Projects themselves are listed under "tenants" in an /etc/zuul/main.yaml, and won't be picked up otherwise. (For the quick-start example, I had to edit doc/source/admin/examples/etc_zuul/main.yaml to add a project name under gerrit.untrusted-projects and restart everything in order to add a new project to the list.) Nodepool is configured by an /etc/nodepool/nodepool.yaml.
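For reference, the tenant stanza I had to touch looks roughly like this (a sketch; the repo names here are illustrative, not the quick-start's actual list):

```yaml
# /etc/zuul/main.yaml (sketch)
- tenant:
    name: example-tenant
    source:
      gerrit:
        config-projects:
          - zuul-config        # trusted repo holding pipelines and the base job
        untrusted-projects:
          - test1
          - blubber            # the kind of line I had to add, then restart everything
```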

From the quick-start:

Zuul doesn’t take anything for granted, and even tasks such as copying the git repos for the project being tested onto the remote node must be explicitly added to a base job (and can therefore be customized as needed). The Zuul in this tutorial is pre-configured to use the zuul jobs repository which is the “standard library” of Zuul jobs and roles. We will make use of it to quickly create a base job which performs the necessary set up actions and stores build logs.
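Concretely, the base job you end up with looks something like the following (reconstructed from memory, so playbook paths and the node label may differ from the actual example repo):

```yaml
- job:
    name: base
    parent: null
    description: The base job all other jobs inherit from.
    pre-run: playbooks/base/pre.yaml    # e.g. copy the git repos onto the node
    post-run: playbooks/base/post.yaml  # e.g. collect and store build logs
    roles:
      - zuul: zuul/zuul-jobs            # the "standard library" mentioned above
    nodeset:
      nodes:
        - name: ubuntu-bionic
          label: ubuntu-bionic
```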

Once all of this is done, you can define any of zuul.d, .zuul.d, zuul.yaml, or .zuul.yaml in the root of individual project repositories, along with Ansible playbooks which will be run by the jobs. Jobs seem to be global across the set of repositories that Zuul knows about, so their names must be unique. A given project repo can run jobs which are defined in other repos, and stuff like building Docker images is provided out of the box.
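As an example of the out-of-the-box Docker support: if I'm reading the zuul-jobs docs right, a project can attach the shared build-docker-image job from its own .zuul.yaml along these lines (the image name below is made up):

```yaml
- project:
    check:
      jobs:
        - build-docker-image:            # job defined in the shared zuul-jobs repo
            vars:
              docker_images:
                - context: .             # build from the repo root
                  repository: example/blubber   # hypothetical image name
```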

I'm still fairly hazy on all of this, but this is a good point to read the Concepts section of the manual:

"The executable contents of jobs themselves are Ansible playbooks. Ansible’s support for orchestrating tasks on remote nodes is particularly suited to Zuul’s support for multi-node testing. Ansible is also easy to use for simple tasks (such as executing a shell script) or sophisticated deployment scenarios. When Zuul runs Ansible, it attempts to do so in a manner most similar to the way that Ansible might be used to orchestrate remote systems. Ansible itself is run on the executor and acts remotely upon the test nodes supplied to a job. This facilitates continuous delivery by making it possible to use the same Ansible playbooks in testing and production."

I failed at finishing the evaluation task, but got fairly close. This is the .zuul.yaml I defined for Blubber:

- job:
    name: blubber-build
    run: playbooks/blubber-build.yaml

- job:
    name: blubber-test
    run: playbooks/blubber-test.yaml

- project:
    check:
      jobs:
        - blubber-build
        - blubber-test
    gate:
      jobs:
        - blubber-build
        - blubber-test

And this is playbooks/blubber-build.yaml:

# ansible!
- hosts: all
  vars:
    packages:
      - golang
      - golint
      - build-essential
      - tree
      - git
  tasks:
    - debug:
        msg: Building Blubber.
    - name: Install build and test dependencies
      apt:
        name: "{{ packages }}"
        update_cache: yes
    - name: Run tree and output
      command: tree src
      register: out
    - debug: var=out.stdout_lines
    - name: Build Blubber
      make:
        chdir: src/gerrit/blubber

It uses Ansible's apt and make modules. The build itself doesn't actually work because Ubuntu 18.04 packages Go 1.10 by default and the same glitch that Lars encountered when building Blubber on GitLab CI applies here (see T217594). Still, in principle, it should work if a newer Go is used and/or the Ubuntu image is swapped out for a golang-specific image.
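One plausible (untested) workaround would be to skip Ubuntu's golang package and pull a current toolchain in the playbook instead, e.g. with Ansible's unarchive module; the Go version and paths here are assumptions:

```yaml
# sketch: install Go 1.12 from upstream instead of Ubuntu's golang 1.10
- name: Download and unpack a newer Go toolchain
  unarchive:
    src: https://dl.google.com/go/go1.12.linux-amd64.tar.gz
    dest: /usr/local
    remote_src: yes
  become: yes

- name: Build Blubber with the newer Go on PATH
  make:
    chdir: src/gerrit/blubber
  environment:
    PATH: "/usr/local/go/bin:{{ ansible_env.PATH }}"
```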

The invocation of tree is there because I was having a hard time figuring out the layout of the src directory. Getting to this point was pretty frustrating and involved a lot of trial and error and scanning logs for error output.

General impressions:

  • Self-serve: Partial at best? The configuration that has to be done centrally on a per-repo basis _might_ be kept fairly simple, but at minimum it seems like someone with authority has to modify Zuul configuration to pay attention to a new project, and I think work may also have to be done to specify what type of node jobs run on.
  • Complexity and expressiveness: It feels like there's a _lot_ of both here. There are a bunch of moving pieces, there's a bunch of configuration in different places, and there are a lot of degrees of freedom in how things can be set up.
  • Nodepool: This is a hard requirement for running Zuul v3. Per discussion elsewhere, it seems like the Kubernetes driver would be our option. Possibly we could define multiple pools where some use the static driver instead.
  • Documentation: Generally seems pretty good, but there are gaps. In particular, it was easy enough to get to a demo Gerrit + Zuul install, but it wasn't very clear where to go from there in terms of actually running builds and tests.
  • Web UI: Could be worse. Seems to be very read-only. Access to build logs feels clunky, but I'm sure it could be improved in a real production setup.
  • Execution model: Jobs are run by Ansible SSHing into nodes. The base job (also built on Ansible playbooks) sets up some scaffolding for this.
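For context on the Nodepool point above, a static pool like the quick-start's single Ubuntu node is declared roughly like this in /etc/nodepool/nodepool.yaml (hostname and credentials are placeholders, and the exact attribute names may vary by Nodepool version):

```yaml
labels:
  - name: ubuntu-bionic

providers:
  - name: static-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: node1.example.com   # placeholder hostname
            labels: ubuntu-bionic
            username: root            # placeholder credentials
```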

This brings me to Ansible. I'll lay my prejudices on the table and say that I haven't personally had a very good experience of Ansible, and often find myself at odds with its design and expectations. That said, it's mature and capable software at this point, and it comes with a large body of standard modules for various tasks.

It seems to me that, assuming we could make Nodepool work, Zuul v3 is capable of meeting our needs in a _technical_ sense. To some extent it provides a highly configurable toolkit for defining a pipeline, and in combination with the flexibility afforded by Ansible I think it could be made to fit a wide range of scenarios. The other side of this is that it feels complex, configuration-intensive, and likely to become a locus of technical debt in the form of widely scattered .yaml files and Ansible playbooks. It also just feels less developer-friendly and self-serve than the model offered by tooling like sourcehut builds or GitLab CI.

If anyone else would like to experiment with my demo installation, ping here or on IRC and I'll add your pubkey and write up quick SSH instructions.

I've used Ansible for several years now. In some ways I like it a lot, especially the "push" model, which doesn't require an agent running constantly on the target. However, I don't like that Ansible's developers keep deprecating features and making backwards-incompatible changes. It doesn't feel to me like a mature system yet. If we start using it, I predict it will cause a small but constant churn in our CI jobs, and that this _will_ result in technical debt.

This isn't bad enough for us to reject Zuul v3, but it's a mark against it.

zeljkofilipin renamed this task from Evaluate Zuul v3 to Evaluate Zuul.Mar 22 2019, 1:59 PM