As a proof of concept, we'll try setting up a buildpack CI pipeline in toolsbeta. The goal is to have a git push action trigger building a new docker image using buildpacks and then deploying it to k8s.
|Status|Assignee|Task|
|---|---|---|
|Open|None|T194332 [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs|
|Resolved|Bstorm|T265684 Figure out CI solution for building buildpack-based images for Toolforge|
|Open|Bstorm|T267374 Set up a Toolforge buildpack CI pipeline as a POC|
|Open|Bstorm|T267616 Set up docker-registry and image builder infra in toolsbeta|
|Resolved|Andrew|T267618 Request increased quota for toolsbeta Cloud VPS project|
One of the main problems with using pack right now is that it requires access to a docker socket to build images. The current alternative is https://github.com/pivotal/kpack, which I think is even more complex. So my current plan is to pass the docker socket into the CI containers (security note: this effectively gives the containers root on the host).
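To make the socket-passing plan concrete, here is a minimal sketch (image names, registry, and builder are hypothetical placeholders, not our real setup) of the `docker run` invocation a CI job could issue, bind-mounting the host's docker socket into the pack container:

```python
from typing import List

# Sketch: build the argv a CI job could use to run `pack` in a container
# against the host docker daemon. Everything below except the socket path
# and the upstream pack image name is a made-up placeholder.
DOCKER_SOCK = "/var/run/docker.sock"

def pack_build_cmd(tool: str, builder: str) -> List[str]:
    """Return the argv for building `tool`'s image with pack in a container.

    Bind-mounting DOCKER_SOCK is exactly what hands the container effective
    root on the host -- the security trade-off noted above.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{DOCKER_SOCK}:{DOCKER_SOCK}",   # expose the host daemon to pack
        "buildpacksio/pack",                    # upstream pack CLI image
        "build", f"toolsbeta-registry/{tool}:latest",
        "--builder", builder,
    ]

print(" ".join(pack_build_cmd("mytool", "example/builder")))
```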
And I think we should keep an eye on https://github.com/buildpacks/pack/issues/564 for when we want to leave POC status.
Sounds good for now. On the other hand, it isn't crazy to try things like https://github.com/rootless-containers/rootlesskit to see if we can produce a docker socket to point at. We only need to be able to build and the equivalent of push, right (I say naively)? I wonder how hard it would be to test using something like that after you get it working with a "real" socket?
kpack is interesting. It seems to be trying to build out buildpacks as an actual service, and that may actually be where we want to go in the end. They show how much faith they have in it by marking it a 0.1 release, but maybe I can take an action item to look at it more seriously.
For the past week-ish I've been investigating and playing with argocd, and I don't think it's the right solution for us. It really just does the *D* in CD, there seems to be no functionality for building, only deploying. Looking at https://argoproj.github.io/argo-cd/user-guide/ci_automation/, their example flow is having some CI workflow build/publish the image (what we want to do with pack) then commit to your config repo, which argocd watches and redeploys.
So we need some kind of CI system to handle the build process. I'm going to pivot slightly to looking at argo workflows, which seems rather Jenkins-like.
I do think argocd's web interface for observing the status of your deployment and restarting pods is pretty nice. We might want to use argocd for that - in which case we'd want to have CI build/publish the image, then automatically commit a config change to a Git repo that holds config for all tools, triggering argocd. Also, some of the argocd functionality, like rollbacks, looks like it requires proper image tagging, which would force us to stop using :latest.
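The CI-commits-to-config-repo flow could look roughly like this sketch (the manifest layout, image name, and regex are assumptions for illustration, not a real implementation):

```python
import re

def bump_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Rewrite `image: <image>:<old-tag>` lines in a k8s manifest to point
    at new_tag. CI would commit the result to the config repo that argocd
    watches, and argocd would then redeploy the tool."""
    pattern = re.compile(rf"(image:\s*{re.escape(image)}):\S+")
    return pattern.sub(rf"\g<1>:{new_tag}", manifest)

# Hypothetical manifest fragment for one tool.
manifest = """\
containers:
  - name: mytool
    image: registry.example/mytool:latest
"""
print(bump_image_tag(manifest, "registry.example/mytool", "20201119-abc1234"))
```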
Just a footnote that using :latest for workloads you care about is the most frequently called-out anti-pattern I know of for docker in any production use. It has value, but it blocks rollbacks, automation and troubleshooting (as a service). It's fine to keep building with it now (as we always have), but something can, will and should stop us from doing it one day. Maybe one day, it'll be some tool like this. Thanks for finding the problems and making the pivot!
tl;dr: argo works fine, though it has no killer features over jenkins/gitlab/<generic CI system>. I expect operational concerns like deployment, security, etc. will be the main things we consider when making the decision.
I set up argo locally using their mysql quickstart. I didn't actually look at what it was storing, but I assume we'll need a MySQL database backing this service rather than putting the db in a container?
Once a "workflow" has been deployed, it can be triggered via webhooks. We'd want to create a unique token for each tool, so we know which image we're building when an event is sent.
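The per-tool token idea could be as simple as the following sketch (the token values, tool names, and in-code storage are assumptions; real tokens would live in a secret store):

```python
import hmac
from typing import Optional

# Hypothetical per-tool webhook tokens; in reality these would come from a
# secret store, not be hard-coded.
TOOL_TOKENS = {
    "s3cr3t-token-a": "mytool",
    "s3cr3t-token-b": "othertool",
}

def tool_for_webhook(token: str) -> Optional[str]:
    """Map an incoming webhook token to the tool whose image we should build.

    compare_digest does a constant-time comparison, avoiding timing
    side-channels when checking the token."""
    for known, tool in TOOL_TOKENS.items():
        if hmac.compare_digest(known, token):
            return tool
    return None
```

An unknown token maps to no tool, so the webhook handler can reject the event outright instead of guessing which image to build.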
The workflow I created was: P13509. In the real version we'd want to use a sidecar to provide docker-in-docker (see example), and build a new image with a user in the docker group rather than reusing the build image, which forces the tfb user.
argo can also directly manipulate k8s resources (example), but my understanding is that this only works in the argo namespace, and we want tools to run in their own namespaces. Also, I don't think there's any meaningful difference between this and shelling out to kubectl.
So feature-wise, it seems basically equal to what we can do with Jenkins or GitLab CI. The main concerns then are really how we want to deploy this and what we want to support. I think argo is appealing because it's intended to be deployed via k8s, allowing us to reuse that infra. And we would control it, allowing us to manage upgrades, etc.
Really nice work!
> I think argo is appealing because it's intended to be deployed via k8s, allowing us to reuse that infra. And we would control it, allowing us to manage upgrades, etc.
I'm curious, is there anything preventing us from running any of the other solutions on our k8s? (Jenkins/gitlab/...)
There is an issue with running any of this on one of our k8s clusters for a final deployment: no LDAP auth and open questions about network security. This can be resolved, but it needs validation and changes. Now that helm3 is standard (and doesn't break our security model), we can also helm up whatever we use nicely.
Otherwise, I evaluated gitlab previously, and it was a fun wander down ruby-on-rails, and pretty support-heavy. I ran into a couple of issues, but they could be worked through. It did seem very heavy for our purposes at the time (as does Jenkins), but with the Foundation theoretically moving to GitLab, maybe there will be a lot of knowledge tossed around about it anyway, making it a clear choice, if potentially confusing for some users to have two of them.
From an operational perspective, argo *may* be much simpler for us than the java and ruby options. Argo is receiving a lot of patches, attention and work in the k8s realm, is written in golang, and is pretty good for k8s work. A few months back it was considered as a standard to move to for the Foundation (passed over in favor of Gitlab, Blubberoid/Pipeline (the in-house thing), and possibly also Jenkins?). It's basically going to be an operational decision for us: what seems usable to us, what deploys well in our setup, gets the job done, and won't constantly make us wish we had the Enterprise Edition (*eyes Gitlab scornfully*).
In addition to what @Bstorm said, for GitLab, my current thinking is that it would make more sense to use the production install rather than running our own, but that's still a moving target.
Agreed. As a starting point, when building new images, we should tag them with some timestamp/sha1 identifier as the version, plus :latest. Our initial deployment system can still use :latest, but this should make it a little easier to switch later on.
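As a sketch of that tagging scheme (the registry name and the exact tag format are assumptions, not a decided convention):

```python
from datetime import datetime, timezone
from typing import List

def image_tags(image: str, git_sha: str, now: datetime) -> List[str]:
    """Return both the immutable version tag and :latest for one build.

    The version tag combines a UTC timestamp with a short commit sha, so
    rollbacks can target an exact build; :latest is kept so the current
    deployment flow keeps working unchanged."""
    version = f"{now.strftime('%Y%m%d%H%M%S')}-{git_sha[:7]}"
    return [f"{image}:{version}", f"{image}:latest"]

tags = image_tags(
    "toolsbeta-registry/mytool",
    "0123abcd9876",
    datetime(2020, 11, 19, 12, 0, tzinfo=timezone.utc),
)
print(tags)
```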
On this topic, it is very interesting that the buildpacks project seems to expressly suggest that nobody use the pack command in CI (https://github.com/buildpacks/pack/issues/564), and after reading the code and issues today, I really agree. We had naively imagined the tool had a use in that domain, but they actually expect you to implement the buildpack standard in your CI directly, which is easier if you have a container-based CI; podman can run in a lot of places, as can kpack. Argo may actually give us a bit in this regard because it is container-native (it can even run docker-in-docker if we wanted to go there, but I suspect we can do without it, and there are other container CI systems in use right now for this). pack seems very focused on the work done so far: building the general notion of a buildpack locally. To me this means the next phase of this project will be all about building the pipelines in CI, the CD deployments, and their interface with repos. The work you did with pack will be invaluable as a model for that.
I see the production ticket has the same general conclusion :) T266081
Kubernetes upstream is moving away from using docker at all (to just use containerd and cri-o type stuff), so the docker socket is likely to not be available at all eventually.
Shuffling this task around to repurpose it for the work I'm currently doing on this. As it stands, the build step of a buildpack is what matters, and that's technically a CI function. This is not about replacing anyone else's CI, but rather creating a specific pipeline with tight controls on it for Toolforge buildpacks.
Some working parts are now in the toolsbeta k8s cluster using tekton pipelines, which is extremely simple since it is just a set of CRDs and controllers inside the cluster.