
Figure out CI solution for building buildpack-based images for Toolforge
Closed, ResolvedPublic

Description

As part of the push to deploy project, once a git server has triggered a webhook, we need something to invoke the CNB lifecycle to build the image and then push it to the registry. A separate problem is to schedule a new deployment in k8s (equivalent to what webservice would normally do).

Projects under consideration:

Tekton has a much more community-focused governance model so far; Red Hat and VMware are primary contributors. kpack, on the other hand, seems to be very much a VMware project and is a bit less flexible about who creates what, which makes the security model trickier for controlled pipelines.

Inputs the CI system will need (a rough sketch of how these might look as Tekton parameters follows the list):

  • Git repo URL and commit (or branch or tag)
  • Tool name and image name and deployment name
  • Any buildpack-specific information (which could be in the source repo)
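
As a minimal sketch, assuming hypothetical parameter names (the actual catalog task may name these differently), these inputs might surface as Tekton parameters along these lines:

```yaml
# Illustrative only: how the inputs above might appear in a PipelineRun's
# spec.params. The parameter names and repository URL are placeholders.
params:
  - name: source-url
    value: https://gitlab.example.org/toolforge/mytool.git  # git repo URL
  - name: source-revision
    value: main                                             # branch, tag or commit
  - name: tool-name
    value: mytool
  - name: image
    value: harbor.toolsbeta.wmflabs.org/mytool/mytool:latest
  - name: builder-image
    value: paketobuildpacks/builder:base                    # buildpack-specific choice
```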

Actions the CI system will do:

Users should be able to:

Most of this last set is provided by a read-only tekton pipelines dashboard in testing.

Event Timeline

Regarding GitLab, the timeline for the production instance is "June, about 8 months from now". I think our goal for buildpacks is sooner than that, so even if we decide using prod GitLab is the way to go, we'll still need an interim solution (potentially one set up in a Cloud VPS?).

Some initial considerations:

  • We would need to make sure that there are appropriate holes in the firewall so GitLab can talk to Toolforge Kubernetes (I'm assuming the CI runners will be able to, but I don't know if that's enough)
  • GitLab has its own k8s compatibility timetable, so we'd have to ensure that our k8s upgrade schedule is in sync with the GitLab upgrade schedule
  • If we use GitLab then it's one less thing for Cloud Services/Toolforge to independently maintain/administer. It's also one less thing for users to learn.
  • Would we allow people to directly use other git hosting services or would we only support our GitLab? It does support automated mirroring of repositories, but that comes with a time delay.

The k8s compatibility timetable you mention shows that the very oldest Kubernetes version still supported upstream is the very newest version that GitLab supports. That could be a problem for us from a security perspective. At this time our aim is to upgrade to 1.17 because we are plodding along keeping up with the oldest supported version, but we may want to get slightly ahead of it in time. I'm saying this while we are in a bad state, but we aim to at least track the oldest version that still gets backported patches.

I am skeptical that we will be able to directly include the GitLab Kubernetes tooling from prod in this, but it could happen. That would also require a dramatic shift in the security model of our k8s setup, to give GitLab a relatively free hand inside Toolforge. That's something to think about and contain well (it should be doable).

All this does suggest that a GitLab instance hosted in WMCS for this purpose alone might be the likely bet if we go with GitLab, but it also might be a bit funny... GitLab talking to GitLab... (not entirely unlike our Puppet setup).

Just some thoughts.

Based on what it took to get a buildpack tool deployed in toolsbeta, I added what inputs the CD system should get, what exactly it would do, and what I as a user would expect. Please add to it or let me know if I'm missing/overlooking something.

> The k8s compatibility timetable you mention shows that the very oldest Kubernetes version still supported upstream is the very newest version that GitLab supports. That could be a problem for us from a security perspective. At this time our aim is to upgrade to 1.17 because we are plodding along keeping up with the oldest supported version, but we may want to get slightly ahead of it in time. I'm saying this while we are in a bad state, but we aim to at least track the oldest version that still gets backported patches.

Also, historically we have not done a great job staying up to date with Gerrit upstream releases; hopefully GitLab will be a different story. I think we should come up with a set of questions/asks to get some more input from RelEng/the GitLab migration team on what exactly we should expect.

> I am skeptical that we will be able to directly include the GitLab Kubernetes tooling from prod in this, but it could happen. That would also require a dramatic shift in the security model of our k8s setup, to give GitLab a relatively free hand inside Toolforge. That's something to think about and contain well (it should be doable).

Based on my skim of the group-level docs it seems like we could limit access to just a "Toolforge" group. Do the concerns with regard to the security model apply just to GitLab, or would they apply to any CD system?

> All this does suggest that a GitLab instance hosted in WMCS for this purpose alone might be the likely bet if we go with GitLab, but it also might be a bit funny... GitLab talking to GitLab... (not entirely unlike our Puppet setup).

I'm not sure whether the "learn only one system" advantage of having two GitLabs would outweigh the "wait which GitLab" confusion :P

Andrew triaged this task as Medium priority.Jan 12 2021, 5:10 PM
Andrew moved this task from Soon! to Inbox on the cloud-services-team (Kanban) board.
Bstorm removed a subscriber: Legoktm.

Fixed up to match what we are doing right now. I've beaten GitLab to death and can confirm it's frustrating to make it do what we want with the security we want. That said, it can use CNB out of the box with a privileged container (it uses the pack command). However, you have to give it full access and control over a k8s cluster and... no. We aren't giving a web interface that is controlled by tool users full cluster-admin control of the k8s cluster without a lot more safeguards.

That's why GitLab doesn't directly offer us anything more than any other project does. Dedicated clusters for CI are another matter, with a customized plugin for Toolforge. Tekton Pipelines is very easy to operate in a cluster, and the right trigger could work. I'm building out how to make the security model behave right so that it only does what we want and does not grant unfettered access to a docker repo.

Bstorm renamed this task from Figure out CD solution for building buildpack-based images for Toolforge to Figure out CI solution for building buildpack-based images for Toolforge.May 11 2021, 11:00 PM
Bstorm claimed this task.
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

I've set up an example of Tekton Pipelines in toolsbeta at this point. It consumes one namespace on its own, and I've given it the power to act only on one namespace, named image-build. That namespace is set up so that there is a service account that can read the secret needed to commit an image artifact to Harbor. That service account is specified in the buildpacks Task, which can only be defined by Toolforge admins. It is currently based on v0.3 of the upstream CNCF official task maintained in the Tekton Pipelines catalog, with the privileged flag set to false.
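
A minimal sketch of that wiring, assuming illustrative names for the secret and service account (Tekton's credential initialization picks up basic-auth secrets annotated with the target registry):

```yaml
# Sketch only: a Harbor push credential annotated for Tekton's credential
# helper, bound to the service account the buildpacks Task runs as.
# The names and credentials below are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: harbor-push-credentials
  namespace: image-build
  annotations:
    tekton.dev/docker-0: https://harbor.toolsbeta.wmflabs.org
type: kubernetes.io/basic-auth
stringData:
  username: robot-builder        # placeholder
  password: not-a-real-password  # placeholder
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: image-builder
  namespace: image-build
secrets:
  - name: harbor-push-credentials
```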

That, plus the git-clone task and the NFS subdir storage class to provision the workspace needed to check out the code repo and then operate on it with the buildpacks task, allows a tool account to create a Pipeline and PipelineRuns to make this all happen.
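
Roughly, such a Pipeline could look like the sketch below. Parameter and workspace names for the catalog tasks are my best reading of recent versions and may differ in v0.3, so treat them as illustrative:

```yaml
# Sketch of a Pipeline a tool account might define: clone the repo into a
# shared workspace, then run the buildpacks task against it.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: buildpacks-pipeline
  namespace: image-build
spec:
  params:
    - name: source-url
      type: string
    - name: source-revision
      type: string
      default: main
    - name: image
      type: string
  workspaces:
    - name: source-ws
  tasks:
    - name: fetch-repository
      taskRef:
        name: git-clone
      workspaces:
        - name: output          # the git-clone task's workspace
          workspace: source-ws
      params:
        - name: url
          value: $(params.source-url)
        - name: revision
          value: $(params.source-revision)
    - name: build-image
      taskRef:
        name: buildpacks        # the admin-defined buildpacks Task
      runAfter:
        - fetch-repository
      workspaces:
        - name: source
          workspace: source-ws
      params:
        - name: APP_IMAGE
          value: $(params.image)
        - name: BUILDER_IMAGE
          value: paketobuildpacks/builder:base  # e.g. the Paketo builder
```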

You can specify a buildpack and a code directory and then commit the result to a quota-enabled image repository with a kubectl command. In order to use this from GitHub, Gerrit, Phabricator and GitLab (the list grows), it will also require Tekton Triggers and event listeners with ingresses. I do think those should be on a separate ingress class from the primary one for traffic control reasons. Ideally, those trigger and listener objects will live in tool namespaces, but that still needs to be built.
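
For the kubectl-driven flow, a run might look roughly like the sketch below; the service account and storage class names are assumptions based on the setup described above, and the repository URL is a placeholder:

```yaml
# Sketch of a PipelineRun a tool could submit with `kubectl create -f run.yaml`.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: build-maven-foo-
  namespace: image-build
spec:
  serviceAccountName: image-builder   # the account that can read the Harbor secret
  pipelineRef:
    name: buildpacks-pipeline
  params:
    - name: source-url
      value: https://github.com/example/maven-sample.git   # placeholder repo
    - name: source-revision
      value: main
    - name: image
      value: harbor.toolsbeta.wmflabs.org/maven-foo/maven-foo:latest
  workspaces:
    - name: source-ws
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-subdir   # assumed name of the NFS subdir storage class
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
```

Each run gets its own workspace volume from the volumeClaimTemplate, so nothing persists between builds beyond the pushed image.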

I think for now that closes this as a research task. There are many tasks left to finish this work beyond the basic example in toolsbeta.

For one, we need to validate the pipeline parameters (this one used Paketo buildpacks and a Maven sample app). There are also some more design items to complete, since this build system doesn't need to mirror the permissions model we have been using, but it does need to be fully compatible with it.

So the first resulting artifact from this experimentation is https://harbor.toolsbeta.wmflabs.org/harbor/projects/3/repositories/maven-foo (which I haven't tested behind the front proxy yet, but it works on VMs on port 8080 since this is a simple sample app that I didn't write).

Since we could use Tekton to deploy these things as well, and it is fairly simple to deploy their dashboard as a read-only beast, I'll close this ticket and focus there.