
Figure out CD solution for building buildpack-based images for Toolforge
Open, Medium, Public

Description

As part of the push-to-deploy project, once a git server has triggered a webhook, we need something that invokes pack to build the image and then pushes it to the registry. It would then schedule a new deployment in k8s (equivalent to what webservice would normally do).

Projects under consideration:

Inputs the CD system will get:

  • Git repo URL and commit
  • Tool name, image name, and deployment name
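
A minimal sketch of how the CD system might represent these inputs, in Python. The class name, the field names, and the assumption that the webhook delivers a full 40-character commit SHA are all hypothetical; none of this is an existing Toolforge API.

```python
import re
from dataclasses import dataclass

# Hypothetical check: assumes the webhook delivers a full 40-character SHA-1 commit ID.
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class BuildRequest:
    """The inputs listed above; field names are illustrative only."""
    repo_url: str
    commit: str
    tool_name: str
    image_name: str
    deployment_name: str

    def __post_init__(self) -> None:
        # Reject anything that is not a full, lowercase hex commit ID.
        if not _SHA1_RE.fullmatch(self.commit):
            raise ValueError(f"not a full commit SHA: {self.commit!r}")
        # Every other field must be non-empty.
        for name in ("repo_url", "tool_name", "image_name", "deployment_name"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")


# Example values are made up for illustration.
req = BuildRequest(
    repo_url="https://example.org/mytool.git",
    commit="0123456789abcdef0123456789abcdef01234567",
    tool_name="mytool",
    image_name="mytool",
    deployment_name="mytool",
)
print(req.tool_name)
```

Rejecting a short or malformed commit ID up front means the later checkout step always refers to exactly one commit.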

Actions the CD system will do:

  • Clone the git repo and checkout the correct commit
  • Sanity-check that a service.template exists, is valid YAML, ...
  • Get the stack name from the config
  • Run pack build {image_name} --builder docker-registry.tools.wmflabs.org/toolforge-{stack}-builder:latest --publish
    • This step will need access to the docker socket, effectively running pack as root.
    • This step also needs push access to the docker repo
  • Verify no webservice is currently running (T266901: Ensure webservice plays nicely with Toolforge tools using buildpack images)
  • Create a new k8s deployment if it doesn't exist yet, OR delete the existing pods so the new image is pulled when they restart
    • This step needs k8s access
    • Side note: we'll need some independent mechanism to stop a buildpack web server. Maybe we reuse webservice stop?
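
A rough sketch of the clone, checkout, and build steps, assuming a Python driver with hypothetical function names that prepares each command as an argv list rather than a shell string. The builder image pattern comes from the pack build step above; the --path flag is added here so pack builds the checkout directory instead of relying on the current working directory. YAML validation, the webservice check, and the k8s step are deliberately left out.

```python
import os
import shlex
from typing import List

# Builder image pattern taken from the `pack build` step in the description.
BUILDER = "docker-registry.tools.wmflabs.org/toolforge-{stack}-builder:latest"


def build_commands(repo_url: str, commit: str, workdir: str,
                   image_name: str, stack: str) -> List[List[str]]:
    """Return the clone, checkout and build steps as argv lists (not executed)."""
    return [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", commit],
        # --path points pack at the checkout; --publish pushes to the registry.
        ["pack", "build", image_name,
         "--path", workdir,
         "--builder", BUILDER.format(stack=stack),
         "--publish"],
    ]


def has_service_template(workdir: str) -> bool:
    """Minimal sanity check: only tests that the file exists, not that it parses."""
    return os.path.isfile(os.path.join(workdir, "service.template"))


# Example values (including the "python" stack name) are hypothetical.
for argv in build_commands("https://example.org/mytool.git",
                           "0123456789abcdef0123456789abcdef01234567",
                           "/tmp/mytool", "mytool", "python"):
    print(shlex.join(argv))
```

A driver could run each argv with subprocess.run(argv, check=True) and call has_service_template() between the checkout and the build, so the pipeline stops before invoking pack on a repo without a service.template. Keeping the commands as argv lists also avoids shell-quoting problems with user-supplied repo URLs.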

We'll probably want some kind of garbage collection on the CD hosts to delete old images and volumes every so often (but not immediately, so we can take advantage of caching). Given that pack conveniently timestamps everything to 40 years ago to keep images reproducible, I don't know if there's an easy way to figure out how old an image is... we might just want to delete everything initially.
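
If deleting everything turns out to be too aggressive, one possible approach (an assumption, not a decided plan) is for the CD host to record when it last built each image, since the image metadata only carries pack's fixed timestamp. The record format and the images_to_prune name below are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# pack gives images a fixed creation date for reproducibility, so image metadata
# cannot tell us their age. Assumption: the CD host keeps its own record of when
# it last built each image (this bookkeeping is hypothetical).


def images_to_prune(last_built: Dict[str, datetime], now: datetime,
                    keep_for: timedelta) -> List[str]:
    """Return images not rebuilt within `keep_for`, oldest first."""
    cutoff = now - keep_for
    stale = [name for name, built in last_built.items() if built < cutoff]
    return sorted(stale, key=lambda name: last_built[name])


now = datetime(2020, 11, 1, tzinfo=timezone.utc)
records = {
    "tool-a": now - timedelta(days=1),
    "tool-b": now - timedelta(days=10),
    "tool-c": now - timedelta(days=30),
}
print(images_to_prune(records, now, timedelta(days=7)))  # ['tool-c', 'tool-b']
```

The keep_for window is what preserves caching: recently built images survive, and only ones untouched for longer than the window are candidates for deletion.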

Users should be able to:

Event Timeline

Legoktm created this task. Oct 15 2020, 9:14 PM
Restricted Application added a subscriber: Aklapper. Oct 15 2020, 9:14 PM
Legoktm updated the task description. Oct 28 2020, 6:24 PM

Regarding GitLab, the timeline for the production instance is "June, about 8 months from now". I think our goal for buildpacks is sooner than that, so even if we decide using prod GitLab is the way to go, we'll still need an interim solution (potentially one set up in a Cloud VPS?).

Some initial considerations:

  • We would need to make sure that there are appropriate holes in the firewall so GitLab can talk to Toolforge Kubernetes (I'm assuming the CI runners will be able to, but I don't know if that's enough)
  • GitLab has its own k8s compatibility timetable, so we'd have to ensure that our k8s upgrade schedule is in sync with the GitLab upgrade schedule
  • If we use GitLab then it's one less thing for Cloud Services/Toolforge to independently maintain/administer. It's also one less thing for users to learn.
  • Would we allow people to directly use other git hosting services, or would we only support our GitLab? GitLab does support automated mirroring of repositories, but that comes with a time delay.

The k8s compatibility timetable you mention shows that the very oldest version of k8s that upstream Kubernetes supports is the very newest version that GitLab supports. That could be a problem for us from a security perspective. At this time we are aiming to upgrade to 1.17 because we are plodding along keeping up with the oldest supported version, but we may want to get slightly ahead of that. I'm saying this while we are in a bad state; our aim is at least to track the oldest version that still gets backported patches.
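
To make the overlap concrete, a small sketch with hypothetical support windows (illustrative values, not actual GitLab or upstream data): when GitLab's newest supported version equals upstream's oldest, the intersection is a single minor version.

```python
from typing import List, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (1, 17)


def compatible_versions(upstream: List[Version], gitlab: List[Version]) -> List[Version]:
    """Minor versions that are both upstream-supported and GitLab-supported."""
    return sorted(set(upstream) & set(gitlab))


# Hypothetical support windows illustrating the situation described above:
# GitLab's newest supported version is upstream's oldest supported one.
upstream_supported = [(1, 17), (1, 18), (1, 19)]
gitlab_supported = [(1, 15), (1, 16), (1, 17)]
print(compatible_versions(upstream_supported, gitlab_supported))  # [(1, 17)]
```

With a one-version overlap, upgrading either side ahead of the other leaves no version that both support, which is why the two upgrade schedules would need to stay in sync.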

I am skeptical that we will be able to include the GitLab Kubernetes tooling from prod in this directly, but it could happen. That would also require a dramatic shift in the security model of our k8s system, giving GitLab a relatively free hand inside Toolforge. That's something to think about and contain well (it should be doable).

All this does suggest that, if we use GitLab, an instance hosted in WMCS for this purpose alone might be a likely bet, but it might also be a bit funny... GitLab talking to GitLab (not entirely unlike our Puppet setup).

Just some thoughts.

Legoktm updated the task description. Oct 30 2020, 6:18 PM

Based on what it took to get a buildpack tool deployed in toolsbeta, I added what inputs the CD system should get, what exactly it would do, and what I as a user would expect. Please add to it or let me know if I'm missing/overlooking something.

> The k8s compatibility timetable you mention shows that the very oldest version of k8s that upstream Kubernetes supports is the very newest version that GitLab supports. That could be a problem for us from a security perspective. At this time we are aiming to upgrade to 1.17 because we are plodding along keeping up with the oldest supported version, but we may want to get slightly ahead of that. I'm saying this while we are in a bad state; our aim is at least to track the oldest version that still gets backported patches.

Also, historically we have not done a great job of staying up to date with upstream Gerrit releases; hopefully GitLab will be a different story. I think we should come up with a set of questions/asks to get more input from RelEng/the GitLab migration team on what exactly we should expect.

> I am skeptical that we will be able to include the GitLab Kubernetes tooling from prod in this directly, but it could happen. That would also require a dramatic shift in the security model of our k8s system, giving GitLab a relatively free hand inside Toolforge. That's something to think about and contain well (it should be doable).

Based on my skim of the group-level docs, it seems like we could limit access to just a "Toolforge" group. Do the concerns with respect to the security model apply just to GitLab, or would they apply to any CD system?

> All this does suggest that, if we use GitLab, an instance hosted in WMCS for this purpose alone might be a likely bet, but it might also be a bit funny... GitLab talking to GitLab (not entirely unlike our Puppet setup).

I'm not sure whether the "learn only one system" advantage of having two GitLabs would outweigh the "wait which GitLab" confusion :P

Legoktm updated the task description. Oct 30 2020, 6:48 PM
Legoktm updated the task description. Nov 3 2020, 4:31 PM
Andrew triaged this task as Medium priority. Jan 12 2021, 5:10 PM
Andrew moved this task from Soon! to Inbox on the cloud-services-team (Kanban) board.