
Toolforge: improve local kubernetes development setup
Open, In Progress, Medium, Public

Description

When developing software for Toolforge, we've detected a few problems related to how we usually set up the local kubernetes environment. One example is T325755: Bug: jobs-framework-api job run fails silently on local development environment. The goal of this ticket is to describe the problem and capture potential solutions.

There are scattered instructions on how to set up the local development environment in several different READMEs (example1, example2), and a collection of opinionated and questionable setup.sh scripts (example). The local kubernetes development environment can also differ depending on whether you use minikube or kind, and on whether you run Debian, Ubuntu or Mac. To overcome this situation I started experimenting with an ansible setup playbook here: https://gitlab.wikimedia.org/aborrero/cloud-toolforge-lima-kilo

The lima-kilo setup could include pieces like:

  • setting up the core RBAC config of Toolforge kubernetes (PSP, Roles, etc)
  • maintaining the [fake] context for Toolforge, such as the /data/project directories and fake files like /etc/wmcs-project that the different K8S components expect to exist
  • setting up the different admission controllers so the local kubernetes behaves like the actual Toolforge kubernetes
  • creating a few fake tool accounts (users) so they can be used for development/testing purposes
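
As a rough illustration of the "[fake] context" item above, the directories and files could be laid out as in the following sketch. It is not taken from lima-kilo (which uses ansible): the function name `create_fake_context`, the tool names, and the content written to `wmcs-project` are all assumptions, and everything is created under a scratch root rather than the real `/`.

```python
import tempfile
from pathlib import Path


def create_fake_context(root: Path, tools: list[str], project: str = "tools") -> list[Path]:
    """Create fake Toolforge paths under `root` (a scratch directory, not /)."""
    created = []

    # Fake <root>/etc/wmcs-project; the content written here is an assumption.
    project_file = root / "etc" / "wmcs-project"
    project_file.parent.mkdir(parents=True, exist_ok=True)
    project_file.write_text(project + "\n")
    created.append(project_file)

    # One fake directory per tool under <root>/data/project.
    for tool in tools:
        tool_dir = root / "data" / "project" / tool
        tool_dir.mkdir(parents=True, exist_ok=True)
        created.append(tool_dir)

    return created


if __name__ == "__main__":
    # Hypothetical tool names, used only for this demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_fake_context(Path(tmp), ["tool-one", "tool-two"]):
            print(path)
```

Using a scratch root keeps the sketch safe to run on a developer laptop; a real setup would have to decide how those paths are exposed to the local kubernetes node.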

We could have playbooks to support the combo {debian,ubuntu,mac}-{kind,minikube}-{install,uninstall}.
I've started with the debian-kind combo because it is the one @Raymond_Ndibe and I use. The experiments so far have shown promising results.
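
To make the size of that matrix concrete, the sketch below expands the {debian,ubuntu,mac}-{kind,minikube}-{install,uninstall} pattern into individual names. The `.yaml` suffix and the function name `playbook_names` are assumptions for illustration; the actual file naming in lima-kilo may differ.

```python
from itertools import product

DISTROS = ("debian", "ubuntu", "mac")
CLUSTERS = ("kind", "minikube")
ACTIONS = ("install", "uninstall")


def playbook_names() -> list[str]:
    """Expand the distro/cluster/action matrix into hypothetical playbook names."""
    return [
        f"{distro}-{cluster}-{action}.yaml"
        for distro, cluster, action in product(DISTROS, CLUSTERS, ACTIONS)
    ]


if __name__ == "__main__":
    names = playbook_names()
    print(len(names), "playbooks")
    for name in names:
        print(name)
```

The full matrix is 3 x 2 x 2 = 12 playbooks, which is one argument for starting with a single combo and adding others only as people need them.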

NOTE: I guess if this lima-kilo project reaches critical mass, we could run a decision request to formalize adoption, meaning updating all Toolforge-related repos to point to it.

Event Timeline

Change 879610 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-api@main] jobs-framework-api: adopt the lima-kilo setup

https://gerrit.wikimedia.org/r/879610

With today's changes in https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo I was able to run the test-suite of the jobs-framework-api without errors. This means we finally got the accounts, RBAC, permissions, directories, etc. in good enough shape to run the full lifecycle of a job in a local kubernetes.

Change 879610 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-api@main] jobs-framework-api: adopt the lima-kilo setup

https://gerrit.wikimedia.org/r/879610

aborrero triaged this task as Medium priority. Jan 17 2023, 1:27 PM

The current status of the lima-kilo project is the following:

Change 881388 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-cli@master] jobs-framework-cli: adapt to the lima-kilo project

https://gerrit.wikimedia.org/r/881388

Change 881388 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-cli@master] jobs-framework-cli: adapt to the lima-kilo project

https://gerrit.wikimedia.org/r/881388

aborrero updated https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/10

k8s: fake_maintain_kubeusers: assert if the kubernetes_ca is empty