
Figure out and document how to call the Kubernetes API as your tool user from inside a pod
Open, HighPublic

Description

Anomiebot's status page is a good use case for being able to find out what pods are running in a tool's namespace from inside of a pod in the namespace.

The info at https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/ leads to the same problem documented at https://stackoverflow.com/questions/48311683/how-can-i-use-curl-to-access-the-kubernetes-api-from-within-a-pod. The issue is that the default serviceaccount credentials mounted into the pod do not have RBAC access to the API.

We have the ability to set up a special service account for any given tool which allows read-only access to all tenant namespaces. This is used by the k8s-status tool and documented at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Maintenance#wmcs-k8s-enable-cluster-monitor. One downside of this method is that it requires using a custom Deployment rather than just webservice start to attach the credentials to the pod.
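For reference, attaching such a service account to a pod is done by naming it in the pod template of a custom Deployment. A minimal fragment might look like the following; the service account name and image are placeholders, not the actual objects created by wmcs-k8s-enable-cluster-monitor:

```yaml
# Fragment of a custom Deployment spec. serviceAccountName is the field
# that attaches the service account's credentials to the pod; the name
# and image below are placeholders.
spec:
  template:
    spec:
      serviceAccountName: cluster-monitor   # illustrative name
      containers:
        - name: webservice
          image: example/image:latest        # placeholder
```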

https://kubernetes.io/docs/reference/access-authn-authz/rbac/#service-account-permissions explains various ways that the default service account for a tool could be changed so that it can access the API.

Also, because we mount $HOME into the pod, it should be possible to use the tool's x509 certificate credentials from $HOME/.toolskube to authenticate to the API.

  • Document how to use the credentials from $HOME/.toolskube
  • Document how an admin could grant read-only API access to the default service account for a tool
  • Document how to request that your tool's default service account be granted read-only API access
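The curl examples later in this thread show the raw calls; for use from a script, a rough Python equivalent of the $HOME/.toolskube approach (assuming the requests library is available in the container) might look like:

```python
# Sketch: list pods in the tool's own namespace from inside a pod,
# authenticating with the client certificate from $HOME/.toolskube.
import os

APISERVER = "https://kubernetes.default.svc"
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def pods_url(apiserver, namespace):
    """Build the core v1 'list pods' URL for a namespace."""
    return "%s/api/v1/namespaces/%s/pods/" % (apiserver, namespace)


def list_pods():
    """Return (name, phase) for each pod in the tool's namespace."""
    import requests  # third-party; only needed when actually in a pod
    home = os.environ["HOME"]
    with open(os.path.join(SA_DIR, "namespace")) as f:
        namespace = f.read().strip()
    resp = requests.get(
        pods_url(APISERVER, namespace),
        # The apiserver's CA cert is mounted with the service account...
        verify=os.path.join(SA_DIR, "ca.crt"),
        # ...but we authenticate with the tool's x509 cert from $HOME.
        cert=(os.path.join(home, ".toolskube", "client.crt"),
              os.path.join(home, ".toolskube", "client.key")),
    )
    resp.raise_for_status()
    return [(p["metadata"]["name"], p["status"]["phase"])
            for p in resp.json()["items"]]


if __name__ == "__main__":
    for name, phase in list_pods():
        print("%s\t%s" % (name, phase))
```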


Event Timeline

Document how an admin could grant read-only API access to the default service account for a tool
Document how to request that your tool's default service account be granted read-only API access

Is any of that actually needed? I have read+write access to my tool's namespace from within a pod, though no one has granted any special access AFAIK.

I have read+write access to my tool's namespace from within a pod, though no one has granted any special access AFAIK.

Are you doing that by authenticating using your tool's credentials from $HOME/.toolskube?

The steps about default service accounts would be designed to make it easier to get read-only access by using the service account's token that is automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token in each Container.

Accessing the Kubernetes API from inside of a container using the default service account credentials:

$ APISERVER=https://kubernetes.default.svc
$ SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
$ CA_CERT=${SERVICEACCOUNT}/ca.crt
$ TOKEN=$(cat ${SERVICEACCOUNT}/token)
$ NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
$ curl --cacert $CA_CERT -H "Authorization: Bearer $TOKEN" "${APISERVER}"
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:serviceaccount:tool-bd808-test:default\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}

Because the tool's $HOME is mounted inside of our container, we can authenticate using the x509 certificate from $HOME/.toolskube.

$ APISERVER=https://kubernetes.default.svc
$ SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
$ CA_CERT=${SERVICEACCOUNT}/ca.crt
$ NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
$ CERT=$HOME/.toolskube/client.crt
$ KEY=$HOME/.toolskube/client.key
$ curl --silent --cacert $CA_CERT --key $KEY --cert $CERT "${APISERVER}/api/v1/namespaces/$NAMESPACE/pods/" |
  jq -r ".items[] | [.metadata.name, .status.phase] | @tsv"
bd808-test-77b666f66f-z87pw     Running
shell-1668200251        Running

A Toolforge admin can grant "view" rights to the default service account for a given tool:

$ kubectl sudo create rolebinding default-view \
      --clusterrole=view \
      --serviceaccount=tool-bd808-test:default \
      --namespace=tool-bd808-test
rolebinding.rbac.authorization.k8s.io/default-view created

With this rolebinding in place, the default service account can query for running pods:

$ APISERVER=https://kubernetes.default.svc
$ SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
$ CA_CERT=${SERVICEACCOUNT}/ca.crt
$ TOKEN=$(cat ${SERVICEACCOUNT}/token)
$ NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
$ curl --silent --cacert $CA_CERT -H "Authorization: Bearer $TOKEN" "${APISERVER}/api/v1/namespaces/$NAMESPACE/pods/" |
jq -r ".items[] | [.metadata.name, .status.phase] | @tsv"
bd808-test-77b666f66f-z87pw     Running
shell-1668205058        Running
taavi@tools-sgebastion-11:~ $ k sudo desc clusterrole view
Name:         view
Labels:       kubernetes.io/bootstrapping=rbac-defaults
              rbac.authorization.k8s.io/aggregate-to-edit=true
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                                    Non-Resource URLs  Resource Names  Verbs
  ---------                                    -----------------  --------------  -----
  bindings                                     []                 []              [get list watch]
  configmaps                                   []                 []              [get list watch]
  endpoints                                    []                 []              [get list watch]
  events                                       []                 []              [get list watch]
  limitranges                                  []                 []              [get list watch]
  namespaces/status                            []                 []              [get list watch]
  namespaces                                   []                 []              [get list watch]
  persistentvolumeclaims/status                []                 []              [get list watch]
  persistentvolumeclaims                       []                 []              [get list watch]
  pods/log                                     []                 []              [get list watch]
  pods/status                                  []                 []              [get list watch]
  pods                                         []                 []              [get list watch]
  replicationcontrollers/scale                 []                 []              [get list watch]
  replicationcontrollers/status                []                 []              [get list watch]
  replicationcontrollers                       []                 []              [get list watch]
  resourcequotas/status                        []                 []              [get list watch]
  resourcequotas                               []                 []              [get list watch]
  serviceaccounts                              []                 []              [get list watch]
  services/status                              []                 []              [get list watch]
  services                                     []                 []              [get list watch]
  controllerrevisions.apps                     []                 []              [get list watch]
  daemonsets.apps/status                       []                 []              [get list watch]
  daemonsets.apps                              []                 []              [get list watch]
  deployments.apps/scale                       []                 []              [get list watch]
  deployments.apps/status                      []                 []              [get list watch]
  deployments.apps                             []                 []              [get list watch]
  replicasets.apps/scale                       []                 []              [get list watch]
  replicasets.apps/status                      []                 []              [get list watch]
  replicasets.apps                             []                 []              [get list watch]
  statefulsets.apps/scale                      []                 []              [get list watch]
  statefulsets.apps/status                     []                 []              [get list watch]
  statefulsets.apps                            []                 []              [get list watch]
  horizontalpodautoscalers.autoscaling/status  []                 []              [get list watch]
  horizontalpodautoscalers.autoscaling         []                 []              [get list watch]
  cronjobs.batch/status                        []                 []              [get list watch]
  cronjobs.batch                               []                 []              [get list watch]
  jobs.batch/status                            []                 []              [get list watch]
  jobs.batch                                   []                 []              [get list watch]
  daemonsets.extensions/status                 []                 []              [get list watch]
  daemonsets.extensions                        []                 []              [get list watch]
  deployments.extensions/scale                 []                 []              [get list watch]
  deployments.extensions/status                []                 []              [get list watch]
  deployments.extensions                       []                 []              [get list watch]
  ingresses.extensions/status                  []                 []              [get list watch]
  ingresses.extensions                         []                 []              [get list watch]
  networkpolicies.extensions                   []                 []              [get list watch]
  replicasets.extensions/scale                 []                 []              [get list watch]
  replicasets.extensions/status                []                 []              [get list watch]
  replicasets.extensions                       []                 []              [get list watch]
  replicationcontrollers.extensions/scale      []                 []              [get list watch]
  nodes.metrics.k8s.io                         []                 []              [get list watch]
  pods.metrics.k8s.io                          []                 []              [get list watch]
  ingresses.networking.k8s.io/status           []                 []              [get list watch]
  ingresses.networking.k8s.io                  []                 []              [get list watch]
  networkpolicies.networking.k8s.io            []                 []              [get list watch]
  poddisruptionbudgets.policy/status           []                 []              [get list watch]
  poddisruptionbudgets.policy                  []                 []              [get list watch]

I don't see anything particularly sensitive on that list, except maybe pods/log. Most notably it doesn't grant access to secrets.

I think it should be OK to grant tool accounts this access to their own namespaces, but I'd like to have it managed via something that's kept in version control instead of adding ad hoc objects to the Kubernetes cluster.

I think it should be OK to grant tool accounts this access to their own namespaces, but I'd like to have it managed via something that's kept in version control instead of adding ad hoc objects to the Kubernetes cluster.

This sounds like a reasonable idea, but I'm not aware of any tool account namespace objects that are currently handled this way. Today there are durable things in each tool's namespace, like the quotas object, that are created by maintain-kubeusers, and transient things created by webservice. This would be more like a maintain-kubeusers managed thing than a webservice managed thing.

The thing to store/track is a relatively simple RoleBinding object that looks something like:

$ kubectl sudo get rolebinding/default-view -n tool-bd808-test -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: "2022-11-11T22:16:16Z"
  name: default-view
  namespace: tool-bd808-test
  resourceVersion: "929429243"
  uid: 610b4cb1-1001-42ec-9ebc-363e412eb8f6
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: default
  namespace: tool-bd808-test

Note that there's no stability or availability assurance for any of the raw k8s APIs. I understand they are far more powerful than the APIs/abstractions that we do maintain on top of them, but we can't offer any kind of assurance that your tools will not break, stop working, or misbehave at any point (essentially, let there be dragons).

Provide something better that fits the requirements and I'll look at using it. Last I've heard there's nothing else at all available.

Provide something better that fits the requirements and I'll look at using it. Last I've heard there's nothing else at all available.

tl;dr: We are working on it :)

For all your Toolforge-managed jobs and builds you can use those specific APIs directly (only direct calls for now, no CLIs supported yet: T356377: [toolforge] simplify calling the different toolforge apis from within the containers). For webservices we don't have an API yet, though we are working on it (ex. T352857: Toolforge next user stories - 2024 version; we will hopefully have more tasks created today).

For raw k8s stats/status, you can also use:
https://grafana-rw.wmcloud.org/d/TJuKfnt4z/kubernetes-namespace?orgId=1&var-cluster=prometheus-tools&var-namespace=tool-anomiebot

Note that I'm not saying you can't do it, or even that you should not do it, just making sure that you are aware of the tradeoffs you are making.

dcaro changed the task status from Open to In Progress.May 22 2024, 9:11 AM
dcaro changed the task status from In Progress to Open.
dcaro triaged this task as High priority.

In case it was missed, the Toolforge API is now considered stable and you can use it directly, expecting changes to take effect no sooner than 60 days after they are announced (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/API#Timeline_of_a_deprecation_change; it is also available in Swagger UI format at https://api-docs.toolforge.org).

The only feature currently not available there that's available on the cli is webservices, which we are currently working on migrating to the APIs.

@Anomie that should help you get some of your scripts migrated if you want to use the API directly.
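As a sketch, a direct call to the jobs endpoint mentioned later in this thread might look like the following. The base URL and the client-certificate authentication are assumptions on my part; check https://api-docs.toolforge.org for the real values:

```python
# Hedged sketch of calling the Toolforge jobs API directly. The
# /jobs/v1/tool/<tool>/jobs path is mentioned in this thread; everything
# else (base URL, auth mechanism) is an assumption.
import os


def jobs_endpoint(base, tool):
    """Build the jobs-list URL for a tool."""
    return "%s/jobs/v1/tool/%s/jobs" % (base, tool)


def list_jobs(tool, base, cert_dir):
    """Fetch the tool's jobs as parsed JSON.

    cert_dir would be something like $HOME/.toolskube; client-cert
    auth is assumed here, not confirmed.
    """
    import requests  # third-party; only needed for the live call
    resp = requests.get(
        jobs_endpoint(base, tool),
        cert=(os.path.join(cert_dir, "client.crt"),
              os.path.join(cert_dir, "client.key")),
    )
    resp.raise_for_status()
    return resp.json()
```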

So, touching on this from the context of https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli
As of https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli/-/commit/4c7c36a9e1a2425c57f3a10c1a2f78f3d2fab863 there is a kubernetes curl command, similar to the curl command I recently added for the Toolforge APIs, that allows easy authenticated (and possibly proxied) custom requests, no matter where you are running the tool.

Calling this locally on my laptop, for example:

tf kubernetes curl --tool wikicrowd /api/v1 --json

which gives me the JSON response. (I'm not sure which other API endpoints I can actually call / have access to from the tool account, but happy to try others.)

An example use of this would be including the cli binary in pods, and adding out-of-the-box authentication using the SA and token too, so that the experience is the same no matter where you are interacting with the API (local machine, on tools, or within a pod, etc.).
This is very much in line with other CLIs such as the GitHub CLI, which is provided and easily usable within GitHub Actions, Codespaces, or locally on your dev machine.

It would likely also be trivial-ish to embed the entire kubernetes CLI within the binary, with autocompletion in the various environments, if that were desired. (Similar to what is already done with the gitlab CLI within https://www.mediawiki.org/wiki/Cli/ref/mw_gitlab)

It would likely also be trivial-ish to embed the entire kubernetes CLI within the binary, with autocompletion in the various environments, if that were desired. (Similar to what is already done with the gitlab CLI within https://www.mediawiki.org/wiki/Cli/ref/mw_gitlab)

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

$ toolforge kubectl get pods

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

$ toolforge kubectl get pods

Indeed, and while running such a command on a local dev machine you may have to specify --tool foo, when running it within a tool account, or within a container owned by a tool account, I imagine that to always be auto-detectable.

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

$ toolforge kubectl get pods

Indeed, and while running such a command on a local dev machine you may have to specify --tool foo, when running it within a tool account, or within a container owned by a tool account, I imagine that to always be auto-detectable.

I guess --tool foo would be an argument of the toolforge binary, no? So, like this:

$ toolforge --tool foo kubectl get pods

That feels elegant, and as you say, it can be guessed in other contexts.

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

Touching back here, as I took another pass at this today with the "new cli" / toolforge-gen-cli.
I managed to embed kubectl entirely, so you get all of the help texts, autocompletions, etc.

Note: toolforge here is the binary from https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli, not the one actually currently on projects... (for others reading this out of context)

If you are already accessing toolforge or toolsbeta as your tool, this can just become

toolforge kubectl get pods

On an environment such as toolforge or toolsbeta, you can also do this if you are not on a tool account (not yet done on lima):

toolforge --tool toolforge-cli-test kubectl get pods

These commands all work from my local machine, via SSH proxy to the k8s API

toolforge --tool toolforge-cli-test kubectl get pods
toolforge --tool toolforge-cli-test --beta kubectl get pods
toolforge --tool tf-test --lima kubectl get pods

And within a container

tools.toolforge-cli-test@shell-1750184075:~$ toolforge kubectl get pods
NAME                     READY   STATUS             RESTARTS           AGE
echo1-56bcc9bf98-nczws   0/1     CrashLoopBackOff   9023 (2m38s ago)   32d
shell-1750184075         1/1     Running            0                  3m34s

Can that be split from the cli?
As in, make it optional like a plugin of sorts?

There's some reasons I ask for that:

  • Avoid fat clients: allow custom clients to do anything that's possible through the API. If we push to have all supported flows doable through the API, we can enable those in any client/script.
  • Avoid non-maintained flows: kubectl-ing directly is not a supported feature of Toolforge. It's something you can do, but we don't give it the same support level as supported features (it might break at any point without notice, its syntax might change, etc.).
  • Dependency on ssh: soon-ish (:fingerscrossed:) we will stop using ssh for the Toolforge API itself and enable calling it directly, which means that all the ssh-tunneling code/machinery will no longer be needed for any supported flow. If we keep this kubectl proxy feature, we will have to keep all of that too (and thus not be able to simplify).

Can that be split from the cli?

Yup.
It's fairly easy to compose and uncompose these parts as needed.

As in, make it optional like a plugin of sorts?

And a plugin like setup could also be done fairly easily.
(not currently in place)

There's some reasons I ask for that:

  • Avoid fat clients: allow custom clients to do anything that's possible through the API. If we push to have all supported flows doable through the API, we can enable those in any client/script.
  • Avoid non-maintained flows: kubectl-ing directly is not a supported feature of Toolforge. It's something you can do, but we don't give it the same support level as supported features (it might break at any point without notice, its syntax might change, etc.).
  • Dependency on ssh: soon-ish (:fingerscrossed:) we will stop using ssh for the Toolforge API itself and enable calling it directly, which means that all the ssh-tunneling code/machinery will no longer be needed for any supported flow. If we keep this kubectl proxy feature, we will have to keep all of that too (and thus not be able to simplify).

+1 to all of that

I hit the k8s api RBAC today with https://github.com/cluebotng/external-grafana-alloy

That build service image job is now mounting NFS just to get the tool's client certs so it can list the pods in the same namespace the job is running in.

Semantically this need goes away with T366923, but until then I'm noting it here (the previous static config doesn't work in cluebot-review due to multiple replica jobs).

In case it was missed, the Toolforge API is now considered stable and you can use it directly, expecting changes to take effect no sooner than 60 days after they are announced (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/API#Timeline_of_a_deprecation_change; it is also available in Swagger UI format at https://api-docs.toolforge.org).

The only feature currently not available there that's available on the cli is webservices, which we are currently working on migrating to the APIs.

@Anomie that should help you get some of your scripts migrated if you want to use the API directly.

I finally got a chance to try looking at this. One thing I found missing is that the /jobs/v1/tool/anomiebot/jobs endpoint does not include the pod name, which I make use of in some of my reporting to determine when some data is stale.


I finally got a chance to try looking at this. One thing I found missing is that the /jobs/v1/tool/anomiebot/jobs endpoint does not include the pod name, which I make use of in some of my reporting to determine when some data is stale.

Can you elaborate on how you use that information?
Ideally you should not need to interact with k8s directly at all, and should instead interact with the Toolforge API, so we might be missing some features.

In one script I use the information to display a message "job <name> is already running on <pod>".

In another report, the task writes status data to redis that includes the hostname (i.e. the pod name) where it's running, and the reporting matches that against the result from the API to see if the pod is still running or has gone missing. Matching only on the job name might match stale data with a running job on a different pod.

Occasionally I've used the pod name reported from either of these when trying to manually investigate something going wrong.

Hmm, I'm thinking of adding something like runtime_info to the job, so we can put some extra debugging info there, like the pod, the host, and maybe the image/hash it runs on. It might not be reliable, though it might be enough for your case. I strongly encourage you to avoid depending on those values and to find alternatives, for example:

In one script I use the information to display a message "job <name> is already running on <pod>".

This just can skip saying which pod it runs on :)

In another report, the task writes status data to redis that includes the hostname (i.e. the pod name) where it's running, and the reporting matches that against the result from the API to see if the pod is still running or has gone missing. Matching only on the job name might match stale data with a running job on a different pod.

I don't know the details of this, but you could try using a generated 'run-id' of sorts, or a timestamp of liveness instead of the hostname (so if it was not updated in the last X amount of time, it means it's stalled), or a mixture of both, etc.
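The liveness-timestamp idea can be sketched as a tiny check; the threshold value and names here are illustrative, not part of any Toolforge API:

```python
# Sketch of the "timestamp of liveness" idea: the job periodically writes
# a heartbeat timestamp (e.g. to redis) instead of a pod name, and the
# reporter treats the record as stale once the heartbeat is too old.
import time

STALE_AFTER = 300  # seconds without a heartbeat before we call it stalled


def is_stale(last_heartbeat, now=None, threshold=STALE_AFTER):
    """True when the job has not refreshed its heartbeat within threshold."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) > threshold
```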

Occasionally I've used the pod name reported from either of these when trying to manually investigate something going wrong.

This is the only one that does not have many alternatives, you can still report the pod name/hostname/etc. on the logs of the job.

This just can skip saying which pod it runs on :)

Which then makes more work for me if I need to follow up on the message manually.

I don't know the details of this, but you could try using a generated 'run-id' of sorts, or a timestamp of liveness instead of the hostname (so if it was not updated in the last X amount of time, it means it's stalled), or a mixture of both, etc.

Or I could keep doing what works if you add the necessary field.

It might not be reliable though

If your code thinks it knows what jobs are running without being able to know where they're doing so, I suspect you have bigger problems.

If your code thinks it knows what jobs are running without being able to know where they're doing so, I suspect you have bigger problems.

The key here is that the runtime (single cluster, multiworker, v1.29 k8s currently) might (and will) eventually change, expand, etc. so the only interfaces that we support are the toolforge APIs and clis.

Anything else related to the internal details of the runtime (ex. is it running in a pod? two pods? in a container? more than one container? does it have ens0 as the main interface? is it running in k8s? is it a VM? is it a single cluster? two? in eqiad? in codfw? is it k8s v1.21? v1.33? is it using a deployment? just a pod? a k8s job? a k8s cronjob? a NodePort? etc.) will change without backwards compatibility (we will always try to be gentle, but there's no assurance of any kind).

So my advice is, if you don't want to have your code break the next time that happens, to use the toolforge APIs and make your code not depend on the underlying runtime. This does not mean that you can't do it, or that you shouldn't, just that I recommend not doing it unless you want to invest the time to modify it when the runtime changes. It's a tradeoff that only you can decide to take or not. I'm trying to help avoid getting to the same position we are in now.