
Figure out and document how to call the Kubernetes API as your tool user from inside a pod
Open, HighPublic

Description

Anomiebot's status page is a good use case for being able to find out what pods are running in a tool's namespace from inside of a pod in the namespace.

The info at https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/ leads to the same problem documented at https://stackoverflow.com/questions/48311683/how-can-i-use-curl-to-access-the-kubernetes-api-from-within-a-pod. The issue is that the default serviceaccount credentials mounted into the pod do not have RBAC access to the API.

We have the ability to set up a special service account for any given tool which allows read-only access to all tenant namespaces. This is used by the k8s-status tool and documented at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Maintenance#wmcs-k8s-enable-cluster-monitor. One downside of this method is that it requires using a custom Deployment rather than just webservice start to attach the credentials to the pod.
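For reference, attaching such a service account to a pod is done by naming it in the pod template of a custom Deployment. A minimal fragment might look like the following; the service account name and image are placeholders, not the actual objects created by wmcs-k8s-enable-cluster-monitor:

```yaml
# Fragment of a custom Deployment spec. serviceAccountName is the field
# that attaches the service account's credentials to the pod; the name
# and image below are placeholders.
spec:
  template:
    spec:
      serviceAccountName: cluster-monitor   # illustrative name
      containers:
        - name: webservice
          image: example/image:latest        # placeholder
```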

https://kubernetes.io/docs/reference/access-authn-authz/rbac/#service-account-permissions explains various ways that the default service account for a tool could be changed so that it can access the API.

Also, because we mount $HOME into the pod, it should be possible to use the tool's x509 certificate credentials from $HOME/.toolskube to authenticate to the API.

  • Document how to use the credentials from $HOME/.toolskube
  • Document how an admin could grant read-only API access to the default service account for a tool
  • Document how to request that your tool's default service account be granted read-only API access
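The curl examples later in this thread show the raw calls; for use from a script, a rough Python equivalent of the $HOME/.toolskube approach (assuming the requests library is available in the container) might look like:

```python
# Sketch: list pods in the tool's own namespace from inside a pod,
# authenticating with the client certificate from $HOME/.toolskube.
import os

APISERVER = "https://kubernetes.default.svc"
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def pods_url(apiserver, namespace):
    """Build the core v1 'list pods' URL for a namespace."""
    return "%s/api/v1/namespaces/%s/pods/" % (apiserver, namespace)


def list_pods():
    """Return (name, phase) for each pod in the tool's namespace."""
    import requests  # third-party; only needed when actually in a pod
    home = os.environ["HOME"]
    with open(os.path.join(SA_DIR, "namespace")) as f:
        namespace = f.read().strip()
    resp = requests.get(
        pods_url(APISERVER, namespace),
        # The apiserver's CA cert is mounted with the service account...
        verify=os.path.join(SA_DIR, "ca.crt"),
        # ...but we authenticate with the tool's x509 cert from $HOME.
        cert=(os.path.join(home, ".toolskube", "client.crt"),
              os.path.join(home, ".toolskube", "client.key")),
    )
    resp.raise_for_status()
    return [(p["metadata"]["name"], p["status"]["phase"])
            for p in resp.json()["items"]]


if __name__ == "__main__":
    for name, phase in list_pods():
        print("%s\t%s" % (name, phase))
```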


Event Timeline

Document how an admin could grant read-only API access to the default service account for a tool
Document how to request that your tool's default service account be granted read-only API access

Is any of that actually needed? I have read+write access to my tool's namespace from within a pod, though no one has granted any special access AFAIK.

I have read+write access to my tool's namespace from within a pod, though no one has granted any special access AFAIK.

Are you doing that by authenticating using your tool's credentials from $HOME/.toolskube?

The steps about default service accounts would be designed to make it easier to get read-only access by using the service account's token that is automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token in each Container.

Accessing the Kubernetes API from inside of a container using the default service account credentials:

$ APISERVER=https://kubernetes.default.svc
$ SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
$ CA_CERT=${SERVICEACCOUNT}/ca.crt
$ TOKEN=$(cat ${SERVICEACCOUNT}/token)
$ NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
$ curl --cacert $CA_CERT -H "Authorization: Bearer $TOKEN" "${APISERVER}"
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:serviceaccount:tool-bd808-test:default\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}

Because the tool's $HOME is mounted inside of our container, we can authenticate using the x509 certificate from $HOME/.toolskube.

$ APISERVER=https://kubernetes.default.svc
$ SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
$ CA_CERT=${SERVICEACCOUNT}/ca.crt
$ NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
$ CERT=$HOME/.toolskube/client.crt
$ KEY=$HOME/.toolskube/client.key
$ curl --silent --cacert $CA_CERT --key $KEY --cert $CERT "${APISERVER}/api/v1/namespaces/$NAMESPACE/pods/" |
  jq -r ".items[] | [.metadata.name, .status.phase] | @tsv"
bd808-test-77b666f66f-z87pw     Running
shell-1668200251        Running

A Toolforge admin can grant "view" rights to the default service account for a given tool:

$ kubectl sudo create rolebinding default-view \
      --clusterrole=view \
      --serviceaccount=tool-bd808-test:default \
      --namespace=tool-bd808-test
rolebinding.rbac.authorization.k8s.io/default-view created

With this rolebinding in place, the default service account can query for running pods:

$ APISERVER=https://kubernetes.default.svc
$ SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
$ CA_CERT=${SERVICEACCOUNT}/ca.crt
$ TOKEN=$(cat ${SERVICEACCOUNT}/token)
$ NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
$ curl --silent --cacert $CA_CERT -H "Authorization: Bearer $TOKEN" "${APISERVER}/api/v1/namespaces/$NAMESPACE/pods/" |
jq -r ".items[] | [.metadata.name, .status.phase] | @tsv"
bd808-test-77b666f66f-z87pw     Running
shell-1668205058        Running
taavi@tools-sgebastion-11:~ $ k sudo desc clusterrole view
Name:         view
Labels:       kubernetes.io/bootstrapping=rbac-defaults
              rbac.authorization.k8s.io/aggregate-to-edit=true
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                                    Non-Resource URLs  Resource Names  Verbs
  ---------                                    -----------------  --------------  -----
  bindings                                     []                 []              [get list watch]
  configmaps                                   []                 []              [get list watch]
  endpoints                                    []                 []              [get list watch]
  events                                       []                 []              [get list watch]
  limitranges                                  []                 []              [get list watch]
  namespaces/status                            []                 []              [get list watch]
  namespaces                                   []                 []              [get list watch]
  persistentvolumeclaims/status                []                 []              [get list watch]
  persistentvolumeclaims                       []                 []              [get list watch]
  pods/log                                     []                 []              [get list watch]
  pods/status                                  []                 []              [get list watch]
  pods                                         []                 []              [get list watch]
  replicationcontrollers/scale                 []                 []              [get list watch]
  replicationcontrollers/status                []                 []              [get list watch]
  replicationcontrollers                       []                 []              [get list watch]
  resourcequotas/status                        []                 []              [get list watch]
  resourcequotas                               []                 []              [get list watch]
  serviceaccounts                              []                 []              [get list watch]
  services/status                              []                 []              [get list watch]
  services                                     []                 []              [get list watch]
  controllerrevisions.apps                     []                 []              [get list watch]
  daemonsets.apps/status                       []                 []              [get list watch]
  daemonsets.apps                              []                 []              [get list watch]
  deployments.apps/scale                       []                 []              [get list watch]
  deployments.apps/status                      []                 []              [get list watch]
  deployments.apps                             []                 []              [get list watch]
  replicasets.apps/scale                       []                 []              [get list watch]
  replicasets.apps/status                      []                 []              [get list watch]
  replicasets.apps                             []                 []              [get list watch]
  statefulsets.apps/scale                      []                 []              [get list watch]
  statefulsets.apps/status                     []                 []              [get list watch]
  statefulsets.apps                            []                 []              [get list watch]
  horizontalpodautoscalers.autoscaling/status  []                 []              [get list watch]
  horizontalpodautoscalers.autoscaling         []                 []              [get list watch]
  cronjobs.batch/status                        []                 []              [get list watch]
  cronjobs.batch                               []                 []              [get list watch]
  jobs.batch/status                            []                 []              [get list watch]
  jobs.batch                                   []                 []              [get list watch]
  daemonsets.extensions/status                 []                 []              [get list watch]
  daemonsets.extensions                        []                 []              [get list watch]
  deployments.extensions/scale                 []                 []              [get list watch]
  deployments.extensions/status                []                 []              [get list watch]
  deployments.extensions                       []                 []              [get list watch]
  ingresses.extensions/status                  []                 []              [get list watch]
  ingresses.extensions                         []                 []              [get list watch]
  networkpolicies.extensions                   []                 []              [get list watch]
  replicasets.extensions/scale                 []                 []              [get list watch]
  replicasets.extensions/status                []                 []              [get list watch]
  replicasets.extensions                       []                 []              [get list watch]
  replicationcontrollers.extensions/scale      []                 []              [get list watch]
  nodes.metrics.k8s.io                         []                 []              [get list watch]
  pods.metrics.k8s.io                          []                 []              [get list watch]
  ingresses.networking.k8s.io/status           []                 []              [get list watch]
  ingresses.networking.k8s.io                  []                 []              [get list watch]
  networkpolicies.networking.k8s.io            []                 []              [get list watch]
  poddisruptionbudgets.policy/status           []                 []              [get list watch]
  poddisruptionbudgets.policy                  []                 []              [get list watch]

I don't see anything particularly sensitive on that list, except maybe pods/log. Most notably it doesn't grant access to secrets.

I think it should be OK to grant tool accounts this access to their own namespaces, but I'd like to have it managed via something that's kept in version control instead of adding ad hoc objects to the Kubernetes cluster.

I think it should be OK to grant tool accounts this access to their own namespaces, but I'd like to have it managed via something that's kept in version control instead of adding ad hoc objects to the Kubernetes cluster.

This sounds like a reasonable idea, but I'm not aware of any tool account namespace objects that are currently handled this way. Today there are durable things in each tool's namespace, like the quotas object, that are created by maintain-kubeusers, and transient things created by webservice. This would be more like a maintain-kubeusers managed thing than a webservice managed thing.

The thing to store/track is a relatively simple RoleBinding object that looks something like:

$ kubectl sudo get rolebinding/default-view -n tool-bd808-test -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: "2022-11-11T22:16:16Z"
  name: default-view
  namespace: tool-bd808-test
  resourceVersion: "929429243"
  uid: 610b4cb1-1001-42ec-9ebc-363e412eb8f6
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: default
  namespace: tool-bd808-test

Note that there's no stability or availability assurance for any of the raw k8s APIs. I understand they are far more powerful than the APIs/abstractions that we do maintain on top of them, but we can't offer any kind of assurance that your tools will not break, stop working, or misbehave at any point (essentially, let there be dragons).

Provide something better that fits the requirements and I'll look at using it. Last I've heard there's nothing else at all available.

Provide something better that fits the requirements and I'll look at using it. Last I've heard there's nothing else at all available.

tl;dr: We are working on it :)

For all your Toolforge-managed jobs and builds you can use those specific APIs directly (only direct calls for now, no CLIs supported yet: T356377: [toolforge] simplify calling the different toolforge apis from within the containers). For webservices we don't have an API yet, though we are working on it (ex. T352857: Toolforge next user stories - 2024 version; we will hopefully have more tasks created today).

For raw k8s stats/status, you can also use:
https://grafana-rw.wmcloud.org/d/TJuKfnt4z/kubernetes-namespace?orgId=1&var-cluster=prometheus-tools&var-namespace=tool-anomiebot

Note that I'm not saying you can't do it, or even that you should not do it, just making sure that you are aware of the tradeoffs you are making.

dcaro changed the task status from Open to In Progress.May 22 2024, 9:11 AM
dcaro changed the task status from In Progress to Open.
dcaro triaged this task as High priority.

In case it was missed, the Toolforge API is now considered stable and you can use it directly, expecting changes to take effect no sooner than 60 days after they are announced (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/API#Timeline_of_a_deprecation_change; it is also available in Swagger UI format at https://api-docs.toolforge.org).

The only feature currently not available there that's available on the cli is webservices, which we are currently working on migrating to the APIs.

@Anomie that should help you get some of your scripts migrated if you want to use the API directly.
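As a sketch, a direct call to the jobs endpoint mentioned later in this thread might look like the following. The base URL and the client-certificate authentication are assumptions on my part; check https://api-docs.toolforge.org for the real values:

```python
# Hedged sketch of calling the Toolforge jobs API directly. The
# /jobs/v1/tool/<tool>/jobs path is mentioned in this thread; everything
# else (base URL, auth mechanism) is an assumption.
import os


def jobs_endpoint(base, tool):
    """Build the jobs-list URL for a tool."""
    return "%s/jobs/v1/tool/%s/jobs" % (base, tool)


def list_jobs(tool, base, cert_dir):
    """Fetch the tool's jobs as parsed JSON.

    cert_dir would be something like $HOME/.toolskube; client-cert
    auth is assumed here, not confirmed.
    """
    import requests  # third-party; only needed for the live call
    resp = requests.get(
        jobs_endpoint(base, tool),
        cert=(os.path.join(cert_dir, "client.crt"),
              os.path.join(cert_dir, "client.key")),
    )
    resp.raise_for_status()
    return resp.json()
```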

So, touching on this from the context of https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli
As of https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli/-/commit/4c7c36a9e1a2425c57f3a10c1a2f78f3d2fab863 there is a kubernetes curl command, similar to the curl command I recently added for the Toolforge APIs, that allows easy authenticated (and possibly proxied) custom requests, no matter where you are running the tool.

Calling this locally on my laptop, for example:

tf kubernetes curl --tool wikicrowd /api/v1 --json

which gives me the JSON response. (I'm not sure which other API endpoints I can actually call / have access to from the tool account, but happy to try others.)

An example use of this would be including the cli binary in pods, and adding out-of-the-box authentication using the SA and token too, so that the experience is the same no matter where you are interacting with the API (local machine, on tools, or within a pod, etc.).
This is very much in line with other CLIs such as the GitHub CLI, which is provided and easily usable within GitHub Actions, Codespaces, or locally on your dev machine.

It would likely also be trivial-ish to embed the entire kubernetes CLI within the binary, with autocompletion in the various environments, if that were desired. (Similar to what is already done with the gitlab CLI within https://www.mediawiki.org/wiki/Cli/ref/mw_gitlab)

It would likely also be trivial-ish to embed the entire kubernetes CLI within the binary, with autocompletion in the various environments, if that were desired. (Similar to what is already done with the gitlab CLI within https://www.mediawiki.org/wiki/Cli/ref/mw_gitlab)

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

$ toolforge kubectl get pods

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

$ toolforge kubectl get pods

Indeed, and while running such a command on a local dev machine you may have to specify --tool foo, when running it within a tool account, or within a container owned by a tool account, I imagine that to always be auto-detectable.

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

$ toolforge kubectl get pods

Indeed, and while running such a command on a local dev machine you may have to specify --tool foo, when running it within a tool account, or within a container owned by a tool account, I imagine that to always be auto-detectable.

I guess --tool foo would be an argument of the toolforge binary, no? So, like this:

$ toolforge --tool foo kubectl get pods

That feels elegant, and as you say, it can be guessed in other contexts.

I like this idea, and the semantics that it brings. I would be able to copy/paste kubectl commands from the internet and just prefix them with toolforge, and that feels like a nice thing to have.

Touching back here, as I took another pass at this today with the "new cli" / toolforge-gen-cli.
I managed to embed kubectl entirely, so you get all of the help texts, autocompletions, etc.

Note: toolforge here is the binary from https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli, not the one actually currently on projects... (for others reading this out of context)

If you are already accessing toolforge or toolsbeta as your tool, this can just become

toolforge kubectl get pods

On an environment such as toolforge or toolsbeta, you can also do this if you are not on a tool account (not yet done on lima):

toolforge --tool toolforge-cli-test kubectl get pods

These commands all work from my local machine, via SSH proxy to the k8s API

toolforge --tool toolforge-cli-test kubectl get pods
toolforge --tool toolforge-cli-test --beta kubectl get pods
toolforge --tool tf-test --lima kubectl get pods

And within a container

tools.toolforge-cli-test@shell-1750184075:~$ toolforge kubectl get pods
NAME                     READY   STATUS             RESTARTS           AGE
echo1-56bcc9bf98-nczws   0/1     CrashLoopBackOff   9023 (2m38s ago)   32d
shell-1750184075         1/1     Running            0                  3m34s

Can that be split from the cli?
As in, make it optional like a plugin of sorts?

There's some reasons I ask for that:

  • Avoid fat clients: allow custom clients to do anything that's possible through the API. If we push to have all supported flows doable through the API, we can enable those in any client/script.
  • Avoid non-maintained flows: kubectl-ing directly is not a supported feature of Toolforge. It's something you can do, but we don't give it the same support level as supported features (it might break at any point without notice, its syntax might change, etc.).
  • Dependency on ssh: soon-ish (:fingerscrossed:) we will stop using ssh for the Toolforge API itself and enable calling it directly, which means that all the ssh-tunneling code/machinery will no longer be needed for any supported flow. If we keep this kubectl proxy feature, we will have to keep all of that too (and thus not be able to simplify).

Can that be split from the cli?

Yup.
It's fairly easy to compose and uncompose these parts as needed.

As in, make it optional like a plugin of sorts?

And a plugin like setup could also be done fairly easily.
(not currently in place)

There's some reasons I ask for that:

  • Avoid fat clients: allow custom clients to do anything that's possible through the API. If we push to have all supported flows doable through the API, we can enable those in any client/script.
  • Avoid non-maintained flows: kubectl-ing directly is not a supported feature of Toolforge. It's something you can do, but we don't give it the same support level as supported features (it might break at any point without notice, its syntax might change, etc.).
  • Dependency on ssh: soon-ish (:fingerscrossed:) we will stop using ssh for the Toolforge API itself and enable calling it directly, which means that all the ssh-tunneling code/machinery will no longer be needed for any supported flow. If we keep this kubectl proxy feature, we will have to keep all of that too (and thus not be able to simplify).

+1 to all of that

I hit the k8s api RBAC today with https://github.com/cluebotng/external-grafana-alloy

That build service image job is now mounting NFS just to get the tool's client certs so it can list the pods in the same namespace the job is running in.

Semantically this need goes away with T366923, but until then I'm noting it here (the previous static config doesn't work in cluebot-review due to multiple replica jobs).

In case it was missed, the Toolforge API is now considered stable and you can use it directly, expecting changes to take effect no sooner than 60 days after they are announced (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/API#Timeline_of_a_deprecation_change; it is also available in Swagger UI format at https://api-docs.toolforge.org).

The only feature currently not available there that's available on the cli is webservices, which we are currently working on migrating to the APIs.

@Anomie that should help you get some of your scripts migrated if you want to use the API directly.

I finally got a chance to try looking at this. One thing I found missing is that the /jobs/v1/tool/anomiebot/jobs endpoint does not include the pod name, which I make use of in some of my reporting to determine when some data is stale.


I finally got a chance to try looking at this. One thing I found missing is that the /jobs/v1/tool/anomiebot/jobs endpoint does not include the pod name, which I make use of in some of my reporting to determine when some data is stale.

Can you elaborate on how you use that information?
Ideally you should not need to interact with k8s directly at all, and should instead interact with the Toolforge API, so we might be missing some features.

In one script I use the information to display a message "job <name> is already running on <pod>".

In another report, the task writes status data to redis that includes the hostname (i.e. the pod name) where it's running, and the reporting matches that against the result from the API to see if the pod is still running or has gone missing. Matching only on the job name might match stale data with a running job on a different pod.

Occasionally I've used the pod name reported from either of these when trying to manually investigate something going wrong.

Hmm, I'm thinking of adding something like runtime_info to the job, so we can put some extra debugging info there, like the pod, the host, and maybe the image/hash it runs on. It might not be reliable, though it might be enough for your case. I strongly encourage you to avoid depending on those values and to find alternatives, for example:

In one script I use the information to display a message "job <name> is already running on <pod>".

This just can skip saying which pod it runs on :)

In another report, the task writes status data to redis that includes the hostname (i.e. the pod name) where it's running, and the reporting matches that against the result from the API to see if the pod is still running or has gone missing. Matching only on the job name might match stale data with a running job on a different pod.

I don't know the details of this, but you could try using a generated 'run-id' of sorts, or a timestamp of liveness instead of the hostname (so if it was not updated in the last X amount of time, it means it's stalled), or a mixture of both, etc.
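The liveness-timestamp idea can be sketched as a tiny check; the threshold value and names here are illustrative, not part of any Toolforge API:

```python
# Sketch of the "timestamp of liveness" idea: the job periodically writes
# a heartbeat timestamp (e.g. to redis) instead of a pod name, and the
# reporter treats the record as stale once the heartbeat is too old.
import time

STALE_AFTER = 300  # seconds without a heartbeat before we call it stalled


def is_stale(last_heartbeat, now=None, threshold=STALE_AFTER):
    """True when the job has not refreshed its heartbeat within threshold."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) > threshold
```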

Occasionally I've used the pod name reported from either of these when trying to manually investigate something going wrong.

This is the only one that does not have many alternatives, you can still report the pod name/hostname/etc. on the logs of the job.

This just can skip saying which pod it runs on :)

Which then makes more work for me if I need to follow up on the message manually.

I don't know the details of this, but you could try using a generated 'run-id' of sorts, or a timestamp of liveness instead of the hostname (so if it was not updated in the last X amount of time, it means it's stalled), or a mixture of both, etc.

Or I could keep doing what works if you add the necessary field.

It might not be reliable though

If your code thinks it knows what jobs are running without being able to know where they're doing so, I suspect you have bigger problems.

If your code thinks it knows what jobs are running without being able to know where they're doing so, I suspect you have bigger problems.

The key here is that the runtime (single cluster, multiworker, v1.29 k8s currently) might (and will) eventually change, expand, etc. so the only interfaces that we support are the toolforge APIs and clis.

Anything else related to the internal details of the runtime (ex. is it running in a pod? two pods? in a container? more than one container? does it have ens0 as the main interface? is it running in k8s? is it a VM? is it a single cluster? two? in eqiad? in codfw? is it k8s v1.21? v1.33? is it using a deployment? just a pod? a k8s job? a k8s cronjob? a NodePort? etc.) will change without backwards compatibility (we will always try to be gentle, but there's no assurance of any kind).

So my advice is, if you don't want to have your code break the next time that happens, to use the toolforge APIs and make your code not depend on the underlying runtime. This does not mean that you can't do it, or that you shouldn't, just that I recommend not doing it unless you want to invest the time to modify it when the runtime changes. It's a tradeoff that only you can decide to take or not. I'm trying to help avoid getting to the same position we are in now.