Description

```
[21:28] < legoktm> maybe I'm being impatient, but it's taking longer than I expected for the "tour-nyc" tool I created to become... become-able
[21:38] < bd808> legoktm: it usually takes 5-10 minutes, but there might be something wrong with maintain-kubeusers?
[21:38] < legoktm> hm, we're at ~15min now
[21:38] < bd808> yeah, that's a bit too long
[21:39] < bd808> now I have to remember where this stuff runs...
[21:39] < legoktm> I'm supposed to demo this live in uhh...an hour, but I'm just cannibalizing another spare tool I have lying around for now
[21:39] < legoktm> so no urgency specifically from me other than it is broken :p
[21:45] < bd808> !log admin maintain-kubeusers container in CrashLoopBackoff, investigating
```

Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
k8s_api: rbac: use 'policy' group | labs/tools/maintain-kubeusers | master | +1 -1
Status | Subtype | Assigned | Task
---|---|---|---
Open | None | | T327025 Upgrade Toolforge Kubernetes to version 1.26
Open | None | | T316107 Upgrade Toolforge Kubernetes to version 1.25
Open | None | | T307651 Upgrade Toolforge Kubernetes to version 1.24
Resolved | | taavi | T298005 Upgrade Toolforge Kubernetes to version 1.23
Resolved | | taavi | T286856 Upgrade Toolforge Kubernetes to latest 1.22
Resolved | BUG REPORT | taavi | T331572 maintain-kubeusers container in CrashLoopBackoff preventing new tool creation after 'user-maintainer' ClusterRole changes
Resolved | | taavi | T331619 toolforge: rbac: change existing roles to reference PSP in the policy group
Event Timeline
```
$ kubectl sudo -n maintain-kubeusers logs maintain-kubeusers-b6c6d7c5c-kh75v
starting a run
Homedir already exists for /data/project/chtholly
Wrote config in /data/project/chtholly/.kube/config
PodSecurityPolicy tool-chtholly-psp already exists
Namespace tool-chtholly already exists
Role tool-chtholly-psp already exists
Could not create toolforge-tfb-psp role for chtholly
Traceback (most recent call last):
  File "/app/maintain_kubeusers.py", line 7, in <module>
    runpy.run_module("maintain_kubeusers", run_name="__main__")
  File "/usr/lib/python3.9/runpy.py", line 228, in run_module
    return _run_code(code, {}, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/app/maintain_kubeusers/__main__.py", line 7, in <module>
    main()
  File "/app/maintain_kubeusers/cli.py", line 162, in main
    new_tools = process_new_users(
  File "/app/maintain_kubeusers/utils.py", line 63, in process_new_users
    k8s_api.add_user_access(user_list[user_name])
  File "/app/maintain_kubeusers/k8s_api.py", line 800, in add_user_access
    self.process_buildpack_rbac(user.name)
  File "/app/maintain_kubeusers/k8s_api.py", line 659, in process_buildpack_rbac
    _ = self.rbac.create_namespaced_role(
  File "/app/venv/lib/python3.9/site-packages/kubernetes/client/api/rbac_authorization_v1_api.py", line 324, in create_namespaced_role
    return self.create_namespaced_role_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/app/venv/lib/python3.9/site-packages/kubernetes/client/api/rbac_authorization_v1_api.py", line 419, in create_namespaced_role_with_http_info
    return self.api_client.call_api(
  File "/app/venv/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/app/venv/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/app/venv/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/app/venv/lib/python3.9/site-packages/kubernetes/client/rest.py", line 275, in POST
    return self.request("POST", url,
  File "/app/venv/lib/python3.9/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'e76349e3-a0be-4217-9dea-10b63204e554', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'd77de175-725f-414a-9b4e-8b719e411c2c', 'Date': 'Wed, 08 Mar 2023 21:49:20 GMT', 'Content-Length': '627'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"roles.rbac.authorization.k8s.io \"tfb-chtholly-psp\" is forbidden: user \"system:serviceaccount:maintain-kubeusers:user-maintainer\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:maintain-kubeusers\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"extensions\"], Resources:[\"podsecuritypolicies\"], ResourceNames:[\"toolforge-tfb-psp\"], Verbs:[\"use\"]}","reason":"Forbidden","details":{"name":"tfb-chtholly-psp","group":"rbac.authorization.k8s.io","kind":"roles"},"code":403}
```
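For context on the 403: Kubernetes RBAC privilege-escalation prevention only lets a subject create a Role granting permissions the subject itself already holds (or has the `escalate` verb for). A minimal sketch of checking for the missing grant with the same Python client the tool uses (illustrative, not part of maintain-kubeusers; assumes in-cluster credentials):

```python
from kubernetes import client, config

# Ask the API server whether the current service account holds the exact
# permission the failing Role tries to grant. Escalation prevention rejects
# the create_namespaced_role() call above precisely because this check
# would come back "not allowed" for user-maintainer.
config.load_incluster_config()  # running inside the cluster, as the pod does
authz = client.AuthorizationV1Api()

review = authz.create_self_subject_access_review(
    client.V1SelfSubjectAccessReview(
        spec=client.V1SelfSubjectAccessReviewSpec(
            resource_attributes=client.V1ResourceAttributes(
                group="extensions",
                resource="podsecuritypolicies",
                name="toolforge-tfb-psp",
                verb="use",
            )
        )
    )
)
print("allowed:", review.status.allowed)
```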
This seems to be related to T329869: Remove tool RBAC rules for APIs removed in Kubernetes 1.22 and T286856: Upgrade Toolforge Kubernetes to latest 1.22.
I think that rLTMK3e0402542836: rbac: Drop rules for deprecated APIs removed the podsecuritypolicies resource permissions that maintain-kubeusers is still trying to grant at https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/maintain-kubeusers/+/refs/heads/master/maintain_kubeusers/k8s_api.py#533.
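From the 403 body, the Role that process_buildpack_rbac is trying to create looks roughly like this (a reconstruction from the error message, not the actual source behind the link above; the helper name is illustrative):

```python
from kubernetes import client

# Reconstruction of the failing Role, based on the 403 body above.
def buildpack_psp_role(tool: str) -> client.V1Role:
    return client.V1Role(
        metadata=client.V1ObjectMeta(
            name=f"tfb-{tool}-psp", namespace=f"tool-{tool}"
        ),
        rules=[
            client.V1PolicyRule(
                # The 'extensions' group is the grant the user-maintainer
                # service account no longer holds after the cleanup.
                api_groups=["extensions"],
                resources=["podsecuritypolicies"],
                resource_names=["toolforge-tfb-psp"],
                verbs=["use"],
            )
        ],
    )

# rbac.create_namespaced_role(f"tool-{tool}", buildpack_psp_role(tool))
# is then the call that raises the 403.
```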
Quick hack to get things working again:
```
$ kubectl sudo -n maintain-kubeusers edit ClusterRole user-maintainer
# Add this back in:
- apiGroups:
  - extensions
  resources:
  - podsecuritypolicies
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - use
  - delete
```
```
$ kubectl sudo -n maintain-kubeusers logs -f maintain-kubeusers-b6c6d7c5c-kh75v
starting a run
Homedir already exists for /data/project/chtholly
Wrote config in /data/project/chtholly/.kube/config
PodSecurityPolicy tool-chtholly-psp already exists
Namespace tool-chtholly already exists
Role tool-chtholly-psp already exists
Provisioned creds for user chtholly
Homedir already exists for /data/project/del-simple-bad-redirects
Wrote config in /data/project/del-simple-bad-redirects/.kube/config
PodSecurityPolicy tool-del-simple-bad-redirects-psp already exists
Namespace tool-del-simple-bad-redirects already exists
Role tool-del-simple-bad-redirects-psp already exists
Provisioned creds for user del-simple-bad-redirects
Homedir already exists for /data/project/tour-nyc
Wrote config in /data/project/tour-nyc/.kube/config
PodSecurityPolicy tool-tour-nyc-psp already exists
Namespace tool-tour-nyc already exists
Role tool-tour-nyc-psp already exists
Provisioned creds for user tour-nyc
Homedir already exists for /data/project/ashleybot
Wrote config in /data/project/ashleybot/.kube/config
Renewed creds for tool ashleybot
Homedir already exists for /data/project/terabot
Wrote config in /data/project/terabot/.kube/config
Renewed creds for tool terabot
finished run, wrote 3 new accounts, disabled 2 accounts, cleaned up 0 accounts
```
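For completeness, a non-interactive sketch of the same live hack using the Python client (the actual change was made interactively with kubectl edit as shown above; assumes admin credentials comparable to kubectl sudo):

```python
from kubernetes import client, config

# Append the dropped PSP rule back onto the user-maintainer ClusterRole.
config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

role = rbac.read_cluster_role("user-maintainer")
role.rules.append(
    client.V1PolicyRule(
        api_groups=["extensions"],
        resources=["podsecuritypolicies"],
        verbs=["create", "get", "list", "patch", "update", "use", "delete"],
    )
)
rbac.replace_cluster_role("user-maintainer", role)
```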
Mentioned in SAL (#wikimedia-cloud) [2023-03-08T22:31:29Z] <bd808> Live hacked user-maintainer clusterrole to work around breakage in T331572
The service is working again because I live hacked the missing podsecuritypolicies grants back into the user-maintainer clusterrole, but this will break again as soon as someone runs the helm deploy for maintain-kubeusers.
Hopefully @taavi or @aborrero can figure out what code changes are needed in the maintain-kubeusers Python code to match the podsecuritypolicies removal for the pending Kubernetes upgrade.
Sorry, I did not notice that. Looks like it's the creation of the RBAC rule that fails, not the creation of the PSP itself.
Change 896019 had a related patch set uploaded (by Majavah; author: Majavah):
[labs/tools/maintain-kubeusers@master] k8s_api: rbac: use 'policy' group
Change 896019 merged by jenkins-bot:
[labs/tools/maintain-kubeusers@master] k8s_api: rbac: use 'policy' group
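The diff itself lives in Gerrit change 896019 and is not inlined here, but given the +1 -1 size and the subject line, it presumably amounts to switching the API group in the generated rule, along these lines:

```python
from kubernetes import client

# Presumed shape of the fix (see Gerrit change 896019 for the real diff):
# reference the PSP through the 'policy' group, which T331619 changed the
# existing roles to use, instead of the removed 'extensions' group.
rule = client.V1PolicyRule(
    api_groups=["policy"],  # was: ["extensions"]
    resources=["podsecuritypolicies"],
    resource_names=["toolforge-tfb-psp"],
    verbs=["use"],
)
```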
I can't explain how it is possible that the code was working before. PSPs were in policy/v1beta1 in 1.21: https://v1-21.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#podsecuritypolicy-v1beta1-policy