
"teg" tool needs a higher Services quota to migrate to 2020 Kubernetes cluster
Closed, Resolved · Public

Description

https://tools.wmflabs.org/admin/tool/teg appears to use a frontend managed by webservice and a backend run from a custom Deployment hosting a Java service.

The webservice migrate process moved the frontend to the 2020 Kubernetes cluster with no issues. Manually attempting to move the custom deployment as found in the tool's $HOME/backend.yml file showed a partial failure due to the quota for Service objects:

$ /usr/bin/kubectl create --validate=true -f backend.yml
deployment.extensions/teg-backend created
Error from server (Forbidden): error when creating "backend.yml": services "teg-backend" is forbidden: exceeded quota: tool-teg, requested: services=1, used: services=1, limited: services=1

The tool seems to have a number of other issues as well, including a non-functional $HOME/.lighttpd.conf:

$ kubectl logs -f teg-6b4f8669c8-b5ndj
Undefined env variable: TEG_BACKEND_SERVICE_HOST
2020-02-29 22:24:28: (configfile.c.1154) source: /var/run/lighttpd/teg line: 615 pos: 19 parser failed somehow near here: (COMMA)

Event Timeline

bd808 created this task. Feb 29 2020, 10:30 PM
Restricted Application added a subscriber: Aklapper. Feb 29 2020, 10:30 PM

Mentioned in SAL (#wikimedia-cloud) [2020-02-29T22:32:23Z] <wm-bot> <root> Stopped php7.2 webservice stuck in CrashLoopBackOff due to a syntactically invalid /data/project/teg/.lighttpd.conf file (T246553)

Mentioned in SAL (#wikimedia-cloud) [2020-02-29T22:34:01Z] <wm-bot> <root> Deleted partially applied /data/project/teg/backend.yml Kubernetes deployment on 2020 Kubernetes cluster (T246553)

Mentioned in SAL (#wikimedia-cloud) [2020-02-29T22:35:26Z] <wm-bot> <root> Deleted /data/project/teg/backend.yml Kubernetes deployment on legacy cluster (T246553)

For what it's worth, $HOME/.lighttpd.conf works as expected when the environment variables TEG_BACKEND_SERVICE_HOST and TEG_BACKEND_SERVICE_PORT are set. Until now, these were set automatically from the backend deployment, so this is not a separate issue but a consequence of the backend failure.
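As background on why those variables went missing: Kubernetes injects <NAME>_SERVICE_HOST and <NAME>_SERVICE_PORT into a pod only for Services that already exist when the pod starts, deriving <NAME> from the Service name by upper-casing it and turning dashes into underscores. A minimal shell sketch of that naming rule follows; the helper name service_env_prefix is made up for illustration and is not part of the tool.

```shell
# Hypothetical helper: derive the environment variable prefix that
# Kubernetes uses for a Service name (upper-case, dashes -> underscores).
service_env_prefix() {
  printf '%s\n' "$1" | tr 'a-z-' 'A-Z_'
}

# Example: the variables the lighttpd config would need for "teg-backend".
#   prefix=$(service_env_prefix teg-backend)
#   echo "${prefix}_SERVICE_HOST ${prefix}_SERVICE_PORT"
```

Because the variables are only set at pod start, a frontend pod started before the teg-backend Service exists would need to be restarted to pick them up.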

bd808 added a subscriber: Bstorm. Mar 1 2020, 8:32 PM

@Bstorm I remember talking with you about these resource limits and how we would provide per-tool quota changes, but I don't remember how to actually do it. :)

Bstorm added a comment. Mar 1 2020, 8:40 PM

I didn't document it, unfortunately! You can edit the quota for the namespace using cluster admin; see https://kubernetes.io/docs/concepts/policy/resource-quotas/#object-count-quota

It'd probably work fine to just do a kubectl edit resourcequota -n tool-$tool $quotaname. You may not even need to look up the quota name, since there's only going to be one in each namespace.

Bstorm added a comment. Mar 1 2020, 8:43 PM

The quota is named after the namespace (in this case tool-teg).
kubectl get resourcequota -n tool-teg -o yaml will show it to you. I'll change it now.

Bstorm added a comment. Mar 1 2020, 8:44 PM

root@tools-k8s-control-1:~# kubectl edit resourcequota -n tool-teg tool-teg
resourcequota/tool-teg edited

That worked.
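The same kind of change could presumably also be made non-interactively with kubectl patch rather than kubectl edit. The sketch below only builds the JSON merge patch; the helper name quota_patch is hypothetical, and the kubectl invocation in the comment assumes cluster-admin credentials and the quota name tool-teg mentioned above.

```shell
# Hypothetical helper: build a JSON merge patch that sets one hard limit
# on a ResourceQuota object.
#   $1 = resource name (for example "services")
#   $2 = new hard limit
quota_patch() {
  printf '{"spec":{"hard":{"%s":"%s"}}}\n' "$1" "$2"
}

# Assumed usage (not run here, needs cluster admin):
#   kubectl patch resourcequota tool-teg -n tool-teg \
#     --type merge -p "$(quota_patch services 2)"
```

A patch like this leaves a one-line record of exactly what was changed, which may be easier to paste into a log entry than an interactive edit.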

Mentioned in SAL (#wikimedia-cloud) [2020-03-01T20:45:52Z] <bstorm_> increased services quota to 2 for k8s T246553

Mentioned in SAL (#wikimedia-cloud) [2020-03-01T20:48:38Z] <bstorm_> running kubectl apply -f backend.yml T246553

Mentioned in SAL (#wikimedia-cloud) [2020-03-01T20:49:37Z] <bstorm_> starting php7.2 webservice T246553

Bstorm added a comment. Mar 1 2020, 8:51 PM

Ok, I did those in the wrong order, clearly.

Bstorm added a comment. Mar 1 2020, 8:53 PM

51s         Warning   FailedCreate        replicaset/teg-backend-649f697c88   Error creating: pods "teg-backend-649f697c88-2kpkk" is forbidden: maximum cpu usage per Container is 1, but limit is 2

Ah, no, the problem is that the backend is greedier than I thought.
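The event above reads like a per-container maximum check (in Kubernetes this normally comes from a LimitRange rather than the namespace ResourceQuota): the container asks for a CPU limit of 2 while the maximum is 1. A rough shell sketch of that comparison follows, assuming CPU quantities are written as whole cores, decimal cores, or millicores with an "m" suffix; both function names are made up for illustration.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity ("2", "2.5", "500m")
# to an integer number of millicores.
cpu_to_millicores() {
  case "$1" in
    *m)  printf '%s\n' "${1%m}" ;;                                    # already millicores
    *.*) awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;  # decimal cores
    *)   printf '%s\n' "$(( $1 * 1000 ))" ;;                          # whole cores
  esac
}

# Hypothetical helper: succeed if a container CPU limit ($1) fits under
# the per-container maximum ($2).
cpu_limit_allowed() {
  [ "$(cpu_to_millicores "$1")" -le "$(cpu_to_millicores "$2")" ]
}

# Example mirroring the event: a limit of 2 against a maximum of 1 is rejected.
#   cpu_limit_allowed 2 1 || echo "limit exceeds per-container maximum"
```

Under this reading, lowering the container's CPU limit to 1 is enough to satisfy the check, independent of the namespace-wide quota.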

Mmarx added a comment. Mar 1 2020, 8:56 PM

I have reduced the number of CPUs requested to 1, and everything is working again now.

Mentioned in SAL (#wikimedia-cloud) [2020-03-01T20:58:35Z] <bstorm_> set namespace resourcequota for cpu to 2.5 T246553

Bstorm added a comment. Mar 1 2020, 8:59 PM

Ah, I also increased your quota a bit. Either way, that works!

tools.teg@tools-sgebastion-08:~$ kubectl get all
NAME                               READY   STATUS    RESTARTS   AGE
pod/teg-6b4f8669c8-rq8mr           1/1     Running   0          3m38s
pod/teg-backend-854584765b-kkjpv   1/1     Running   0          3m57s


NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/teg           ClusterIP   10.102.199.138   <none>        8000/TCP   7m
service/teg-backend   ClusterIP   10.109.158.132   <none>        4223/TCP   6m38s


NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/teg           1/1     1            1           7m
deployment.apps/teg-backend   1/1     1            1           7m15s

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/teg-6b4f8669c8           1         1         1       7m
replicaset.apps/teg-backend-854584765b   1         1         1       3m57s
Mmarx closed this task as Resolved. Mar 1 2020, 8:59 PM
Mmarx claimed this task.

Thanks!

bd808 reassigned this task from Mmarx to Bstorm. Mar 2 2020, 6:01 PM