Tools with "_" in their name or names longer than 63 characters do not get Kubernetes namespaces created
Closed, ResolvedPublic

Description

Originally reported on irc.

$ webservice --backend=kubernetes python shell
Traceback (most recent call last):
  File "/usr/local/bin/webservice", line 166, in <module>
    job.shell()
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/backends/kubernetesbackend.py", line 418, in shell
    pykube.Pod(self.api, podSpec).create()
  File "/usr/lib/python2.7/dist-packages/pykube/objects.py", line 76, in create
    self.api.raise_for_status(r)
  File "/usr/lib/python2.7/dist-packages/pykube/http.py", line 104, in raise_for_status
    raise HTTPError(payload["message"])
pykube.exceptions.HTTPError: namespaces "wdq_checker" not found
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 145, in apport_excepthook
    os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o640), 'wb') as f:
OSError: [Errno 2] No such file or directory: '/var/crash/_usr_bin_webservice.53499.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/local/bin/webservice", line 166, in <module>
    job.shell()
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/backends/kubernetesbackend.py", line 418, in shell
    pykube.Pod(self.api, podSpec).create()
  File "/usr/lib/python2.7/dist-packages/pykube/objects.py", line 76, in create
    self.api.raise_for_status(r)
  File "/usr/lib/python2.7/dist-packages/pykube/http.py", line 104, in raise_for_status
    raise HTTPError(payload["message"])
pykube.exceptions.HTTPError: namespaces "wdq_checker" not found

A /data/project/wdq_checker/.kube/config file exists, has correct permissions, and has a token for the namespace.

Event Timeline

There is also a matching token in /etc/kubernetes/tokenauth on tools-k8s-master-01 and entries for the user in /etc/kubernetes/abac.

$ journalctl -u maintain-kubeusers --no-pager | grep wdq_checker
Sep 15 17:42:55 tools-k8s-master-01 maintain-kubeusers[13111]: Wrote config in /data/project/wdq_checker/.kube/config
Sep 15 17:42:56 tools-k8s-master-01 maintain-kubeusers[13111]: (b'', b'The Namespace "wdq_checker" is invalid: metadata.name: Invalid value: "wdq_checker": must match the regex [a-z0-9]([-a-z0-9]*[a-z0-9])? (e.g. \'my-name\' or \'123-abc\')\n')
Sep 15 17:42:56 tools-k8s-master-01 maintain-kubeusers[13111]: Provisioned creds for tool wdq_checker

Kubernetes is rejecting namespaces that include the _ character.

It is possible to rename the tool, or to delete it and create a new one. But that is not a correct fix: the underlying problem would not be repaired.

According to the Kubernetes source code, a Namespace must comply with the definition of a DNS Label:

// DNS_LABEL:  This is a string, no more than 63 characters long, that conforms
//     to the definition of a "label" in RFCs 1035 and 1123. This is captured
//     by the following regex:
//         [a-z0-9]([-a-z0-9]*[a-z0-9])?

From RFC 1035:

The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

Our current rules for a tool name are not the same, which is both surprising and unfortunate. We currently use rules based on an interpretation of the recommendations made by useradd(8) for tool names, as coded in this regular expression: ^[a-z][a-z0-9_-]{0,31}$
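To make the mismatch concrete, here is a minimal sketch that checks a name against both rule sets. The function names are made up for illustration; the two patterns are the ones quoted in this task, and the 63-character limit is applied as a separate length check.

```python
import re

# Tool name rule quoted in this task (based on useradd(8) recommendations).
TOOL_NAME_RE = re.compile(r"[a-z][a-z0-9_-]{0,31}")
# DNS_LABEL regex quoted from the Kubernetes source comment.
DNS_LABEL_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
DNS_LABEL_MAX_LEN = 63


def is_valid_tool_name(name):
    """Hypothetical helper: does the name satisfy the current tool name rule?"""
    return TOOL_NAME_RE.fullmatch(name) is not None


def is_valid_namespace(name):
    """Hypothetical helper: would the name be accepted as a DNS label?"""
    return (
        len(name) <= DNS_LABEL_MAX_LEN
        and DNS_LABEL_RE.fullmatch(name) is not None
    )


if __name__ == "__main__":
    for name in ("wdq_checker", "wdq-checker", "whichsub"):
        print(name, is_valid_tool_name(name), is_valid_namespace(name))
```

With these patterns, wdq_checker is a valid tool name but not a valid namespace name. Note that a name ending in - or _ would be accepted by the tool name rule and rejected by the DNS label rule as well.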

There are two general ways I can see to "fix" this problem going forward:

  • Change our tool -> namespace mapping from the current identity mapping to some repeatable transformation (hash) that is encoded in a way that matches the RFC 1035 restrictions. This has a disadvantage of making it more difficult to reverse from a Kubernetes namespace to the owning tool.
  • Change the tool name restrictions to match the intersection of the useradd(8) and RFC 1035 restrictions. This would basically mean dropping _ as a valid character. This would not fix the problem currently being encountered by the wdq_checker tool, but it does seem to be the only tool that exists which has used the _ character.
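A rough sketch of what each option could look like follows. Nothing here is taken from existing Toolforge code: the "tool-" prefix, the choice of SHA-1, and both function names are assumptions made purely for illustration.

```python
import hashlib
import re


# Option 1 (hypothetical): derive the namespace from a hash of the tool name.
# A SHA-1 hex digest is 40 characters of [0-9a-f], so with the assumed
# "tool-" prefix the result is 45 characters long and fits the DNS label rule.
def namespace_for_tool(tool_name):
    digest = hashlib.sha1(tool_name.encode("utf-8")).hexdigest()
    return "tool-" + digest


# Option 2 (hypothetical): accept only names that satisfy both rule sets.
# Start with a letter, at most 32 characters, no "_", and end with a letter
# or digit.
STRICT_TOOL_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,30}[a-z0-9])?")


def is_k8s_safe_tool_name(name):
    return STRICT_TOOL_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    print(namespace_for_tool("wdq_checker"))
    print(is_k8s_safe_tool_name("wdq_checker"))
    print(is_k8s_safe_tool_name("wdq-checker"))
```

The hashed form shows the disadvantage mentioned for the first option: nothing in the resulting namespace name tells an administrator which tool owns it without a separate lookup.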

@Frettie in the near term, I think you should create a new tool account, possibly wdq-checker to run your code.

I'm having this problem with two of my tools. One has - rather than _ in its name (permission-denied-test), and the other has a purely alphabetic name (whichsub):

$ webservice --backend=kubernetes python shell
Traceback (most recent call last):
  File "/usr/local/bin/webservice", line 198, in <module>
    job.shell()
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/backends/kubernetesbackend.py", line 457, in shell
    pykube.Pod(self.api, podSpec).create()
  File "/usr/lib/python2.7/dist-packages/pykube/objects.py", line 76, in create
    self.api.raise_for_status(r)
  File "/usr/lib/python2.7/dist-packages/pykube/http.py", line 104, in raise_for_status
    raise HTTPError(payload["message"])
pykube.exceptions.HTTPError: namespaces "permission-denied-test" not found
$ webservice --backend=kubernetes python shell
Traceback (most recent call last):
  File "/usr/local/bin/webservice", line 198, in <module>
    job.shell()
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/backends/kubernetesbackend.py", line 457, in shell
    pykube.Pod(self.api, podSpec).create()
  File "/usr/lib/python2.7/dist-packages/pykube/objects.py", line 76, in create
    self.api.raise_for_status(r)
  File "/usr/lib/python2.7/dist-packages/pykube/http.py", line 104, in raise_for_status
    raise HTTPError(payload["message"])
pykube.exceptions.HTTPError: namespaces "whichsub" not found

Mentioned in SAL (#wikimedia-cloud) [2019-02-27T16:20:10Z] <zhuyifei1999_> regenerating k8s creds for tools.whichsub & tools.permission-denied-test T176027

04:21:44 0 ✓ zhuyifei1999@tools-k8s-master-01: ~$ sudo rm ~tools.whichsub/.kube/config ~tools.permission-denied-test/.kube/config
rm: cannot remove ‘/data/project/permission-denied-test/.kube/config’: Operation not permitted

o.O

Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: Homedir already exists for /data/project/whichsub
Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: Wrote config in /data/project/whichsub/.kube/config
Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: (b'namespace "whichsub" created\n', b'')
Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: Provisioned creds for tool whichsub
(venv) tools.whichsub@tools-sgebastion-07:~$ webservice --backend=kubernetes python shell
Defaulting container name to interactive.
Use 'kubectl describe pod/interactive -n whichsub' to see all of the containers in this pod.
If you don't see a command prompt, try pressing enter.
(venv) tools.whichsub@interactive:~$

I have no idea what is going on with tools.permission-denied-test

I logged into the actual NFS master to try and figure this out. The directory listing for permission-denied-test/.kube/config did not show as having chattr +i set, but the behavior very much matched that. After doing a chattr -i to remove the (unseen) protection bit I was able to delete the file and the .kube directory.

After doing that, I stopped maintain-kubeusers on tools-k8s-master-01, removed the entry for "permission-denied-test" from /etc/kubernetes/tokenauth, and started maintain-kubeusers again:

Feb 27 22:00:09 tools-k8s-master-01 maintain-kubeusers[26838]: starting a run
Feb 27 22:00:09 tools-k8s-master-01 maintain-kubeusers[26838]: Homedir already exists for /data/project/permission-denied-test
Feb 27 22:00:09 tools-k8s-master-01 maintain-kubeusers[26838]: Wrote config in /data/project/permission-denied-test/.kube/config
Feb 27 22:00:09 tools-k8s-master-01 maintain-kubeusers[26838]: (b'namespace "permission-denied-test" created\n', b'')
Feb 27 22:00:09 tools-k8s-master-01 maintain-kubeusers[26838]: Provisioned creds for tool permission-denied-test
Feb 27 22:00:20 tools-k8s-master-01 maintain-kubeusers[26838]: finished run, wrote 1 new accounts
$ sudo become permission-denied-test
$ webservice --backend=kubernetes php7.2 shell
Defaulting container name to interactive.
Use 'kubectl describe pod/interactive -n permission-denied-test' to see all of the containers in this pod.
If you don't see a command prompt, try pressing enter.
$

@Dalba, both of your reported tools should be fixed now.

bd808 renamed this task from namespaces "wdq_checker" not found error when trying to start webservice shell to Tools with "_" in their name or names longer than 63 characters do not get Kubernetes namespaces created. Feb 27 2019, 10:07 PM
bd808 claimed this task.

Marking as "resolved" per the fix made in T176681 (Striker should not allow tool names to include '_' for Kubernetes compatibility), which should prevent new tools from being created with names that violate the naming constraint for Kubernetes namespaces.

The fixes made for the subsequent report in T176027#4988213 are unrelated to the core issue of this particular bug.

Change 577391 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] users: filter out any invalid tool names

https://gerrit.wikimedia.org/r/577391

Change 577391 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] users: filter out any invalid tool names

https://gerrit.wikimedia.org/r/577391
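
The patch itself is not quoted in this task. As a rough idea of what filtering out invalid tool names could look like, here is a minimal sketch that assumes the tool names arrive as a plain list of strings. The function name and the warning text are hypothetical and not taken from maintain-kubeusers; only the regex and the 63-character limit come from the discussion in this task.

```python
import logging
import re

# DNS label rule that Kubernetes applies to namespace names, as quoted in
# this task.
DNS_LABEL_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
DNS_LABEL_MAX_LEN = 63


def filter_valid_tool_names(tool_names):
    """Return only the names usable as namespace names (hypothetical helper)."""
    valid = []
    for name in tool_names:
        if len(name) <= DNS_LABEL_MAX_LEN and DNS_LABEL_RE.fullmatch(name):
            valid.append(name)
        else:
            # Skip the tool instead of writing credentials for a namespace
            # that cannot be created.
            logging.warning("Skipping tool %r: invalid namespace name", name)
    return valid


if __name__ == "__main__":
    print(filter_valid_tool_names(
        ["wdq_checker", "whichsub", "permission-denied-test"]
    ))
```

Skipping such tools up front would avoid the state seen in the journal output, where a config file was written even though the namespace creation had been rejected.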