
Can't access /mnt from kubernetes nodejs shell
Closed, Resolved (Public)

Description

Steps to reproduce:

  • become a tool
  • run webservice --backend=kubernetes nodejs shell as recommended in the doc
  • ls /mnt/nfs => ls: cannot access /mnt/nfs: No such file or directory

Event Timeline

List of mounts: https://github.com/wikimedia/operations-software-tools-webservice/blob/7d7df1bbdd2c7637fc70e30f398859ca7131d3c0/toollabs/webservice/backends/kubernetesbackend.py#L345

The dump is mounted on the host:

root@tools-worker-1001:~# mount | grep dump
labstore1006.wikimedia.org:/dumps on /mnt/nfs/dumps-labstore1006.wikimedia.org type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=10.68.23.55,local_lock=none,addr=208.80.154.7)
labstore1007.wikimedia.org:/dumps on /mnt/nfs/dumps-labstore1007.wikimedia.org type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=10.68.23.55,local_lock=none,addr=208.80.155.106)

/public/dumps/ exists on instance-local storage and consists purely of symlinks:

tools.zhuyifei1999-test@interactive:~$ mount | grep dump
/dev/vda3 on /public/dumps type ext4 (rw,relatime,data=ordered)
root@tools-worker-1001:~# ls /public/dumps/ -l
total 16
lrwxrwxrwx 1 root root 59 Apr  2 16:51 incr -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/incr
lrwxrwxrwx 1 root root 88 Apr  2 16:51 pagecounts-all-sites -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/other/pagecounts-all-sites
lrwxrwxrwx 1 root root 82 Apr  2 16:51 pagecounts-raw -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/other/pagecounts-raw
lrwxrwxrwx 1 root root 77 Apr  2 16:51 pageviews -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/other/pageviews
lrwxrwxrwx 1 root root 61 Apr  2 16:51 public -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public

One (possibly hacky) way to resolve this would be to have the webservice script mount the targets of the symlinks as well.
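As a rough illustration of that idea, the symlink targets could be discovered at runtime rather than hard-coded. The sketch below is not taken from the webservice code; the function names symlink_targets and common_mount_root are hypothetical, and it assumes the directory contains symlinks with absolute or directory-relative targets, as in the listing above.

```python
import os


def symlink_targets(directory):
    """Return the sorted, unique targets of symlinks directly inside directory."""
    targets = set()
    for entry in os.scandir(directory):
        if entry.is_symlink():
            target = os.readlink(entry.path)
            # Resolve relative targets against the directory holding the link.
            if not os.path.isabs(target):
                target = os.path.normpath(os.path.join(directory, target))
            targets.add(target)
    return sorted(targets)


def common_mount_root(targets):
    """Return the deepest directory shared by all targets, or None if empty."""
    return os.path.commonpath(targets) if targets else None
```

Calling symlink_targets("/public/dumps") on a worker host would return the five /mnt/nfs/... paths listed above, and common_mount_root would reduce them to a single candidate path to mount into the pod.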

bd808 subscribed.

I think this may be an unintended side effect of the changes that have been made to support the new dumps storage servers. On the underlying Toolforge instances (e.g. tools-worker-1005.tools.eqiad.wmflabs), the /public/dumps directory is now a collection of symlinks to the active NFS mounts under /mnt/nfs. The Kubernetes configuration that we generate for each pod contains volume mounts from the local host that you can see with kubectl describe po/<name of pod>. The relevant portion will look something like:

Volume Mounts:
  /data/project/ from home (rw)
  /data/scratch/ from scratch (rw)
  /etc/ldap.conf from etcldap-conf-7c618 (rw)
  /etc/ldap.yaml from etcldap-yaml-4i1w3 (rw)
  /etc/novaobserver.yaml from etcnovaobserver-yaml-58q9h (rw)
  /public/dumps/ from dumps (rw)
  /var/run/nslcd/socket from varrunnslcdsocket-5241p (rw)

In the past, /public/dumps/ was a direct NFS mount so this worked. Now I guess we need to also mount the host's /mnt/nfs or subdirectories.
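To make the missing piece concrete, here is a minimal sketch of what an extra host mount looks like in a pod spec. It is an assumption-laden illustration, not the code from the eventual patch: the helper name host_path_mount and the volume name "nfs" are hypothetical, while hostPath, volumeMounts, mountPath and readOnly are standard Kubernetes pod spec fields.

```python
def host_path_mount(name, path, read_only=True):
    """Build a (volume, volumeMount) pair exposing a host directory in a pod.

    The directory is mounted at the same path inside the container, so that
    symlinks pointing at it resolve exactly as they do on the host.
    """
    volume = {"name": name, "hostPath": {"path": path}}
    volume_mount = {"name": name, "mountPath": path, "readOnly": read_only}
    return volume, volume_mount


# Hypothetical usage: expose the host's /mnt/nfs read-only.
volume, volume_mount = host_path_mount("nfs", "/mnt/nfs")
pod_spec = {
    "volumes": [volume],
    "containers": [{"name": "interactive", "volumeMounts": [volume_mount]}],
}
```

Mounting at the identical path is the important design choice here: the symlinks under /public/dumps carry absolute targets, so they only resolve if /mnt/nfs appears at the same location inside the container.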

Vvjjkkii renamed this task from Can't access /mnt from kubernetes nodejs shell to 9rdaaaaaaa. Jul 1 2018, 1:12 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot raised the priority of this task from High to Needs Triage. Jul 3 2018, 1:59 AM

Change 491397 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/software/tools-webservice@master] Mount /mnt/nfs into Kuberntes pods

https://gerrit.wikimedia.org/r/491397

Change 491397 merged by jenkins-bot:
[operations/software/tools-webservice@master] Mount /mnt/nfs into Kuberntes pods

https://gerrit.wikimedia.org/r/491397

Mentioned in SAL (#wikimedia-cloud) [2019-02-20T23:30:52Z] <zhuyifei1999_> begin rebuilding all docker images T178601 T193646 T215683

Change 491877 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Allow pods to mount /mnt/nfs

https://gerrit.wikimedia.org/r/491877

Change 491877 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Allow pods to mount /mnt/nfs

https://gerrit.wikimedia.org/r/491877

Fixed! Running pods will need to be restarted to see the mount.

$ webservice --backend=kubernetes shell
Defaulting container name to interactive.
Use 'kubectl describe pod/interactive -n bd808-test2' to see all of the containers in this pod.
If you don't see a command prompt, try pressing enter.
$ ls /mnt/nfs
dumps-labstore1006.wikimedia.org  labstore-secondary-tools-home
dumps-labstore1007.wikimedia.org  labstore-secondary-tools-project
labstore1003-scratch
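
For anyone wanting to confirm from inside a restarted pod that the /public/dumps symlinks now resolve, a small check along these lines may help. This is only a sketch; the function name dangling_symlinks is hypothetical and nothing like it is part of the webservice tooling.

```python
import os


def dangling_symlinks(directory):
    """Return (name, target) pairs for symlinks in directory that do not resolve."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.path.exists follows symlinks, so it is False for a broken link.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append((name, os.readlink(path)))
    return broken
```

Running dangling_symlinks("/public/dumps") in a pod that has /mnt/nfs mounted should return an empty list; in a pod started before the fix, it would be expected to list every symlink along with its unreachable target.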