Steps to reproduce:
- become a tool
- run webservice --backend=kubernetes nodejs shell as recommended in the doc
- ls /mnt/nfs => ls: cannot access /mnt/nfs: No such file or directory
The dump is mounted on the host:
root@tools-worker-1001:~# mount | grep dump
labstore1006.wikimedia.org:/dumps on /mnt/nfs/dumps-labstore1006.wikimedia.org type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=10.68.23.55,local_lock=none,addr=208.80.154.7)
labstore1007.wikimedia.org:/dumps on /mnt/nfs/dumps-labstore1007.wikimedia.org type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=10.68.23.55,local_lock=none,addr=208.80.155.106)
/public/dumps/ exists on instance-local storage and consists purely of symlinks:
tools.zhuyifei1999-test@interactive:~$ mount | grep dump
/dev/vda3 on /public/dumps type ext4 (rw,relatime,data=ordered)

root@tools-worker-1001:~# ls /public/dumps/ -l
total 16
lrwxrwxrwx 1 root root 59 Apr 2 16:51 incr -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/incr
lrwxrwxrwx 1 root root 88 Apr 2 16:51 pagecounts-all-sites -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/other/pagecounts-all-sites
lrwxrwxrwx 1 root root 82 Apr 2 16:51 pagecounts-raw -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/other/pagecounts-raw
lrwxrwxrwx 1 root root 77 Apr 2 16:51 pageviews -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public/other/pageviews
lrwxrwxrwx 1 root root 61 Apr 2 16:51 public -> /mnt/nfs/dumps-labstore1006.wikimedia.org/xmldatadumps/public
One (possibly hacky) way to resolve this would be to have the webservice script mount the targets of the symlinks as well...
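To make the symlink idea concrete, here is a rough Python sketch of how one could detect the broken links from inside a container and work out which host directories would need to be mounted. The helper names dangling_symlinks and mount_roots are hypothetical, as is the assumption that the first three path components (for example /mnt/nfs/dumps-labstore1006.wikimedia.org) are the right granularity. This is not what the webservice script does.

```python
import os


def dangling_symlinks(directory):
    """Map each symlink in `directory` whose target is missing to its raw target."""
    missing = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.path.exists() follows symlinks, so it returns False for a broken link.
        if os.path.islink(path) and not os.path.exists(path):
            missing[name] = os.readlink(path)
    return missing


def mount_roots(targets, depth=3):
    """Collapse absolute target paths to their first `depth` components.

    The depth of 3 is an assumption chosen to match paths such as
    /mnt/nfs/<share>/... seen in the listing above.
    """
    roots = set()
    for target in targets:
        parts = target.split("/")  # the leading "" stands for the root "/"
        roots.add("/".join(parts[: depth + 1]))
    return sorted(roots)
```

Inside a pod, mount_roots(dangling_symlinks("/public/dumps").values()) would then give the candidate host paths to mount.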
I think this may be an unintended side effect of the changes that have been made to support the new dumps storage servers. On the underlying Toolforge instances (e.g. tools-worker-1005.tools.eqiad.wmflabs), the /public/dumps directory is now a collection of symlinks to the active NFS mounts under /mnt/nfs. The Kubernetes configuration that we generate for each pod contains volume mounts from the local host that you can see with kubectl describe po/<name of pod>. The relevant portion will look something like:
Volume Mounts:
  /data/project/ from home (rw)
  /data/scratch/ from scratch (rw)
  /etc/ldap.conf from etcldap-conf-7c618 (rw)
  /etc/ldap.yaml from etcldap-yaml-4i1w3 (rw)
  /etc/novaobserver.yaml from etcnovaobserver-yaml-58q9h (rw)
  /public/dumps/ from dumps (rw)
  /var/run/nslcd/socket from varrunnslcdsocket-5241p (rw)
In the past, /public/dumps/ was a direct NFS mount so this worked. Now I guess we need to also mount the host's /mnt/nfs or subdirectories.
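For illustration, mounting the host's /mnt/nfs into a pod amounts to adding one hostPath volume plus a matching volumeMount to the pod spec. The Python sketch below builds those two entries as plain dicts. The function name add_host_mount and the volume name "nfs" are made up for this example, and this is not the actual tools-webservice code; only the field names (volumes, hostPath, path, volumeMounts, mountPath) come from the Kubernetes pod spec.

```python
import copy


def add_host_mount(pod_spec, name, path):
    """Return a copy of `pod_spec` with `path` from the host mounted at the same path.

    `pod_spec` is the "spec" part of a pod manifest as a plain dict.
    `name` is an arbitrary volume name (hypothetical, e.g. "nfs").
    """
    spec = copy.deepcopy(pod_spec)
    # Pod-level volume backed by a directory on the worker node.
    spec.setdefault("volumes", []).append(
        {"name": name, "hostPath": {"path": path}}
    )
    # Every container needs its own volumeMount that references the volume.
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": name, "mountPath": path}
        )
    return spec
```

Calling add_host_mount(spec, "nfs", "/mnt/nfs") on an existing spec leaves the other mounts untouched, so the /public/dumps symlinks would resolve once their targets are visible in the container.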
Change 491397 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/software/tools-webservice@master] Mount /mnt/nfs into Kuberntes pods
Change 491397 merged by jenkins-bot:
[operations/software/tools-webservice@master] Mount /mnt/nfs into Kuberntes pods
Mentioned in SAL (#wikimedia-cloud) [2019-02-20T23:17:29Z] <zhuyifei1999_> begin build new tools-webservice package T178601 T193646 T215683
Mentioned in SAL (#wikimedia-cloud) [2019-02-20T23:30:52Z] <zhuyifei1999_> begin rebuilding all docker images T178601 T193646 T215683
Change 491877 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Allow pods to mount /mnt/nfs
Change 491877 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Allow pods to mount /mnt/nfs
Fixed! Running pods will need to be restarted to see the mount.
$ webservice --backend=kubernetes shell
Defaulting container name to interactive.
Use 'kubectl describe pod/interactive -n bd808-test2' to see all of the containers in this pod.
If you don't see a command prompt, try pressing enter.
$ ls /mnt/nfs
dumps-labstore1006.wikimedia.org  labstore-secondary-tools-home
dumps-labstore1007.wikimedia.org  labstore-secondary-tools-project
labstore1003-scratch