Disable NFS home directories on deployment-prep
Closed, Resolved; Public

Description

Consider removing NFS home directories from deployment-prep. This will improve stability (fewer 'cannot log in when NFS is dead' problems) and make it match prod more closely (prod has no NFS home directories). You will also get beers from Yuvi Panda.

Event Timeline

yuvipanda raised the priority of this task to Needs Triage.
yuvipanda updated the task description.
yuvipanda added subscribers: yuvipanda, thcipriani, bd808.

Is there a good way to distribute dotfiles to the beta cluster hosts if we drop shared NFS? If not, I guess this would be an incentive for me to finally get mine cleaned up and on GitHub.

Yes, I think putting them in git is the best way to go about this. A Labs-specific solution is probably needed, since a lot of people will want this. Perhaps a git repo you can associate with your account on wikitech or something, which gets auto-cloned on every project you're a member of? That's actually probably crazy. Something in puppet similar to the admin module, perhaps.

Needs a separate bug though. Do you consider this a blocker for disabling home NFS on deployment-prep?

Fancy! Probably/maybe too fancy but it would be pretty cool.

Not a blocker for me, just a bit of pain. I already have a solution to manage this on the prod cluster (tarball I unpack).
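
For anyone without such a setup, a minimal sketch of the tarball approach follows. It is an illustration only, not the commenter's actual script: the function name install_dotfiles and the tarball layout (dotfiles at the top level of a gzipped tar) are assumptions.

```shell
# Hypothetical helper: unpack a personal dotfiles tarball into a home directory.
# Assumes a gzipped tarball with the dotfiles at its top level.
install_dotfiles() {
  dotfiles_tarball="$1"
  dotfiles_dest="${2:-$HOME}"   # default to the current user's home

  # Refuse to continue if the tarball is missing or unreadable.
  [ -r "$dotfiles_tarball" ] || {
    echo "cannot read $dotfiles_tarball" >&2
    return 1
  }

  mkdir -p "$dotfiles_dest" || return 1
  # Extract into the destination; existing files with the same name are overwritten.
  tar -xzf "$dotfiles_tarball" -C "$dotfiles_dest"
}
```

Usage would be something like: install_dotfiles ~/dotfiles.tar.gz (the path is a placeholder). With local home directories this has to be repeated on every instance you log in to.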

This was scheduled for next Friday, I think.

Jenkins slaves on the beta cluster use the jenkins-deploy user, which has its home on the instance's extended disk (/mnt), so the workspaces live under /mnt/home/jenkins-deploy/workspace/.

So there should be no impact on the Jenkins jobs. YMMV.

yuvipanda claimed this task.

That happened during the NFS outage.

hashar reopened this task as Open. (Edited Jun 22 2015, 10:10 AM)

Reopening: some instances apparently still rely on NFS because puppet does not run properly on them :(

An example is deployment-zookeeper01.

This is just the homedirs, and they are gone.

Not yet.

hashar@deployment-salt:~$ sudo salt '*' cmd.run 'mount|grep /home'
deployment-fluorine.deployment-prep.eqiad.wmflabs:
deployment-sca02.deployment-prep.eqiad.wmflabs:
deployment-logstash2.deployment-prep.eqiad.wmflabs:
deployment-db2.deployment-prep.eqiad.wmflabs:
deployment-sentry2.deployment-prep.eqiad.wmflabs:
deployment-memc02.deployment-prep.eqiad.wmflabs:
deployment-logstash1.deployment-prep.eqiad.wmflabs:
deployment-memc03.deployment-prep.eqiad.wmflabs:
deployment-zotero01.deployment-prep.eqiad.wmflabs:
i-000002de.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home on /home type nfs (rw,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc)
deployment-mediawiki01.deployment-prep.eqiad.wmflabs:
deployment-cache-mobile03.deployment-prep.eqiad.wmflabs:
deployment-elastic07.deployment-prep.eqiad.wmflabs:
deployment-urldownloader.deployment-prep.eqiad.wmflabs:
deployment-jobrunner01.deployment-prep.eqiad.wmflabs:
deployment-elastic08.deployment-prep.eqiad.wmflabs:
deployment-mx.deployment-prep.eqiad.wmflabs:
deployment-restbase01.deployment-prep.eqiad.wmflabs:
deployment-stream.deployment-prep.eqiad.wmflabs:
deployment-apertium01.deployment-prep.eqiad.wmflabs:
deployment-db1.deployment-prep.eqiad.wmflabs:
deployment-redis01.deployment-prep.eqiad.wmflabs:
deployment-mediawiki03.deployment-prep.eqiad.wmflabs:
i-00000958.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home on /home type nfs (rw,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc)
deployment-kafka02.deployment-prep.eqiad.wmflabs:
deployment-elastic05.deployment-prep.eqiad.wmflabs:
i-000008d5.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home on /home type nfs (rw,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc)
deployment-upload.deployment-prep.eqiad.wmflabs:
deployment-elastic06.deployment-prep.eqiad.wmflabs:
deployment-redis02.deployment-prep.eqiad.wmflabs:
deployment-cxserver03.deployment-prep.eqiad.wmflabs:
    tmpfs on /mnt/home/jenkins-deploy/tmpfs type tmpfs (rw,noatime,size=512M,mode=1777)
deployment-memc04.deployment-prep.eqiad.wmflabs:
deployment-mediawiki02.deployment-prep.eqiad.wmflabs:
hashar@deployment-salt:~$
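
With this many minions, the hosts that still have /home on NFS are easy to miss in the raw output. A minimal sketch of a filter, assuming the salt output format shown above (unindented "minion:" header lines followed by indented result lines); the function name nfs_home_hosts is made up for illustration:

```shell
# Hypothetical helper: print only the minions whose output shows /home on NFS.
# Reads the output of: salt '*' cmd.run 'mount|grep /home'  on stdin.
nfs_home_hosts() {
  awk '
    # A minion header starts in column 1 and ends with a colon.
    /^[^ \t]/ && /:$/ { host = substr($0, 1, length($0) - 1); next }
    # An indented result line belongs to the most recent header.
    / on \/home type nfs/ { print host }
  '
}
```

Usage: sudo salt '*' cmd.run 'mount|grep /home' | nfs_home_hosts. Fed the output above, it would list only the three i-* instances; the tmpfs mount on deployment-cxserver03 is skipped because its mount point is not /home.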

And a couple of instances have not been migrated to the new DNS FQDN :-/

I have fixed DNS on the i-** instances.

  • deployment-cache-upload02 fixed up (dns/puppet/certs etc)
  • deployment-zookeeper01 no longer has the /home NFS dir mounted after a reboot

Still has a bunch:

root@deployment-salt:~# salt '*' cmd.run 'grep /home /etc/fstab|egrep ^labstore'
deployment-fluorine.deployment-prep.eqiad.wmflabs:
deployment-sca02.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-logstash2.deployment-prep.eqiad.wmflabs:
deployment-mediawiki01.deployment-prep.eqiad.wmflabs:
deployment-restbase01.deployment-prep.eqiad.wmflabs:
deployment-parsoid05.deployment-prep.eqiad.wmflabs:
deployment-stream.deployment-prep.eqiad.wmflabs:
deployment-jobrunner01.deployment-prep.eqiad.wmflabs:
deployment-db1.deployment-prep.eqiad.wmflabs:
deployment-elastic05.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-redis02.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-parsoidcache02.deployment-prep.eqiad.wmflabs:
deployment-memc03.deployment-prep.eqiad.wmflabs:
deployment-mediawiki02.deployment-prep.eqiad.wmflabs:
deployment-memc04.deployment-prep.eqiad.wmflabs:
deployment-db2.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-test.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-cache-bits01.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-kafka02.deployment-prep.eqiad.wmflabs:
deployment-bastion.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-mediawiki03.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-sca01.deployment-prep.eqiad.wmflabs:
deployment-pdf02.deployment-prep.eqiad.wmflabs:
deployment-zotero01.deployment-prep.eqiad.wmflabs:
deployment-elastic08.deployment-prep.eqiad.wmflabs:
deployment-videoscaler01.deployment-prep.eqiad.wmflabs:
deployment-memc02.deployment-prep.eqiad.wmflabs:
deployment-apertium01.deployment-prep.eqiad.wmflabs:
deployment-salt.deployment-prep.eqiad.wmflabs:
deployment-zookeeper01.deployment-prep.eqiad.wmflabs:
deployment-mx.deployment-prep.eqiad.wmflabs:
deployment-eventlogging02.deployment-prep.eqiad.wmflabs:
deployment-elastic07.deployment-prep.eqiad.wmflabs:
deployment-upload.deployment-prep.eqiad.wmflabs:
deployment-cxserver03.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-cache-mobile03.deployment-prep.eqiad.wmflabs:
deployment-logstash1.deployment-prep.eqiad.wmflabs:
deployment-sentry2.deployment-prep.eqiad.wmflabs:
deployment-redis01.deployment-prep.eqiad.wmflabs:
    labstore.svc.eqiad.wmnet:/project/deployment-prep/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc	0	0
deployment-urldownloader.deployment-prep.eqiad.wmflabs:
deployment-pdf01.deployment-prep.eqiad.wmflabs:
deployment-restbase02.deployment-prep.eqiad.wmflabs:
deployment-elastic06.deployment-prep.eqiad.wmflabs:
deployment-mathoid.deployment-prep.eqiad.wmflabs:
deployment-cache-text02.deployment-prep.eqiad.wmflabs:
root@deployment-salt:~#

I have removed the commented-out #labstore... lines from /etc/fstab with:
salt '*' cmd.run "sed -i '/^#labstore/d' /etc/fstab"

Manually cleaned the /home entry on:

  • deployment-sca02 - puppet lock file from June 5th
  • deployment-redis02
  • deployment-test
  • deployment-cxserver03
  • deployment-redis01
  • deployment-elastic05
  • deployment-cache-bits01
  • deployment-db2
  • deployment-mediawiki03
  • deployment-bastion
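
The thread does not say how the manual cleanup was done. One possible way to script it, as a sketch only: the function name drop_home_nfs_entry is hypothetical, and it assumes the entry to drop is exactly the one shown in the grep output above (device starting with labstore, mount point /home, type nfs). A backup copy is kept next to the file.

```shell
# Hypothetical helper: remove the labstore NFS /home entry from an fstab file.
# Usage: drop_home_nfs_entry /etc/fstab
drop_home_nfs_entry() {
  fstab_path="$1"
  fstab_tmp="$(mktemp)" || return 1

  # Keep a backup so the change can be reverted.
  cp -p "$fstab_path" "$fstab_path.bak" || return 1

  # Keep every line except the one whose device starts with "labstore",
  # whose mount point is /home and whose type is nfs. Commented-out lines
  # and the /data/project entry do not match and are left alone.
  awk '!($1 ~ /^labstore/ && $2 == "/home" && $3 == "nfs")' \
    "$fstab_path.bak" > "$fstab_tmp" || return 1

  # Write back in place so the file keeps its owner and permissions.
  cat "$fstab_tmp" > "$fstab_path" && rm -f "$fstab_tmp"
}
```

This only edits fstab. A /home that is still mounted would additionally need an unmount or a reboot, as was done for deployment-zookeeper01.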

labstore is no longer referenced in /etc/fstab, apart from /data/project: salt '*' cmd.run 'grep labstore /etc/fstab|grep -v /data/project'

Nothing left mounted: salt '*' cmd.run 'mount |grep /home'

hashar triaged this task as High priority.
hashar moved this task from In-progress to Done on the Beta-Cluster-Infrastructure board.