
Disable /data/project for instances in deployment-prep that do not need it
Closed, ResolvedPublic

Description

We're trying to reduce the total number of NFS clients, and unmounting /data/project from all hosts except the ones that absolutely need it (which would be the mw* hosts and the varnishes, perhaps?) would help.

Event Timeline

yuvipanda raised the priority of this task to Needs Triage.
yuvipanda updated the task description.
yuvipanda added a subscriber: yuvipanda.

So:

  1. deployment-mediawiki01/02
  2. deployment-tmh01
  3. deployment-upload
  4. deployment-cache-upload04

These seem to be the only ones that need it. I'll await confirmation from someone in the releng team and then disable NFS on the rest.

Logging server? I think we write those syslogs to /data/project

I am pretty sure we killed that when we found out.

Last time I checked, various backend services wrote their logs to /data/project so they could be read from anywhere. Though that might just be parsoid:

manifests/role/parsoid.pp:    $parsoid_log_file = '/data/project/parsoid/parsoid.log'

To be checked with the Parsoid folks. Maybe it runs on the shared instances nowadays and relays everything to syslog and logstash.


All instances running MediaWiki (app servers, job runners, etc.) do hit NFS to grab images/thumbnails. I have lost track of which instances are running jobs, though. Just from operations/mediawiki-config.git:

wmf-config/CommonSettings-labs.php:     $wgCaptchaDirectory = '/data/project/upload7/private/captcha/random';
wmf-config/CommonSettings-labs.php:     $wgCaptchaDirectory = '/data/project/upload7/private/captcha';
wmf-config/CommonSettings-labs.php:     $wgMathDirectory   = '/data/project/upload7/math';
wmf-config/CommonSettings-labs.php:     $wgScoreDirectory = '/data/project/upload7/score';
wmf-config/InitialiseSettings-labs.php:                 'default'      => '/data/project/upload7/$site/$lang',
wmf-config/InitialiseSettings-labs.php:                 'private'      => '/data/project/upload7/private/$lang',
wmf-config/filebackend-labs.php:                'deletedDir' => "/data/project/upload7/private/archive/$site/$lang",
wmf-config/filebackend-labs.php:                'directory'        => '/data/project/upload7/wikipedia/commons',
wmf-config/filebackend-labs.php:        'basePath'       => "/data/project/upload7/private/gwtoolset/$site/$lang"

All that crap can go once we have Swift on beta (T64835): the settings above can be removed and deployment-upload nuked.


So in summary, to my knowledge the instances that still have NFS actually require it (besides Parsoid).

On the NFS server, is it possible to get a breakdown of NFS hits per instance for the deployment-prep project? Maybe that would identify other instances that hit it but do not strictly need NFS.
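
For illustration, one rough way to approximate such a per-client breakdown on the NFS server, sketched with standard tools (the interface name and sample size are assumptions, and this is not necessarily how it was done here):

# Count inbound NFS packets per client address (sketch only)
tcpdump -ni eth0 -c 50000 dst port 2049 2>/dev/null | awk '{print $3}' | sed 's/\.[0-9]*$//' | sort | uniq -c | sort -rn | head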

If anything is writing logs to NFS we must make sure it stops as soon as possible - I'll check with the parsoid people :)

ALL instances have NFS now - zotero, restbase, urldownloader, bastion, etc

... and most of them don't need it. Outside of parsoid, I think I covered all the MW hosts you mentioned in my list (except the jobrunners, which I'll add). Once I verify with the parsoid team, I'll get rid of NFS from the non-MW instances.

hashar set Security to None.

Listed the NFS mounts via salt/df/magic:

root@deployment-salt:~ # salt -v --out=txt '*' cmd.run "df -t nfs 2>/dev/null|grep -v ^Filesystem|cut -d\  -f7"
Executing job with jid 20160204193722579006
-------------------------------------------

deployment-bastion.deployment-prep.eqiad.wmflabs: /data/project
deployment-db1.deployment-prep.eqiad.wmflabs: /data/project
deployment-db2.deployment-prep.eqiad.wmflabs: /data/project
deployment-fluorine.deployment-prep.eqiad.wmflabs: /data/project
deployment-kafka02.deployment-prep.eqiad.wmflabs: /data/project
deployment-memc02.deployment-prep.eqiad.wmflabs: /data/project
deployment-memc03.deployment-prep.eqiad.wmflabs: /data/project
deployment-memc04.deployment-prep.eqiad.wmflabs: /data/project
deployment-poolcounter01.deployment-prep.eqiad.wmflabs: /data/project
deployment-salt.deployment-prep.eqiad.wmflabs: /data/project
deployment-upload.deployment-prep.eqiad.wmflabs: /data/project

I have cleaned a few old and obsolete log files from /data/project.

Salt is totally lying, because it's mounted on way more instances than that list shows :) From deployment-restbase01:

labstore.svc.eqiad.wmnet:/project/deployment-prep/project on /data/project type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.68.16.128,lookupcache=none,local_lock=none,addr=10.64.37.10)
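
For illustration, a per-host check that doesn't depend on the salt minion responding (a standard command; its use here is an assumption, not a paste from the task):

# Ask the kernel directly whether the NFS mount is present
grep /data/project /proc/mounts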

The way to disable this would be to set the mount_nfs Hiera variable to false on the Hiera:deployment-prep page on wikitech, and turn it back on only for the specific hosts that need it.
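
For illustration, the two-level override described here might look something like this (a sketch using the variable named in this task; the exact wikitech page layout may differ):

# Hiera:deployment-prep (project-wide default)
mount_nfs: false

# Hiera:deployment-prep/host/<hostname> (only for hosts that still need NFS)
mount_nfs: true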

Logging server? I think we write those syslogs to /data/project

Logs should be going via rsyslog forwarding to deployment-fluorine, and into the beta logstash server as well, using basically the same setup as we use in production. Both use local instance storage.
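
For illustration, a quick end-to-end check of that forwarding path (a sketch; the test string is made up, and the exact aggregation path on deployment-fluorine isn't specified in this task):

# Emit a test message into syslog on any instance...
logger "beta-syslog-forwarding-test $(hostname)"
# ...then grep for it in the aggregated logs on deployment-fluorine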

This is unrelated to swift, since this is only removing the mounts from instances that do not use them at all and is effectively a no-op.

Am going to do this now.

I've set mount_nfs: true on all the instances I listed earlier.

These are all still no-ops, since I actually need to unmount them now :)

I've unmounted /data/project on deployment-poolcounter01 and verified that puppet doesn't bring it back. I'll unmount it on all the other instances shortly.
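
For illustration, the per-host sequence would be something like this (the verification one-liner is an assumption, not a paste from the task):

umount /data/project
puppet agent --test
# If puppet didn't remount it, the grep finds nothing
mount | grep /data/project || echo 'still unmounted'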

I've tried to unmount it from the following instances:

deployment-analytics03.eqiad.wmflabs
deployment-analytics02.eqiad.wmflabs
deployment-analytics01.eqiad.wmflabs
deployment-tin.eqiad.wmflabs
deployment-restbase01.eqiad.wmflabs
deployment-sca02.eqiad.wmflabs
deployment-sca01.eqiad.wmflabs
deployment-mathoid.eqiad.wmflabs
deployment-ms-be02.eqiad.wmflabs
deployment-ms-be01.eqiad.wmflabs
deployment-ms-fe01.eqiad.wmflabs
deployment-sentry01.eqiad.wmflabs
deployment-conftool.eqiad.wmflabs
deployment-kafka04.eqiad.wmflabs
deployment-aqs01.eqiad.wmflabs
deployment-eventlogging04.eqiad.wmflabs
deployment-cache-parsoid05.eqiad.wmflabs
deployment-conf03.eqiad.wmflabs
deployment-poolcounter01.eqiad.wmflabs
deployment-eventlogging03.eqiad.wmflabs
deployment-cache-mobile04.eqiad.wmflabs
deployment-cache-text04.eqiad.wmflabs
deployment-puppetmaster.eqiad.wmflabs
mira.eqiad.wmflabs
deployment-logstash2.eqiad.wmflabs
deployment-fluorine.eqiad.wmflabs
deployment-restbase02.eqiad.wmflabs
deployment-zookeeper01.eqiad.wmflabs
deployment-kafka02.eqiad.wmflabs
deployment-zotero01.eqiad.wmflabs
deployment-urldownloader.eqiad.wmflabs
deployment-elastic08.eqiad.wmflabs
deployment-elastic07.eqiad.wmflabs
deployment-elastic06.eqiad.wmflabs
deployment-elastic05.eqiad.wmflabs
deployment-parsoid05.eqiad.wmflabs
deployment-apertium01.eqiad.wmflabs
deployment-cxserver03.eqiad.wmflabs
deployment-mx.eqiad.wmflabs
deployment-redis02.eqiad.wmflabs
deployment-redis01.eqiad.wmflabs
deployment-pdf02.eqiad.wmflabs
deployment-sentry2.eqiad.wmflabs
deployment-pdf01.eqiad.wmflabs
deployment-stream.eqiad.wmflabs
deployment-db2.eqiad.wmflabs
deployment-memc04.eqiad.wmflabs
deployment-db1.eqiad.wmflabs
deployment-salt.eqiad.wmflabs
deployment-memc03.eqiad.wmflabs
deployment-memc02.eqiad.wmflabs
deployment-bastion.eqiad.wmflabs

Success except in:

deployment-parsoid05
deployment-pdf02
deployment-cache-text04
deployment-cache-mobile04
deployment-cache-parsoid05
deployment-ms-fe01
deployment-sca01

Everything except deployment-parsoid05 and deployment-sca01 has been handled. Parsoid seems to be writing logs to NFS >_>

Change 271183 had a related patch set uploaded (by Yuvipanda):
beta: Move parsoid logs off NFS

https://gerrit.wikimedia.org/r/271183

Change 271183 merged by Yuvipanda:
beta: Move parsoid logs off NFS

https://gerrit.wikimedia.org/r/271183

Disabled on deployment-parsoid05; logs are in /var/log/parsoid now.

And an umount -f does the trick in deployment-sca01!

So all instances that do not need NFS do not have NFS anymore! Woo! :D

If someone is building a new server that does need NFS for whatever reason, you can set mount_nfs: true in Hiera:deployment-prep/host/<hostname>, run puppet, and it'll mount /data/project.
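
For illustration, after creating that Hiera page the mount should show up on the next puppet run (a sketch; standard commands, their use here is an assumption):

puppet agent --test
mount | grep /data/project   # should now show the labstore NFS mount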

yuvipanda triaged this task as Medium priority. Feb 17 2016, 3:15 AM

Bah: since I didn't delete the entries from /etc/fstab, the mounts come back on reboot, and all the instances have since been restarted. Will need to do this again.
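
For illustration, removing the fstab entries as well would look something like this (the sed invocation is an assumption, not a paste from the task):

# Drop the /data/project line from fstab so the mount can't come back on reboot
sed -i '\|/data/project|d' /etc/fstab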

Fixed it up properly this time!