
move nfs /scratch to labstore1003
Closed, Resolved, Public

Description

We need to do a lot of shuffling to get labstore1004 and labstore1005 to their final state. The scratch partition does not need the backup or service guarantees that other shares do. Labstore1003 is fairly well resourced and mostly idle.

Currently:

/dev/mapper/srv-dumps             ext4       44T   16T   28T  37% /srv/dumps

Proposed:

/dev/mapper/srv-dumps    ext4       28T   16T   12T 

/dev/mapper/scratch      ext4       5T

9T unallocated for future use in either case.

labstore1003 pvs:

/dev/sda5  labstore1003-vg lvm2 a--   1.82t    0
/dev/sdc   srv             lvm2 a--  14.55t    0
/dev/sdd   srv             lvm2 a--  14.55t    0
/dev/sde   srv             lvm2 a--  14.55t    0

I propose allocating /dev/sde to misc workloads (mainly scratch at this point). Dumps is still fine for space and performance, and we split out the underlying disks for isolation.
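
A minimal sketch of that reshuffle (assuming the VG is srv with LV dumps, per the pvs output above, and that the new LV is simply named scratch). ext4 can't be shrunk while mounted, so dumps has to come offline for the resize:

umount /srv/dumps                       # ext4 can only be shrunk while unmounted

# shrink dumps to 28T; --resizefs runs e2fsck/resize2fs before reducing the LV
lvreduce --resizefs -L 28T srv/dumps

# migrate any remaining dumps extents off sde so it is free for misc use
pvmove /dev/sde

# carve a 5T scratch LV out of sde only, for disk-level isolation
lvcreate -L 5T -n scratch srv /dev/sde
mkfs.ext4 -m 1 /dev/srv/scratch

mount /srv/dumps
mount /dev/srv/scratch /srv/scratch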

Event Timeline

The proposed size of the filesystem for dumps looks fine to me.

+1 <3.

Will we move the content over? There were no guarantees of such, but it might be a nice gesture. No need for it to be complete or consistent.

We should also remember to soft-mount these.
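
For reference, that's just a mount-option change; a hypothetical /etc/fstab line for a soft NFS mount of scratch might look like this (export path and option values are illustrative, not the actual puppetized config):

# soft: give up and return an I/O error after retrans retries, instead of
# hanging forever the way a hard mount does when the server goes away
labstore1003.eqiad.wmnet:/srv/scratch  /data/scratch  nfs  rw,soft,timeo=300,retrans=3,noatime  0  0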

I really have no opinion on moving the data over, other than that it costs us maintenance time, obviously. We don't have a real strategy for /scratch cleanup, so it's all ad hoc reasoning.

Yeah. I'd like us to just do a simple rsync if possible. If we decide not to do that, we should give people notice as well. I know that the Kiwix project, for example, uses it for a lot of temp storage that can be recreated, but at a cost in time and effort for them. If it'll be less effort for us to copy it over, we should do it.

The main thing is that I'd want to snapshot the volume, copy it over, and then swap in the new volume as gracefully as possible, which in some cases is not graceful at all. That means a period where /scratch is there but new writes during that (probably small) window won't come over. That's the sanest approach I can think of at the moment. Thoughts?

I think that's good enough if we pre-announce it early enough.

This will also involve a period of dumps (via NFS) being offline. I think this is mostly a small issue, though; anecdotally, I don't see many consumers this morning. But it seems I can't do the shuffle online.

Mentioned in SAL [2016-05-18T13:29:02Z] <chasemp> resize volume for nfs dumps per T134896

Mentioned in SAL [2016-05-18T13:57:53Z] <chasemp> downtime for dataset1001 puppet runs as T134896 causes failure (temporary for resize)

Change 305657 had a related patch set uploaded (by Rush):
tools: mount scratch on labstore1003 as well

https://gerrit.wikimedia.org/r/305657

@madhuvishy has been formalizing our logic for depooling/repooling grid exec nodes, so with T140483 resolved we hope to roll this out without reboots.

0. Stage the change in puppet; disable puppet across tools
1. Depool an exec node and drain it (see the sketch after this list)
2. Run puppet, applying the scratch change
3. Test
4. Repool and move on to the next node
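
A rough sketch of one pass through that loop with plain gridengine commands (node name is hypothetical; the exec-node management script added later in this task is the real tooling):

host=tools-exec-1401.eqiad.wmflabs      # hypothetical exec node

# 1. depool: disable all queue instances on the host, then wait for running jobs to drain
qmod -d "*@${host}"
while qstat -u '*' -s r | grep -q "@${host}"; do sleep 60; done

# 2. apply the scratch change
ssh "${host}" 'puppet agent --enable && puppet agent --test'

# 3. test the new mount, then 4. repool
ssh "${host}" 'mountpoint -q /data/scratch'
qmod -e "*@${host}"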

With k8s I'm not sure how we should do this.

What is the graceful way to change the path for an NFS mount?

Change 306019 had a related patch set uploaded (by Madhuvishy):
nfs: Mount scratch from labstore1001 on different mount path

https://gerrit.wikimedia.org/r/306019

Change 306025 had a related patch set uploaded (by Madhuvishy):
tools: Add script that helps manage sge exec nodes

https://gerrit.wikimedia.org/r/306025

You can do the same for k8s: depool (https://wikitech.wikimedia.org/wiki/Tools_Kubernetes#Depooling_a_node), do your thing, and repool. That will work for all the k8s worker nodes.
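
For a worker that loop looks roughly like the following (node name is hypothetical; the wiki page above is the canonical procedure):

# cordon the node and evict its pods so nothing new is scheduled there
kubectl drain tools-worker-1001.tools.eqiad.wmflabs --ignore-daemonsets --force

# apply the scratch mount change
ssh tools-worker-1001.tools.eqiad.wmflabs 'puppet agent --test'

# repool
kubectl uncordon tools-worker-1001.tools.eqiad.wmflabs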

The etcd nodes don't have any NFS, but the k8s master does. I'd like to be around when we do the master, though (the HA setup isn't complete yet).

Yeah, I'm familiar with doing this for k8s worker nodes; I did it a bunch of times while helping @yuvipanda recreate worker nodes a couple of weeks ago.

Mentioned in SAL [2016-08-22T22:07:09Z] <madhuvishy> Disabled puppet across tools hosts in preparation to merge https://gerrit.wikimedia.org/r/#/c/305657/ (see T134896)

Change 305657 merged by Madhuvishy:
tools: mount scratch on labstore1003 as well

https://gerrit.wikimedia.org/r/305657

Mentioned in SAL [2016-08-23T07:08:51Z] <madhuvishy> Enabled puppet across tools after merging https://gerrit.wikimedia.org/r/#/c/305657/ (see T134896)

Change 306025 merged by Rush:
tools: Add script that helps manage sge exec nodes

https://gerrit.wikimedia.org/r/306025

Change 307337 had a related patch set uploaded (by Madhuvishy):
toollabs: Remove puppet dependencies on git clone cdnjs

https://gerrit.wikimedia.org/r/307337

Change 307337 merged by Madhuvishy:
toollabs: Remove puppet dependencies on git clone cdnjs

https://gerrit.wikimedia.org/r/307337

Change 307343 had a related patch set uploaded (by Madhuvishy):
toollabs: Set timeout 0 on cdnjs git clone exec

https://gerrit.wikimedia.org/r/307343

Change 307343 merged by Madhuvishy:
toollabs: Set timeout 0 on cdnjs git clone exec

https://gerrit.wikimedia.org/r/307343

To back up scratch from labstore1001 to labstore1003 using rsync:

snapshot

lvcreate -L1T -s -n backup-scratch /dev/labstore/scratch

mount

mount /dev/labstore/backup-scratch /srv/backup-scratch

ionice and rsync

ionice -c 3 \
rsync --rsh 'ssh -i /root/.ssh/id_labstore' \
  --archive \
  --compress \
  --progress \
  --hard-links \
  --delete \
  --bwlimit=6000 \
  --exclude=mwoffliner/ \
  --exclude=tmp/ \
  --dry-run \
  /srv/backup-scratch/ \
  root@labstore1003.eqiad.wmnet:/srv/scratch/
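
Once the dry-run output looks sane, the same command without --dry-run performs the actual copy.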

Change 306019 merged by Madhuvishy:
nfs: Modify /data/scratch on nfs clients to point to mount from labstore1003

https://gerrit.wikimedia.org/r/306019

Mentioned in SAL [2016-08-31T18:18:44Z] <madhuvishy> Scratch migration complete for all k8s workers (T134896)

Mentioned in SAL [2016-08-31T19:36:19Z] <madhuvishy> Scratch migration on all non exec/worker nodes complete (T134896)

Mentioned in SAL [2016-08-31T20:45:35Z] <madhuvishy> Scratch migration complete on all grid exec nodes (T134896)