
Move cloud-vps Maps nfs share to vm-hosted NFS
Closed, Resolved · Public

Description

I am in the process of moving the Maps NFS storage off of a bare-metal NFS server and onto a virtualized NFS server.

Event Timeline

@TheDJ or @dschwen, I'm about to start a massive file copy of everything that is currently stored in the Maps NFS share, both $home and /data/project. It would thrill me to hear that some of those files are either transient and can be regenerated on the fly, or are obsolete and can be removed once and for all.

I'd also appreciate any projections you can make about space usage. At the moment that share has a max size of 8TB, with actual usage of 5.5TB.
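(For anyone estimating: usage can be checked from any client that has the share mounted; the mount point below is the one shown in later comments.)

# human-readable size, used, and available space for the share
df -h /mnt/nfs/secondary-maps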

Thank you!

(This is not at all related to https://phabricator.wikimedia.org/T300160)

@Andrew while most tiles could be regenerated if the service does come back, not ALL can be (specifically, the hill-shading directory cannot be regenerated and should NOT be deleted). Regenerating tiles on demand would also cause a request storm if we ever attempt to bring the service back.

Andrew, nothing I have on NFS is transient; everything must be copied. My transient stuff is now on a cinder volume.

Thanks all! I'm doing an epic rsync now which will take a couple of days to complete. When it comes time for the switch-over, do we need to stop services for a final rsync, or is it OK if the NFS share just reverts to its state from a few days previous?

Option 3 would be doing a second, final rsync without stopping services. That should be fine for my stuff at least.

yep, I'll definitely do a follow-up rsync but there will still be a lag because there are so many files to traverse.
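For the record, the two passes look roughly like this; the destination host is shorthand for the new NFS instance and the paths are illustrative, not the exact ones used:

# first pass: full copy while services keep writing (slow; lots of files to traverse)
rsync -aHAX --numeric-ids /srv/maps/ maps-nfs-1:/srv/maps/
# follow-up pass: only transfers what changed; --delete drops files removed since the first pass
rsync -aHAX --numeric-ids --delete /srv/maps/ maps-nfs-1:/srv/maps/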

I'm doing the final rsync today. I have no idea how long it will take; the original sync took several days but I'm hopeful this one will finish overnight.

Shall I just cut things over when it finishes, or do we need to coordinate?

From my end you can just cut over. Would it be possible to keep the old data around for a few days in case something got missed?

Yep, the old data is on a metal server so it'll be around until we switch off or reclaim the server :)

Change 760703 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cinder backups: include the maps nfs volume

https://gerrit.wikimedia.org/r/760703

Change 760703 merged by Andrew Bogott:

[operations/puppet@production] cinder backups: include the maps nfs volume

https://gerrit.wikimedia.org/r/760703

Change 761430 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs 'maps' project: use project-local NFS server

https://gerrit.wikimedia.org/r/761430

Change 761430 merged by Andrew Bogott:

[operations/puppet@production] wmcs 'maps' project: use project-local NFS server

https://gerrit.wikimedia.org/r/761430

I've now moved your nfs service to the 'maps-nfs-1' instance in your project. Please let me know if you see any bad results.

That server can be scaled up and down as needed, so if you're getting worse performance than before, I'm interested in adjusting and experimenting.

Do I need to reboot my running instance to get this change? I think the maps-wma2 instance is still connected to the old NFS:

nfs-maps.wikimedia.org:/srv/maps on /mnt/nfs/secondary-maps type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.5.74,local_lock=none,addr=208.80.155.119)

versus

maps-nfs.svc.maps.eqiad1.wikimedia.cloud:/srv/maps on /mnt/nfs/secondary-maps type nfs4 (rw,noatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.1.203,local_lock=none,addr=172.16.4.216)

on a rebooted instance.
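A quick way to check which server is behind the mount on any given instance (standard util-linux tooling; mount point as above):

# prints just the NFS source; a stale client still reports nfs-maps.wikimedia.org:/srv/maps
findmnt -no SOURCE /mnt/nfs/secondary-maps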

This would mean that I wrote data to the old NFS after your last rsync. :-((((

actually, maybe no data needs to be synced. I'll just remount the device.
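Something like the following, with options taken from the new mount output above (a sketch; the exact invocation may differ):

# detach the stale mount (if files are held open, a lazy unmount with umount -l may be needed)
umount /mnt/nfs/secondary-maps
# reattach from the new project-local server
mount -t nfs4 -o rw,noatime,vers=4.0 maps-nfs.svc.maps.eqiad1.wikimedia.cloud:/srv/maps /mnt/nfs/secondary-maps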

Update: that fixed it