
Move cloud-vps Maps nfs share to vm-hosted NFS
Closed, Resolved · Public

Description

I am in the process of moving the Maps NFS storage off of a bare-metal NFS server and onto a virtualized NFS server.

Event Timeline

@TheDJ or @dschwen, I'm about to start a massive file copy of everything that is currently stored in the Maps NFS share, both $home and /data/project. It would thrill me to hear that some of those files are either transient and can be regenerated on the fly, or are obsolete and can be removed once and for all.

I'd also appreciate any projections you can make about space usage. At the moment that share has a max size of 8TB, with actual usage of 5.5TB.
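(For anyone estimating: usage can be checked from any client that has the share mounted; the mount point below is the one shown in later comments.)

# human-readable size, used, and available space for the share
df -h /mnt/nfs/secondary-maps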

Thank you!

(This is not at all related to https://phabricator.wikimedia.org/T300160)

@Andrew while most tiles could be regenerated if the service does come back, not ALL can be (specifically, the hill-shading directory cannot be regenerated and should NOT be deleted). Regenerating tiles on demand would also cause a request storm if we ever attempt to bring the service back.

Andrew, nothing I have on NFS is transient; everything must be copied. My transient stuff is now on a cinder volume.

Thanks all! I'm doing an epic rsync now which will take a couple of days to complete. When it comes time for the switch-over, do we need to stop services for a final rsync, or is it OK if the NFS share just reverts to its state from a few days previous?

Option 3 would be doing a second, final rsync without stopping services. That should be fine for my stuff at least.

yep, I'll definitely do a follow-up rsync but there will still be a lag because there are so many files to traverse.
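For the record, the two passes look roughly like this; the destination host is shorthand for the new NFS instance and the paths are illustrative, not the exact ones used:

# first pass: full copy while services keep writing (slow; lots of files to traverse)
rsync -aHAX --numeric-ids /srv/maps/ maps-nfs-1:/srv/maps/
# follow-up pass: only transfers what changed; --delete drops files removed since the first pass
rsync -aHAX --numeric-ids --delete /srv/maps/ maps-nfs-1:/srv/maps/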

I'm doing the final rsync today. I have no idea how long it will take; the original sync took several days but I'm hopeful this one will finish overnight.

Shall I just cut things over when it finishes, or do we need to coordinate?

From my end you can just cut over. Would it be possible to keep the old data around for a few days in case something got missed?

Yep, the old data is on a metal server so it'll be around until we switch off or reclaim the server :)

Change 760703 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cinder backups: include the maps nfs volume

https://gerrit.wikimedia.org/r/760703

Change 760703 merged by Andrew Bogott:

[operations/puppet@production] cinder backups: include the maps nfs volume

https://gerrit.wikimedia.org/r/760703

Change 761430 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs 'maps' project: use project-local NFS server

https://gerrit.wikimedia.org/r/761430

Change 761430 merged by Andrew Bogott:

[operations/puppet@production] wmcs 'maps' project: use project-local NFS server

https://gerrit.wikimedia.org/r/761430

I've now moved your nfs service to the 'maps-nfs-1' instance in your project. Please let me know if you see any bad results.

That server can be scaled up and down as needed, so if you're getting worse performance than before, I'm interested in adjusting and experimenting.

Do I need to reboot my running instance to get this change? I think the maps-wma2 instance is still connected to the old NFS:

nfs-maps.wikimedia.org:/srv/maps on /mnt/nfs/secondary-maps type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.5.74,local_lock=none,addr=208.80.155.119)

versus

maps-nfs.svc.maps.eqiad1.wikimedia.cloud:/srv/maps on /mnt/nfs/secondary-maps type nfs4 (rw,noatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.1.203,local_lock=none,addr=172.16.4.216)

on a rebooted instance.
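A quick way to check which server is behind the mount on any given instance (standard util-linux tooling; mount point as above):

# prints just the NFS source; a stale client still reports nfs-maps.wikimedia.org:/srv/maps
findmnt -no SOURCE /mnt/nfs/secondary-maps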

This would mean that I wrote data to the old NFS after your last rsync. :-((((

actually, maybe no data needs to be synced. I'll just remount the device.
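Something like the following, with options taken from the new mount output above (a sketch; the exact invocation may differ):

# detach the stale mount (if files are held open, a lazy unmount with umount -l may be needed)
umount /mnt/nfs/secondary-maps
# reattach from the new project-local server
mount -t nfs4 -o rw,noatime,vers=4.0 maps-nfs.svc.maps.eqiad1.wikimedia.cloud:/srv/maps /mnt/nfs/secondary-maps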

Update: that fixed it