The dumps workloads on labstore1006 and labstore1007 can be moved to the new clouddumps hosts. The new hosts have ample built-in storage, so this move will let us decommission the old servers and the disk shelves used by labstore1006/1007.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Reedy | T281203 dumps distribution servers space issues |
| | | | Unknown Object (Task) |
| Resolved | | Andrew | T302981 Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts |
| Resolved | | Andrew | T309346 Replace labstore100[67] with clouddumps100[12] |
| Resolved | | Andrew | T310451 hdfs client packages for debian Bullseye |
| Resolved | | BTullis | T310643 Build Bigtop 1.5 Hadoop packages for Bullseye |
| Resolved | | Andrew | T316123 Auth extremely slow on clouddumps100[12] |
| Resolved | | Andrew | T317144 toolforge/paws k8s containers need to know about clouddumps100[12] |
| Resolved | | rook | T317881 Remove labstore systems |
| Resolved | Request | Andrew | T319217 decommission labstore100[67].wikimedia.org |
Event Timeline
@ArielGlenn these two new servers should be ready; I'm hoping that you have the time to move the data and workload over.
Currently each has a gigantic sdb1 lvm partition that isn't mounted anywhere. If you want me to mess with partman more and get these set up some different way I'm happy to give it a go, but I'm pretty sure that moving ahead with puppetized or by-hand lvm commands is the right next step.
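A by-hand setup along those lines might look like the following dry-run sketch. The volume group and logical volume names and the mount point are all assumptions (not the actual puppetized layout), and the `run` wrapper only echoes each command, so nothing is executed until it's swapped for the real thing:

```shell
# Dry-run sketch of a by-hand LVM setup for the big unmounted sdb1
# partition. VG/LV names and mount point are assumptions.
run() { echo "+ $*"; }   # replace with: "$@" to actually execute

VG=vg0          # assumed volume group backed by sdb1
LV=dumps        # assumed logical volume name
MNT=/srv/dumps  # assumed mount point

run lvcreate -l 100%FREE -n "$LV" "$VG"
run mkfs.ext4 -L "$LV" "/dev/$VG/$LV"
run mkdir -p "$MNT"
run mount "/dev/$VG/$LV" "$MNT"
```

The same steps could of course be puppetized instead of run by hand; the sketch just shows the order of operations.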
Not sure who should get this next, but it's not Hannah or me :-) I was never involved in the configuration and setup of the old labstore boxes, so I wouldn't have any idea how they need to be set up. I guess that was probably all Brooke's work.
When the servers are ready for rsync, you'll want to grab the data off the old boxes; then let me know, and we can change a couple of names and sync to the new hosts instead of the current labstore1006/1007. It's just a matter of flipping a few variable names in puppet. To minimize extra syncing, though, the main rsync from the labstores should be run a few times for the bulk of the data, and then one final time on, say, the 17th of the month once the sql/xml dumps are complete. At that point I can swap our end over to point to the new hosts with almost no additional copying needed.
For the fetches from stats hosts and such, I guess those are separate settings in puppet that can just be swapped as well on your end, once the main rsyncs have gotten done.
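The repeatable bulk pass described above might look something like this dry-run sketch; the source path, destination host path, and flags are assumptions, and the `run` wrapper only prints the command:

```shell
# Dry-run sketch of the repeatable bulk copy from an old labstore
# host to a new clouddumps host. Paths and flags are assumptions.
run() { echo "+ $*"; }   # replace with: "$@" to actually execute

SRC=/srv/dumps/                                # assumed data root on labstore1006
DST=clouddumps1001.wikimedia.org:/srv/dumps/   # assumed destination

# Run this several times for the bulk of the data, then once more
# after the month's sql/xml dumps finish, just before flipping puppet.
run rsync -a --delete --info=progress2 "$SRC" "$DST"
```

Because `rsync -a --delete` is idempotent, the final pass after the last dump run only transfers the delta, which is what keeps the cutover cheap.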
Note that there's an open task about putting dumps.wikimedia.org behind our CDN (T306550), which is related to this task.
This task does not require the DC-Ops tag. Once you have moved the data, please decommission the labstores and create a task for DC-Ops.
Change 802600 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] put clouddumps100[12] into service
I think this should be assigned to me, to put the new hosts into service. That's currently blocked by a lot of partman nonsense that Papaul is helping me with; it's documented on T302981.
Change 802600 merged by Andrew Bogott:
[operations/puppet@production] put clouddumps100[12] into service
Change 824513 had a related patch set uploaded (by Btullis; author: Btullis):
[labs/private@master] Add dummy keytabs for new clouddumps100* servers
Change 824513 merged by Btullis:
[labs/private@master] Add dummy keytabs for new clouddumps100* servers
Change 828102 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Dumps: switch to using clouddumps hosts rather than the old labstores.
Change 828103 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Dumps: stop mounting the old labstore100x servers on VMs
Change 830200 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/dns@master] Move dumps from labstore1006 to clouddumps1001
Change 828102 merged by Andrew Bogott:
[operations/puppet@production] Dumps: switch to using clouddumps hosts rather than the old labstores.
Change 830200 merged by Andrew Bogott:
[operations/dns@master] Move dumps from labstore1006 to clouddumps1001
Change 830246 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] openstack network tests: switch to check clouddumps1001 mounts
Change 830246 merged by Andrew Bogott:
[operations/puppet@production] openstack network tests: switch to check clouddumps1001 mounts
Change 835192 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Dumps: switch to using clouddumps hosts rather than the old labstores.
Change 835192 merged by Andrew Bogott:
[operations/puppet@production] Dumps: switch to using clouddumps hosts rather than the old labstores.
Change 828103 merged by Andrew Bogott:
[operations/puppet@production] Dumps: stop mounting the old labstore100x servers on VMs
This is now done. I'm going to gradually dismantle the old dumps servers but will probably leave their data intact for a few weeks before decom.
Thanks for all your work on this @Andrew.
I'm going to do a fleet-wide check to see if anything still references the labstore servers in /etc/fstab or if they're still mounted anywhere. I have a feeling that one or two servers might still have the mount active.
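Per host, that check is essentially a grep of `/etc/fstab` and the live mount table. Here's a self-contained sketch of the idea (in production this would run fleet-wide, e.g. via cumin; the sample fstab below is made up for illustration):

```shell
# Sketch of the per-host check for stale labstore references.
# We grep a fabricated sample fstab to show the pattern.
FSTAB=$(mktemp)
cat > "$FSTAB" <<'EOF'
/dev/vda1 / ext4 defaults 0 1
labstore1006.wikimedia.org:/dumps /mnt/dumps nfs ro 0 0
EOF

# Any hit means the host still references the old servers.
grep -E 'labstore100[67]' "$FSTAB"

# A live-mount check would be similar, against /proc/mounts:
#   grep -E 'labstore100[67]' /proc/mounts
```

Any host that matches needs its fstab entry removed and the NFS mount unmounted before the labstores can be dismantled.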
Also you might like to update this section when convenient: https://wikitech.wikimedia.org/wiki/Dumps/Dump_servers#Hardware
Note that labstore1006 has some html dumps that didn't make it around to the other boxes, so please don't reimage until I get that sorted. Thanks!
Given the new machines' much larger capacity, I believe any pending requests for more space can now be reconsidered. @ArielGlenn do you know of any pending needs? Shall we discuss any potential use cases for the extra space on a separate ticket?
There's the Kiwix request to mirror more files; see T57503 which I believe you're aware of. I'm not sure of anything else at the moment.
> Also you might like to update this section when convenient: https://wikitech.wikimedia.org/wiki/Dumps/Dump_servers#Hardware

done
The directory /srv/deployment/analytics had incorrect ownership on the new hosts, so our deployment failed.
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Deploy/Refinery#Deploying_to_clouddumps*_hosts
I did the following manual fix on clouddumps100[1-2]:

```
btullis@clouddumps1001:/srv/deployment/analytics$ ls -ld .
drwxr-xr-x 3 root root 4096 Aug 23 01:47 .
btullis@clouddumps1001:/srv/deployment/analytics$ sudo chown analytics-deploy .
btullis@clouddumps1001:/srv/deployment/analytics$ ls -ld .
drwxr-xr-x 3 analytics-deploy root 4096 Aug 23 01:47 .
```
I will follow up with a patch to configure this in puppet.
Change 849192 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Dumps: remove a bunch of references to labstore1006 and labstore1007
Change 849193 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] rsync-via-primary.sh: replace labstore with clouddumps
Change 849193 merged by Andrew Bogott:
[operations/puppet@production] rsync-via-primary.sh: replace labstore with clouddumps
Change 849192 merged by Andrew Bogott:
[operations/puppet@production] Dumps: remove a bunch of references to labstore1006 and labstore1007