
Decommission task for old cp hosts (cp1075-1090)
Closed, Resolved · Public

Description

Now that T349244 is completed, we can move on to decommissioning the old eqiad cp hosts: cp1075-1090

cp1075

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task> (see the example after this list). This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site
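
For example (a hypothetical invocation using this task's ID; adjust the host and task to match):

  cookbook sre.hosts.decommission cp1075.eqiad.wmnet -t T352253

The cookbook also accepts the cumin range syntax used later in this task to cover all sixteen hosts in one run:

  cookbook sre.hosts.decommission 'cp[1075-1090].eqiad.wmnet' -t T352253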

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1076

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1077

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1078

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1079

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1080

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1081

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1082

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1083

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1084

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1085

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1086

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1087

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1088

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1089

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

cp1090

Steps for service owner:

  • all system services confirmed offline from production use
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp; replacing with role(spare::system) is recommended to ensure services stay offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This wipes the bootloader, powers down the host, updates Netbox to decommissioning status, runs puppet node clean and puppet node deactivate, removes the host from DebMonitor, and runs Homer.
  • remove all remaining puppet references and all host entries in the puppet repo
  • reassign task from service owner to a DC-Ops team member and add the site project (ops-eqiad) matching the server's site

End service owner steps / Begin DC-Ops team steps:

  • system disks removed (by onsite)
  • determine system age: systems under 5 years old are reclaimed as spares; systems over 5 years old are decommissioned.
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • IF DECOM: mgmt dns entries removed.

Event Timeline

Change 977702 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] decom cp1075-1090

https://gerrit.wikimedia.org/r/977702

Change 977702 merged by Fabfur:

[operations/puppet@production] decom cp1075-1090

https://gerrit.wikimedia.org/r/977702

Mentioned in SAL (#wikimedia-operations) [2023-11-29T10:36:32Z] <fabfur> decommissioning cp1075-1090 (T352253)

cookbooks.sre.hosts.decommission executed by fabfur@cumin1001 for hosts: cp[1075-1090].eqiad.wmnet

  • cp1075.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1076.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1077.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1078.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1079.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1080.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1081.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1082.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1083.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1084.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1085.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1086.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1087.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1088.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1089.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp1090.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
VRiley-WMF claimed this task.
VRiley-WMF updated the task description.

Hi dc-ops team, quick question: have these hosts already been hardware-decommissioned?

For further context: we have a request from @dr0ptp4kt to run a Blazegraph experiment, and we are trying to free up a cp node for him. So we were wondering: if this hardware has not yet been hardware-decommissioned, we could just bring up a host here.

Hi @ssingh - the hardware should still be around, and we should be able to reallocate one of them for testing purposes. Can you open a new Phabricator task for us with all the necessary details (hostname, racking info, network setup, raid/partitioning, OS, and main POC)? Also, do you know how long Adam would need it for?

Thanks,
Willy

After setup, I would be interested in using it for 6 weeks (hopefully things would only take 4 weeks, but there's some PTO, and real-life stuff always comes up). Would that be okay?

We're presently running Debian 11 with backported Java 8 from our APT repository on the wdqsNNNN hosts, so for simplicity that should be the target OS.

What we're attempting to do is validate the performance effect of running with a data-center-quality, high-speed, physically attached NVMe disk for the case of needing to repopulate Blazegraph. We have some promising indicators from my Alienware i7-8700 (I had installed a consumer M.2 NVMe) and from physically attached NVMes in AWS (though there's still a layer of virtualization abstraction), but we're hoping to see real-world bare-metal numbers as we plan next FY's server refreshes for a number of WDQS nodes.

Looking at {T193911}, I suspect we may want to see whether it's possible to bundle a couple of NVMes and a couple of SATA SSDs onto one host, so that we can verify both non-RAIDed and RAIDed performance. The dump ingestion process we want to test can occupy up to 3 TB in source file(s) (these would sit on the SATA SSDs) and up to 1.3 TB for the destination file (this would sit on the NVMe SSDs).
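
To quantify the raw device difference before full ingestion runs, a quick throughput check could help. A minimal sketch with fio (assuming fio is installed and the SATA SSD and NVMe are mounted at the hypothetical paths /srv/sata and /srv/nvme):

  # sequential write, 1 MiB blocks, direct I/O to bypass the page cache
  fio --name=sata-seq-write --filename=/srv/sata/fio.test --rw=write \
      --bs=1M --size=20G --direct=1 --ioengine=libaio --iodepth=16
  fio --name=nvme-seq-write --filename=/srv/nvme/fio.test --rw=write \
      --bs=1M --size=20G --direct=1 --ioengine=libaio --iodepth=16

Running the same parameters against each mount gives a like-for-like device baseline, independent of the Blazegraph ingestion code itself.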

I don't know if it's possible to move NVMes to another wdqsNNNN host, but that may be a good idea so that we can have an apples-to-apples comparison with runs on similarly spec'd CPUs. Technically, it would be possible to run ingestion processes with SATA SSD1 -> SATA SSD2 and then SATA SSD1 -> NVMe(s) to compare the differences, as we already know that clock speed is also a factor in performance here. I'll be discussing this a bit more with @bking and @RKemper tomorrow, and hopefully we can all close the loop soon.

Thanks!

@bking, @RKemper, and I met today. @bking has an action item on this ticket (@bking LMK in case I need to chime in on anything!). Thanks!

@wiki_willy I'm going to take over this work from @dr0ptp4kt. I'll make a phab task with the data you requested shortly.

Sounds good @bking, thanks!
