Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
| | | Unknown Object (Task)
Resolved | | wiki_willy | T245161 Track down and replace very old HW
Resolved | | Dzahn | T247780 decom old appservers in eqiad
Resolved | | Cmjohnson | T253856 decom 36 old appservers in eqiad (onsite, dcops)
Event Timeline
Change 580101 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/conftool: remove mw1221 through mw1226
Change 580105 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: remove mw1221 through mw1226
Change 580107 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove production IPs of mw1221 through mw1226
Mentioned in SAL (#wikimedia-operations) [2020-03-16T20:04:38Z] <mutante> depool (yes->no) mw1221 - mw1226 (T247780)
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw1221.eqiad.wmnet
- mw1221.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1222-1226].eqiad.wmnet
- mw1222.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1223.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1224.eqiad.wmnet (FAIL)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Failed to wipe bootloaders, manual intervention required to make it unbootable: Cumin execution failed (exit_code=2)
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1225.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1226.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
ERROR: some step on some host failed, check the bolded items above
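The ERROR line points at bolded items, but the bolding is not visible in a plain-text copy of this report. As a minimal sketch, assuming the report keeps the `- <fqdn> (PASS|FAIL)` line shape shown above, the failed hosts could be picked out like this. `failed_hosts` is a hypothetical helper written for illustration and is not part of the decommission cookbook.

```python
import re

# Matches per-host result lines such as "- mw1224.eqiad.wmnet (FAIL)".
# Step lines ("- Powered off") do not end in "(PASS)" or "(FAIL)" and are skipped.
HOST_RESULT = re.compile(r"^\s*-\s+(?P<host>[\w.-]+)\s+\((?P<status>PASS|FAIL)\)\s*$")


def failed_hosts(report: str) -> list[str]:
    """Return the hosts marked FAIL in a pasted decommission report."""
    failed = []
    for line in report.splitlines():
        match = HOST_RESULT.match(line)
        if match and match.group("status") == "FAIL":
            failed.append(match.group("host"))
    return failed


# Excerpt of the report above, used as sample input.
report = """\
- mw1223.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- mw1224.eqiad.wmnet (FAIL)
- Failed to wipe bootloaders, manual intervention required to make it unbootable: Cumin execution failed (exit_code=2)
- mw1225.eqiad.wmnet (PASS)
"""
print(failed_hosts(report))
```

For this run the only failed host is mw1224, where the bootloader wipe needs manual follow-up.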
Change 580101 merged by Dzahn:
[operations/puppet@production] site/conftool: remove mw1221 through mw1226
Change 580105 merged by Dzahn:
[operations/puppet@production] DHCP: remove mw1221 through mw1226
Change 580384 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/conftool: remove mw1238 through mw1243
Change 580384 merged by Dzahn:
[operations/puppet@production] site/conftool: remove mw1238 through mw1243
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1238-1239].eqiad.wmnet
- mw1238.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1239.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
Mentioned in SAL (#wikimedia-operations) [2020-03-17T18:39:32Z] <mutante> removing mw1238 through mw1243 - decom with cookbook (T247780 T245099)
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1240-1243].eqiad.wmnet
- mw1240.eqiad.wmnet (FAIL)
- Host steps raised exception: Empty Management Password
- mw1241.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1242.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1243.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
ERROR: some step on some host failed, check the bolded items above
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw1240.eqiad.wmnet
- mw1240.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
Change 580417 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: remove mw1238 through mw1243
Change 580107 merged by Dzahn:
[operations/dns@master] remove production IPs of mw1221 through mw1226
Change 580417 merged by Dzahn:
[operations/puppet@production] DHCP: remove mw1238 through mw1243
Change 580418 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove production IPs of mw1238 through mw1243
Change 580418 merged by Dzahn:
[operations/dns@master] remove production IPs of mw1238 through mw1243
Change 582160 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/conftool: decom mw1244-mw1249 and mw1227-mw1231
Icinga downtime for 2:00:00 set by dzahn@cumin1001 on 3 host(s) and their services with reason: decom
mw[1227-1229].eqiad.wmnet
Icinga downtime for 2:00:00 set by dzahn@cumin1001 on 2 host(s) and their services with reason: decom
mw[1230-1231].eqiad.wmnet
Icinga downtime for 2:00:00 set by dzahn@cumin1001 on 6 host(s) and their services with reason: decom
mw[1244-1249].eqiad.wmnet
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1227-1229].eqiad.wmnet
- mw1227.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1228.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1229.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1230-1231].eqiad.wmnet
- mw1230.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1231.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1244-1247].eqiad.wmnet
- mw1244.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1245.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1246.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1247.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1248-1249].eqiad.wmnet
- mw1248.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1249.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
Change 582160 merged by Dzahn:
[operations/puppet@production] site/conftool: decom mw1244-mw1249 and mw1227-mw1231
Change 583114 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: decom mw125[0-3] and mw123[2-5]
Icinga downtime for 2:00:00 set by dzahn@cumin1001 on 4 host(s) and their services with reason: decom
mw[1232-1235].eqiad.wmnet
Icinga downtime for 2:00:00 set by dzahn@cumin1001 on 4 host(s) and their services with reason: decom
mw[1250-1253].eqiad.wmnet
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1232-1235].eqiad.wmnet
- mw1232.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1233.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1234.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1235.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1250-1253].eqiad.wmnet
- mw1250.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1251.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1252.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1253.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
Change 583114 merged by Dzahn:
[operations/puppet@production] site: decom mw125[0-3] and mw123[2-5]
Change 583313 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: remove decom'ed appservers from rack D5
Change 583313 merged by Dzahn:
[operations/puppet@production] DHCP: remove decom'ed appservers from rack D5
Change 583377 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove IPs of recently decom'ed appservers in eqiad D5
Change 583575 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] decom mw1254 through mw1258, remaining rack D5 appservers
@Jclark-ctr @Cmjohnson @wiki_willy We (serviceops) are aware that there currently won't be any onsite work except for emergencies. We also want to clarify that, in the case of these old appservers, we _do not actually want them to be deracked yet_. So please do nothing here for now and all is good. Thanks!
Setting to stalled. We are waiting at least until Monday before removing the remaining 5 servers in rack D5.
Thanks for the heads up, @Dzahn. @Jclark-ctr has been working on some of the other decom tasks this past week, but as long as this one doesn't show up on the eqiad workboard (project tagged with ops-eqiad), we should be fine. Also, the team is currently still available onsite approximately 4-8x per month, but definitely less frequently now due to various restrictions Equinix has put in place. Thanks, Willy
Icinga downtime for 2:00:00 set by dzahn@cumin1001 on 5 host(s) and their services with reason: decom
mw[1254-1258].eqiad.wmnet
Icinga downtime for 1 day, 0:00:00 set by dzahn@cumin1001 on 5 host(s) and their services with reason: decom
mw[1254-1258].eqiad.wmnet
Mentioned in SAL (#wikimedia-operations) [2020-03-31T15:35:34Z] <mutante> decom mw1254 through mw1258 (last remaining old servers in rack D5, depooled a while ago and average response time is again under 200ms) T247780
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw[1254-1258].eqiad.wmnet
- mw1254.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1255.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1256.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1257.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- mw1258.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
Change 583575 merged by Dzahn:
[operations/puppet@production] decom mw1254 through mw1258, remaining rack D5 appservers
All mw servers in rack D5 are now decom'ed. A few non-mw servers in that rack were unaffected, but besides those D5 is mostly deactivated now.
Change 585185 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: remove mw1254-mw1258
36 servers have been decom'ed: 30 in D5 and 6 in D4.
But the original procurement ticket https://rt.wikimedia.org/Ticket/Display.html?id=8786 and installation ticket https://rt.wikimedia.org/Ticket/Display.html?id=8862 claim there were 38 servers.
I wonder where the 2 missing ones are. Are they maybe thumbor1003 and thumbor1004 in rack D5, renamed from mw servers?
The installation ticket above said "32 of the 38 servers have been racked in D5. The remaining 6 will go somewhere else. Most likely D2 while unconventional row D has 3 10G racks which don't allow for 2 apache racks." but then went to 'resolved' without further comment.
Yes, they are thumbor1003 and thumbor1004; both are from the same procurement RT ticket.
36 of the 38 old servers from RT8786 have been decom'ed, and the 2 thumbor servers are tracked separately in T216815 or T233196.
There are a total of 187 mw1* servers. Of those 151 are in state "active" and 36 are in state "decommissioning".
All the ones from RT8786 are decom'ed. This completes the ticket.
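The counts in this comment can be cross-checked against the host selections passed to the decommission cookbook earlier in this task. The following is a minimal sketch: `expand` is a hypothetical helper that only handles the simple `mw1221` and `mw[1222-1226]` forms seen in this log, not the full Cumin host selection syntax.

```python
import re

# Host selections from the cookbook runs in this task (domain suffix omitted).
# The re-runs for mw1240 and mw1253 are left out; a set would deduplicate them anyway.
RUNS = [
    "mw1221", "mw[1222-1226]",
    "mw[1238-1239]", "mw[1240-1243]",
    "mw[1227-1229]", "mw[1230-1231]",
    "mw[1244-1247]", "mw[1248-1249]",
    "mw[1232-1235]", "mw[1250-1253]",
    "mw[1254-1258]",
]


def expand(selection: str) -> set[str]:
    """Expand 'mw[1222-1226]' into individual host names; pass single hosts through."""
    m = re.fullmatch(r"([a-z]+)\[(\d+)-(\d+)\]", selection)
    if m is None:
        return {selection}
    prefix, first, last = m.group(1), int(m.group(2)), int(m.group(3))
    return {f"{prefix}{n}" for n in range(first, last + 1)}


decommissioned = set().union(*(expand(s) for s in RUNS))

procured = 38   # servers in RT8786, per the comments above
thumbor = 2     # thumbor1003 and thumbor1004, tracked separately
print(len(decommissioned), procured - thumbor)  # both 36
print(151 + 36)  # active + decommissioning mw1* servers: 187
```

The expanded set has 36 hosts, which matches 38 procured servers minus the 2 thumbor hosts, and 151 active plus 36 decommissioning gives the stated total of 187 mw1* servers.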
Change 585185 merged by Dzahn:
[operations/puppet@production] DHCP: remove mw1254-mw1258
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw1253.eqiad.wmnet
- mw1253.eqiad.wmnet (FAIL)
- Host steps raised exception: Empty Management Password
ERROR: some step on some host failed, check the bolded items above
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mw1253.eqiad.wmnet
- mw1253.eqiad.wmnet (FAIL)
- Failed downtime host on Icinga (likely already removed)
- Found physical host
- Skipped downtime management interface on Icinga (likely already removed)
- Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
- Powered off
- Set Netbox status to Decommissioning
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
ERROR: some step on some host failed, check the bolded items above
Change 583377 merged by Dzahn:
[operations/dns@master] remove IPs of recently decom'ed appservers in eqiad D5
Change 595876 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove mw1254 - mw1258, they have been decom'ed
Change 595876 merged by Dzahn:
[operations/dns@master] remove mw1254 - mw1258, they have been decom'ed