restbase cluster: decommission end-of-life hosts
Closed, ResolvedPublic

Description

Decommission four end-of-life hosts:

  • restbase1016
  • restbase1017
  • restbase1018
  • restbase2012

See: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Active_-%3E_Decommissioned


This Cassandra cluster has 6 hosts in each of the 3 rows of eqiad and 6 hosts in codfw row D, but only 5 in codfw rows B and C. Perhaps we failed to provision some additional nodes after the decommission of restbase2010 & restbase2011?

Until/unless this is remedied, the effective capacity of the cluster is limited to that of 5 hosts per row, leaving the capacity of 4 hosts (one in each eqiad row and one in codfw row D) unusable.
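The capacity argument reduces to a small calculation. This is a minimal sketch, assuming that every row holds one complete replica of the data (rows acting as racks for replica placement) and that all hosts contribute equal storage; under those assumptions the smallest row anywhere bounds how much data the whole cluster can store. The names `ROW_SIZES`, `usable_hosts_per_row` and `stranded_hosts` are hypothetical, chosen for illustration.

```python
# Hypothetical sketch: assumes each row stores one full replica and every host
# has the same storage, so no row can hold more data than the smallest row.

# Host counts per (site, row), as described in this task.
ROW_SIZES = {
    ("eqiad", "A"): 6,
    ("eqiad", "B"): 6,
    ("eqiad", "D"): 6,
    ("codfw", "B"): 5,
    ("codfw", "C"): 5,
    ("codfw", "D"): 6,
}


def usable_hosts_per_row(row_sizes):
    """Every row can only fill as many hosts' worth of data as the smallest row."""
    return min(row_sizes.values())


def stranded_hosts(row_sizes):
    """Per-row count of hosts whose capacity goes unused beyond that limit."""
    limit = usable_hosts_per_row(row_sizes)
    return {row: n - limit for row, n in row_sizes.items() if n > limit}


if __name__ == "__main__":
    surplus = stranded_hosts(ROW_SIZES)
    print("usable hosts per row:", usable_hosts_per_row(ROW_SIZES))
    print("stranded hosts:", sum(surplus.values()), surplus)
```

With these counts it reports 5 usable hosts per row and 4 stranded hosts: one in each eqiad row and one in codfw row D.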

Event Timeline

Eevans renamed this task from "restbase cluster: missing hosts in rows b & c?" to "restbase cluster: missing hosts in codfw rows b & c?". Jan 31 2023, 8:09 PM
Eevans triaged this task as Medium priority.

@hnowlan Do you have any insight on this? Is there hardware somewhere that didn't get provisioned, or hardware that was decommissioned without replacements?

I believe it's not that B and C have too few, it's that D has too many - restbase2012 is also out of warranty and should be decommissioned. Perhaps this is the source of this discrepancy? restbase101[678] were also purchased at around the same time.

So we provisioned 6 new hosts as part of a refresh, but only decommissioned 2?

It seems that way, yeah... there was some inconsistency in the process, I realise: T294377 / T291991#7657098

host          row          rack  model                                         purchase date
restbase2013  codfw row B  B5    Dell PowerEdge R440                           2018-11-14
restbase2014  codfw row B  B8    Dell PowerEdge R440                           2018-11-14
restbase2019  codfw row B  B5    Dell PowerEdge R440                           2019-02-28
restbase2021  codfw row B  B3    Dell PowerEdge R440                           2019-12-26
restbase2024  codfw row B  B6    Dell PowerEdge R440                           2021-11-18
restbase2015  codfw row C  C1    Dell PowerEdge R440                           2018-11-14
restbase2016  codfw row C  C5    Dell PowerEdge R440                           2018-11-14
restbase2020  codfw row C  C5    Dell PowerEdge R440                           2019-02-28
restbase2022  codfw row C  C1    Dell PowerEdge R440                           2019-12-26
restbase2025  codfw row C  C5    Dell PowerEdge R440                           2021-11-18
restbase2012  codfw row D  D1    Dell PowerEdge R430                           2016-11-04
restbase2017  codfw row D  D1    Dell PowerEdge R440                           2018-11-14
restbase2018  codfw row D  D5    Dell PowerEdge R440                           2018-11-14
restbase2023  codfw row D  D8    Dell PowerEdge R440                           2019-12-26
restbase2026  codfw row D  D5    Dell PowerEdge R440                           2021-11-18
restbase2027  codfw row D  D5    Dell PowerEdge R440                           2021-02-26
restbase1016  eqiad row A  A3    Dell PowerEdge R430                           2016-11-02
restbase1019  eqiad row A  A3    Dell PowerEdge R440                           2019-03-01
restbase1020  eqiad row A  A5    Dell PowerEdge R440                           2019-03-01
restbase1021  eqiad row A  A6    Dell PowerEdge R440                           2019-03-01
restbase1028  eqiad row A  A5    Dell PowerEdge R440                           2020-01-17
restbase1031  eqiad row A  A6    Dell PowerEdge R440 - Restbase Config 202107  2021-11-25
restbase1017  eqiad row B  B5    Dell PowerEdge R430                           2016-11-02
restbase1022  eqiad row B  B3    Dell PowerEdge R440                           2019-03-01
restbase1023  eqiad row B  B5    Dell PowerEdge R440                           2019-03-01
restbase1024  eqiad row B  B8    Dell PowerEdge R440                           2019-03-01
restbase1029  eqiad row B  B5    Dell PowerEdge R440                           2020-01-17
restbase1032  eqiad row B  B3    Dell PowerEdge R440 - Restbase Config 202107  2021-11-25
restbase1025  eqiad row D  D3    Dell PowerEdge R440                           2019-03-01
restbase1026  eqiad row D  D3    Dell PowerEdge R440                           2019-03-01
restbase1027  eqiad row D  D6    Dell PowerEdge R440                           2019-03-01
restbase1030  eqiad row D  D4    Dell PowerEdge R440                           2020-01-17
restbase1033  eqiad row D  D1    Dell PowerEdge R440 - Restbase Config 202107  2021-11-25
restbase1018  eqiad row D  D3    Dell PowerEdge R440                           2016-11-02

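The inventory can be checked mechanically. The sketch below transcribes the table, counts hosts per row, and shows what remains once hosts bought before a cutoff are removed. The cutoff of 2017-01-01 is an assumption made purely for illustration (the actual warranty terms are not stated in this task), and `INVENTORY`, `WARRANTY_CUTOFF`, `hosts_per_row`, `end_of_life` and `remaining_after` are hypothetical names.

```python
# Hypothetical sketch: tally hosts per row from the inventory table and show the
# per-row counts left after removing hosts bought before an assumed cutoff.
from collections import Counter
from datetime import date

# (host, site, row, purchase date), transcribed from the table.
INVENTORY = [
    ("restbase2013", "codfw", "B", date(2018, 11, 14)),
    ("restbase2014", "codfw", "B", date(2018, 11, 14)),
    ("restbase2019", "codfw", "B", date(2019, 2, 28)),
    ("restbase2021", "codfw", "B", date(2019, 12, 26)),
    ("restbase2024", "codfw", "B", date(2021, 11, 18)),
    ("restbase2015", "codfw", "C", date(2018, 11, 14)),
    ("restbase2016", "codfw", "C", date(2018, 11, 14)),
    ("restbase2020", "codfw", "C", date(2019, 2, 28)),
    ("restbase2022", "codfw", "C", date(2019, 12, 26)),
    ("restbase2025", "codfw", "C", date(2021, 11, 18)),
    ("restbase2012", "codfw", "D", date(2016, 11, 4)),
    ("restbase2017", "codfw", "D", date(2018, 11, 14)),
    ("restbase2018", "codfw", "D", date(2018, 11, 14)),
    ("restbase2023", "codfw", "D", date(2019, 12, 26)),
    ("restbase2026", "codfw", "D", date(2021, 11, 18)),
    ("restbase2027", "codfw", "D", date(2021, 2, 26)),
    ("restbase1016", "eqiad", "A", date(2016, 11, 2)),
    ("restbase1019", "eqiad", "A", date(2019, 3, 1)),
    ("restbase1020", "eqiad", "A", date(2019, 3, 1)),
    ("restbase1021", "eqiad", "A", date(2019, 3, 1)),
    ("restbase1028", "eqiad", "A", date(2020, 1, 17)),
    ("restbase1031", "eqiad", "A", date(2021, 11, 25)),
    ("restbase1017", "eqiad", "B", date(2016, 11, 2)),
    ("restbase1022", "eqiad", "B", date(2019, 3, 1)),
    ("restbase1023", "eqiad", "B", date(2019, 3, 1)),
    ("restbase1024", "eqiad", "B", date(2019, 3, 1)),
    ("restbase1029", "eqiad", "B", date(2020, 1, 17)),
    ("restbase1032", "eqiad", "B", date(2021, 11, 25)),
    ("restbase1025", "eqiad", "D", date(2019, 3, 1)),
    ("restbase1026", "eqiad", "D", date(2019, 3, 1)),
    ("restbase1027", "eqiad", "D", date(2019, 3, 1)),
    ("restbase1030", "eqiad", "D", date(2020, 1, 17)),
    ("restbase1033", "eqiad", "D", date(2021, 11, 25)),
    ("restbase1018", "eqiad", "D", date(2016, 11, 2)),
]

# Assumed cutoff, for illustration only; not the real warranty policy.
WARRANTY_CUTOFF = date(2017, 1, 1)


def hosts_per_row(inventory):
    """Count hosts in each (site, row) pair."""
    return Counter((site, row) for _, site, row, _ in inventory)


def end_of_life(inventory, cutoff):
    """Names of hosts purchased before the cutoff, sorted."""
    return sorted(host for host, _, _, bought in inventory if bought < cutoff)


def remaining_after(inventory, retired):
    """Per-row host counts once the retired hosts are removed."""
    return hosts_per_row(entry for entry in inventory if entry[0] not in retired)


if __name__ == "__main__":
    retired = end_of_life(INVENTORY, WARRANTY_CUTOFF)
    print("before:", dict(hosts_per_row(INVENTORY)))
    print("end of life:", retired)
    print("after:", dict(remaining_after(INVENTORY, set(retired))))
```

Running it lists restbase1016, restbase1017, restbase1018 and restbase2012 as the pre-cutoff hosts, and every row drops to exactly 5 hosts once they are removed. This supports the reading that codfw row D and the eqiad rows each had one host too many, rather than codfw rows B and C having too few.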
Eevans renamed this task from "restbase cluster: missing hosts in codfw rows b & c?" to "restbase cluster: decommission end-of-life hosts". Sep 29 2023, 3:29 PM
Eevans updated the task description.

Mentioned in SAL (#wikimedia-operations) [2023-10-11T13:24:57Z] <urandom> starting decommission of restbase2012-a — T328490

Change 965174 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Move restbase canary

https://gerrit.wikimedia.org/r/965174

Change 965174 merged by Muehlenhoff:

[operations/puppet@production] Move restbase canary

https://gerrit.wikimedia.org/r/965174

Mentioned in SAL (#wikimedia-operations) [2023-10-13T11:53:51Z] <urandom> starting decommission of restbase2012-c — T328490

Mentioned in SAL (#wikimedia-operations) [2023-10-14T18:30:58Z] <urandom> starting Cassandra decommission of restbase1016-a — T328490

Mentioned in SAL (#wikimedia-operations) [2023-10-15T19:10:42Z] <urandom> starting Cassandra decommission of restbase1016-b — T328490

Mentioned in SAL (#wikimedia-operations) [2023-10-19T17:33:49Z] <urandom> Decommissioning Cassandra, restbase1018-{a,b,c} — T328490

Eevans claimed this task.
Eevans updated the task description.

Change 968751 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] site.pp: cleanup decommissioned restbase hosts

https://gerrit.wikimedia.org/r/968751

Change 968751 merged by Eevans:

[operations/puppet@production] site.pp: cleanup decommissioned restbase hosts

https://gerrit.wikimedia.org/r/968751