Refresh aqs1013 w/ aqs1022
Open, Needs Triage · Public

Description

aqs1022 is nominally a refresh for aqs1010, but aqs1013 has a hardware issue that we've been unable to narrow down, so we're going to replace aqs1013 with aqs1022 instead.

  • Provision aqs1022
  • Decommission aqs1013

T372514: Q1:rack/setup/install aqs1022.eqiad.wmnet
T362033: Degraded RAID on aqs1013

Event Timeline

Change #1085430 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] aqs1022: Provision new Cassandra host

https://gerrit.wikimedia.org/r/1085430

Change #1085430 merged by Eevans:

[operations/puppet@production] aqs1022: Provision new Cassandra host

https://gerrit.wikimedia.org/r/1085430

Mentioned in SAL (#wikimedia-operations) [2024-10-31T16:52:28Z] <eevans@cumin1002> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1022.eqiad.wmnet with reason: Bootstrapping — T378725

Icinga downtime and Alertmanager silence (ID=29843e32-cfac-4760-a136-080d3abfc109) set by eevans@cumin1002 for 30 days, 0:00:00 on 1 host(s) and their services with reason: Bootstrapping — T378725

aqs1022.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-10-31T16:52:43Z] <eevans@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1022.eqiad.wmnet with reason: Bootstrapping — T378725

Mentioned in SAL (#wikimedia-operations) [2024-10-31T16:55:59Z] <urandom> Bootstrapping Cassandra/aqs1022-a — T378725

Mentioned in SAL (#wikimedia-operations) [2024-10-31T21:22:18Z] <urandom> Bootstrapping Cassandra/aqs1022-b — T378725

Mentioned in SAL (#wikimedia-operations) [2024-11-01T01:40:44Z] <eevans@cumin1002> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — T378725

Icinga downtime and Alertmanager silence (ID=c9d0e8d3-1673-4ae0-84fc-5db53d0d1dbd) set by eevans@cumin1002 for 30 days, 0:00:00 on 1 host(s) and their services with reason: Decommissioning — T378725

aqs1013.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-11-01T01:40:59Z] <eevans@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — T378725

Mentioned in SAL (#wikimedia-operations) [2024-11-01T01:42:55Z] <urandom> Decommissioning Cassandra/aqs1013-{a,b} — T378725