Hosts elastic2037-2054 have reached the end of their 5-year lifespan.
Creating this ticket to track their decommissioning.
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T353392 Ensure Elastic stack works on bookworm
Resolved | | bking | T353878 Service implementation for elastic2087-2109
Resolved | | bking | T358882 Decommission elastic2037-2054
Resolved | Request | bking | T313842 Decommission elastic2049.codfw.wmnet
Resolved | Request | Jhancock.wm | T361305 decommission elastic20[37-54].codfw.wmnet
Change #1013398 had a related patch set uploaded (by Bking; author: Bking):
[operations/puppet@production] elastic-codfw: Add new master-eligibles
Change #1013398 merged by Bking:
[operations/puppet@production] elastic-codfw: Add new master-eligibles
Change #1013401 had a related patch set uploaded (by Bking; author: Bking):
[operations/puppet@production] elastic: move elastic2037 to insetup
Change #1013401 merged by Bking:
[operations/puppet@production] elastic: move elastic2037 to insetup
Mentioned in SAL (#wikimedia-operations) [2024-03-22T06:22:19Z] <ryankemper> T358882 Updating cross-cluster seeds to bring into concordance with newly added masters: ryankemper@mwmaint1002:~/elastic$ python push_cross_cluster_conf.py https://search.svc.codfw.wmnet:9643/_cluster/settings --ccc chi=chi_codfw_masters.lst psi=psi_codfw_masters.lst omega=omega_codfw_masters.lst
Mentioned in SAL (#wikimedia-operations) [2024-03-22T06:33:10Z] <ryankemper> T358882 Also updated cross-cluster seeds for ports 9243 and 9443. Everything should be as expected now.
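For readers unfamiliar with cross-cluster seeds: Elasticsearch stores them under `cluster.remote.<alias>.seeds` in the cluster settings, and they are updated with a `PUT _cluster/settings` request. The sketch below only builds such a request body. It is not the contents of `push_cross_cluster_conf.py`, whose implementation is not shown in this ticket; the function name, seed host names, and port are hypothetical placeholders.

```python
import json


def build_remote_seed_settings(seeds_by_cluster):
    """Build a body for PUT _cluster/settings that sets remote cluster seeds.

    seeds_by_cluster maps a remote cluster alias (e.g. "chi") to a list of
    "host:transport_port" strings.
    """
    remote = {}
    for alias, seeds in seeds_by_cluster.items():
        remote[alias] = {"seeds": list(seeds)}
    # "persistent" settings survive a full cluster restart.
    return {"persistent": {"cluster": {"remote": remote}}}


# Hypothetical seed entries; the real master lists live in the *.lst files
# named in the SAL entry and are not reproduced here.
example = build_remote_seed_settings({
    "chi": ["master-a.example:9300", "master-b.example:9300"],
    "psi": ["master-c.example:9300"],
})
print(json.dumps(example, indent=2))
```

Sending the body (and choosing which of the three HTTP ports to target) is left out on purpose, since that depends on local tooling and credentials.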
Change #1014600 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):
[operations/puppet@production] elastic: replace some masters
Change #1014600 merged by Ryan Kemper:
[operations/puppet@production] elastic: replace some masters
Mentioned in SAL (#wikimedia-operations) [2024-03-26T20:09:20Z] <ryankemper@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: cycle some masters - ryankemper@cumin2002 - T358882
Mentioned in SAL (#wikimedia-operations) [2024-03-26T21:45:32Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: cycle some masters - ryankemper@cumin2002 - T358882
Mentioned in SAL (#wikimedia-operations) [2024-03-27T01:06:46Z] <ryankemper> T358882 Updated remote cluster seeds for new master state
Mentioned in SAL (#wikimedia-operations) [2024-03-27T01:35:23Z] <ryankemper@cumin2002> START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2037*,elastic2038*,elastic2041*,elastic2042*,elastic2045*,elastic2046*,elastic2047*,elastic2050*,elastic2051*,elastic2052*,elastic2039*,elastic2040*,elastic2043*,elastic2044*,elastic2048*,elastic2053*,elastic2054* for prepare for decom of hosts - ryankemper@cumin2002 - T358882
Mentioned in SAL (#wikimedia-operations) [2024-03-27T01:35:28Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2037*,elastic2038*,elastic2041*,elastic2042*,elastic2045*,elastic2046*,elastic2047*,elastic2050*,elastic2051*,elastic2052*,elastic2039*,elastic2040*,elastic2043*,elastic2044*,elastic2048*,elastic2053*,elastic2054* for prepare for decom of hosts - ryankemper@cumin2002 - T358882
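A quick way to sanity-check a ban list like the one above against the range in the ticket title is to expand the range and diff it against the banned globs. This is a minimal sketch, not part of the cookbook; `expand_range` is a hypothetical helper written for this illustration.

```python
import re


def expand_range(expr):
    """Expand an expression like 'elastic20[37-54]' into individual host names."""
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", expr)
    if m is None:
        return [expr]
    prefix, start, end, suffix = m.groups()
    width = len(start)  # keep zero padding if the range uses it
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


# Host globs copied from the sre.elasticsearch.ban log entry above.
banned_globs = (
    "elastic2037*,elastic2038*,elastic2041*,elastic2042*,elastic2045*,"
    "elastic2046*,elastic2047*,elastic2050*,elastic2051*,elastic2052*,"
    "elastic2039*,elastic2040*,elastic2043*,elastic2044*,elastic2048*,"
    "elastic2053*,elastic2054*"
).split(",")

banned = {g.rstrip("*") for g in banned_globs}
expected = expand_range("elastic20[37-54]")
missing = [h for h in expected if h not in banned]
print(missing)  # ['elastic2049']
```

The only host in the range that is absent from the ban list is elastic2049, which the task table lists under its own decommission ticket (T313842).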
Netbox reports that elastic2037 is no longer in PuppetDB; please either decommission it or shut it down. No host should be powered on without Puppet running for an extended period of time.
Mentioned in SAL (#wikimedia-operations) [2024-03-27T15:51:40Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on elastic2038.codfw.wmnet with reason: T358882
Mentioned in SAL (#wikimedia-operations) [2024-03-27T15:51:44Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on elastic2038.codfw.wmnet with reason: T358882
Change #1015119 had a related patch set uploaded (by Bking; author: Bking):
[operations/puppet@production] elasticsearch: remove soon-to-be-decommed codfw hosts
Change #1015119 merged by Bking:
[operations/puppet@production] elasticsearch: remove soon-to-be-decommed codfw hosts
Change #1015123 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):
[operations/puppet@production] elastic: decom elastic20[37-54]
Change #1015123 merged by Bking:
[operations/puppet@production] elastic: decom elastic20[37-54]
cookbooks.sre.hosts.decommission executed by ryankemper@cumin2002 for hosts: elastic[2052-2054].codfw.wmnet
ERROR: some step on some host failed, check the bolded items above
cookbooks.sre.hosts.decommission executed by ryankemper@cumin2002 for hosts: elastic2037.codfw.wmnet
ERROR: some step on some host failed, check the bolded items above