Drain, reimage and re-add to cluster:
- ganeti1023 A
- ganeti1024 C
- ganeti1025 A
- ganeti1026 A
- ganeti1027 C
- ganeti1028 C
- ganeti1029 A
- ganeti1030 A
- ganeti1031 A
- ganeti1032 A
- ganeti1033 D
- ganeti1034 D
- ganeti1035 A
- ganeti1036 B
- ganeti1037 C
- ganeti1038 D
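
The letter after each hostname appears to be the row, which matches the per-group rebalances (eqiad/A to eqiad/D) logged at the end of this task. As a rough sketch, the hosts can be grouped by row before planning the drain order. The `HOSTS` mapping is transcribed from the list above; the helper name `hosts_by_row` is purely illustrative and not part of any Ganeti or cookbook tooling.

```python
from collections import defaultdict

# Transcribed from the list above. The letter is assumed to be the
# row / Ganeti node group the host belongs to.
HOSTS = {
    "ganeti1023": "A",
    "ganeti1024": "C",
    "ganeti1025": "A",
    "ganeti1026": "A",
    "ganeti1027": "C",
    "ganeti1028": "C",
    "ganeti1029": "A",
    "ganeti1030": "A",
    "ganeti1031": "A",
    "ganeti1032": "A",
    "ganeti1033": "D",
    "ganeti1034": "D",
    "ganeti1035": "A",
    "ganeti1036": "B",
    "ganeti1037": "C",
    "ganeti1038": "D",
}


def hosts_by_row(hosts):
    """Group hostnames by row letter, sorted by row and by hostname."""
    grouped = defaultdict(list)
    for host, row in hosts.items():
        grouped[row].append(host)
    return {row: sorted(names) for row, names in sorted(grouped.items())}


if __name__ == "__main__":
    for row, names in hosts_by_row(HOSTS).items():
        print(f"{row}: {', '.join(names)}")
```

Grouping this way makes it easier to see how many nodes each row loses while a host is drained; row B, for instance, has only one host in this list.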
| Status | Assigned | Task |
|---|---|---|
| Resolved | MoritzMuehlenhoff | T348730 repeated Ganeti VMs deadlocks due to DRBD bug on bullseye |
| Resolved | MoritzMuehlenhoff | T382507 Update remaining Ganeti servers in eqiad to Bookworm |
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1025.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1025.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=9ff89e50-cdd1-449a-a676-876c36729c2f) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1036.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1036.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1036.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=8efe0251-40ee-433b-a080-3bef582e4f79) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1026.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1026.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1026.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=dce06e0b-27de-4e76-8cf6-d4947764ef79) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1024.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm executed with errors:
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=4ced1ba3-f166-422d-a9cb-6875dd47d2ed) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1030.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1027.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1027.eqiad.wmnet with OS bookworm executed with errors:
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1030.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1030.eqiad.wmnet with OS bookworm completed:
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1027.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1027.eqiad.wmnet with OS bookworm completed:
Mentioned in SAL (#wikimedia-operations) [2025-03-03T13:24:22Z] <moritzm> failover Ganeti master in eqiad to ganeti1048 T382507
Icinga downtime and Alertmanager silence (ID=49bd4e46-521c-46ca-9334-5c777206e882) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1031.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1031.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1031.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=836a9ab9-c457-4a78-ab8b-24d0332b99af) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1032.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1032.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1032.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=25ef85c8-8d74-4903-a4fb-449180b148f4) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1035.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1035.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1035.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=9fc8bc6c-fcab-42ee-95e1-ca8c3f853132) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1028.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1028.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1028.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=43cdf866-0dde-4aee-ad05-0604c388b7b3) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1037.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1037.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1037.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=a0399a93-44e5-45af-80d2-7c6886b8bcc5) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1034.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1034.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1034.eqiad.wmnet with OS bookworm completed:
Icinga downtime and Alertmanager silence (ID=fbeb54b5-2eb9-44e3-bebb-3ffb0c131169) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage
ganeti1029.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1029.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1029.eqiad.wmnet with OS bookworm completed:
Mentioned in SAL (#wikimedia-operations) [2025-03-18T12:35:33Z] <moritzm> rebalance ganeti eqiad/A following reimages T382507
Mentioned in SAL (#wikimedia-operations) [2025-03-19T07:51:21Z] <moritzm> rebalance ganeti eqiad/B following reimages T382507
Mentioned in SAL (#wikimedia-operations) [2025-03-20T07:24:37Z] <moritzm> rebalance ganeti eqiad/C following reimages T382507
Mentioned in SAL (#wikimedia-operations) [2025-03-24T07:28:36Z] <moritzm> rebalance ganeti eqiad/D following reimages T382507
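
The cookbook result entries in this log follow a fixed wording, so the outcome per host can be tallied mechanically. The following is a minimal sketch, assuming the entries are available as plain strings in exactly the format shown in this task. The regular expression and the `summarize` helper are illustrative and not part of the reimage cookbook itself; the sample lines are the ganeti1024 entries copied from above.

```python
import re

# Matches only the result lines ("... started by ... completed:" or
# "... executed with errors:"), not the "was started by" announcements.
RESULT_RE = re.compile(
    r"Cookbook cookbooks\.sre\.hosts\.reimage started by \S+ "
    r"for host (?P<host>\S+) with OS (?P<os>\S+) "
    r"(?P<outcome>completed|executed with errors):"
)


def summarize(lines):
    """Return {host: [outcome, ...]} in the order results were logged."""
    results = {}
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            results.setdefault(match["host"], []).append(match["outcome"])
    return results


# Sample input: the four ganeti1024 entries from this task.
SAMPLE = [
    "Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm",
    "Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm executed with errors:",
    "Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm",
    "Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1024.eqiad.wmnet with OS bookworm completed:",
]

if __name__ == "__main__":
    for host, outcomes in summarize(SAMPLE).items():
        print(host, "->", ", ".join(outcomes))
```

A host whose last recorded outcome is `completed` can be treated as successfully reimaged even if an earlier attempt failed, which is the pattern seen for ganeti1024 and ganeti1027 here.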