Currently, the cookbook only limits concurrency per cluster; nothing stops multiple invocations of this (or any other k8s node maintenance) cookbook from clashing. The k8s API will eventually error out on node clashes, but the cookbook takes quite a few actions before that point, so erroring out early would be safer and better UX. Nothing currently prevents multiple invocations from adding up to too much lost capacity, either. Both problems could be prevented with a per-taint-group lock and a per-node lock (global ones, not specific to this cookbook).
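To make the proposal concrete, here is a minimal sketch of how a per-taint-group and per-node lock could fail early. Everything in it is an assumption for illustration: `InMemoryLockStore`, `node_maintenance_lock`, the key layout, and the node and group names are hypothetical, and the in-memory store merely stands in for whatever shared backend all cookbook invocations would actually use.

```python
import threading
from contextlib import contextmanager


class LockUnavailable(Exception):
    """Raised when a lock key is already at its holder limit."""


class InMemoryLockStore:
    """Hypothetical stand-in for a shared lock backend.

    A real backend would have to be visible to every cookbook invocation;
    this one only works within a single process.
    """

    def __init__(self):
        self._holders = {}  # lock key -> set of holder ids
        self._mutex = threading.Lock()

    def acquire(self, key, holder, limit):
        with self._mutex:
            holders = self._holders.setdefault(key, set())
            if len(holders) >= limit:
                raise LockUnavailable(f"{key} is held by {sorted(holders)}")
            holders.add(holder)

    def release(self, key, holder):
        with self._mutex:
            self._holders.get(key, set()).discard(holder)


@contextmanager
def node_maintenance_lock(store, owner, node, taint_group, group_limit=1):
    """Hold the taint-group lock and the node lock while working on a node.

    group_limit caps how many nodes of one taint group may be under
    maintenance at once, across all invocations.
    """
    holder = f"{owner}:{node}"
    group_key = f"k8s/taint-group/{taint_group}"  # hypothetical key layout
    node_key = f"k8s/node/{node}"
    store.acquire(group_key, holder, group_limit)
    try:
        store.acquire(node_key, holder, 1)
    except LockUnavailable:
        # Do not leak the group slot if the node is already taken.
        store.release(group_key, holder)
        raise
    try:
        yield
    finally:
        store.release(node_key, holder)
        store.release(group_key, holder)
```

With this shape, a second invocation touching the same node, or pushing a taint group past its limit, would raise `LockUnavailable` before it cordons or drains anything.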
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| In Progress | | None | T341984 Update Kubernetes clusters to 1.31 |
| Resolved | | JMeybohm | T269684 [EPIC] Docker deprecation as a container runtime enginer for kubernetes. |
| Resolved | | JMeybohm | T362408 Migration to containerd and away from docker |
| Resolved | | Raine | T377857 Cookbook to roll-reimage k8s nodes |
| Open | | Raine | T383345 Roll-reimage cookbook should lock on a per-node and per-taint group basis |