
Roll-reimage cookbook should lock on a per-node and per-taint-group basis
Open, Needs Triage, Public

Description

Currently, the cookbook only limits concurrency per cluster, and nothing stops multiple invocations of it (or of other k8s node maintenance cookbooks) from clashing. The k8s API will eventually error out on node clashes, but the cookbook takes quite a few actions before reaching that point, so erroring out early would be both safer and a better UX. Nothing currently prevents multiple concurrent invocations from together taking out too much capacity either. Both problems could be prevented by a per-taint-group and per-node lock (a global one, not specific to this cookbook).
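A minimal sketch of the proposed locking semantics, assuming a taint group is identified by a plain string and that each taint group has a fixed limit on how many of its nodes may be under maintenance at once. Everything here is hypothetical: `HypotheticalGlobalLock`, `max_nodes_per_taint_group` and the owner strings are illustrative names, not an existing cookbook or Spicerack API. The lock state lives in process memory only to show the keying and fail-early behaviour. A real global lock shared by all node maintenance cookbooks would need a shared backend, which is not shown.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator, List


class HypotheticalGlobalLock:
    """In-memory stand-in for a global per-node and per-taint-group lock.

    Hypothetical: a real implementation would keep this state in a shared
    store so that every k8s node maintenance cookbook sees the same locks.
    """

    def __init__(self, max_nodes_per_taint_group: int) -> None:
        self._mutex = threading.Lock()
        self._max = max_nodes_per_taint_group
        # node name -> owner (e.g. the cookbook invocation) holding it
        self._nodes: Dict[str, str] = {}
        # taint group -> nodes of that group currently under maintenance
        self._groups: Dict[str, List[str]] = {}

    def try_acquire(self, owner: str, node: str, taint_group: str) -> None:
        """Take the node and a slot in its taint group, or fail immediately."""
        with self._mutex:
            if node in self._nodes:
                raise RuntimeError(
                    f"node {node} is already locked by {self._nodes[node]}"
                )
            held = self._groups.setdefault(taint_group, [])
            if len(held) >= self._max:
                raise RuntimeError(
                    f"taint group {taint_group} already has {len(held)} "
                    f"node(s) under maintenance"
                )
            self._nodes[node] = owner
            held.append(node)

    def release(self, node: str, taint_group: str) -> None:
        """Drop the node lock and free its slot in the taint group."""
        with self._mutex:
            self._nodes.pop(node, None)
            held = self._groups.get(taint_group, [])
            if node in held:
                held.remove(node)

    @contextmanager
    def hold(self, owner: str, node: str, taint_group: str) -> Iterator[None]:
        """Acquire before touching the node; always release afterwards."""
        self.try_acquire(owner, node, taint_group)
        try:
            yield
        finally:
            self.release(node, taint_group)
```

The point of acquiring the lock before the cookbook does anything else is that a second invocation targeting the same node, or one that would push a taint group over its limit, raises right away instead of failing partway through. Because the limit is per taint group rather than per cookbook, invocations of different cookbooks also count against the same capacity budget.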