
Weird issue with the wmcs-k8s-node-upgrade.py script
Open, Needs Triage, Public

Description

At the end of a k8s worker upgrade run over a fast connection, the node occasionally reports an unexpected version when the script checks the node config. The script then stops processing at that point and never uncordons the node.

We have only ever seen it when @mdipietro ran the script, quite likely because of a much faster connection than anyone else who has run it (generally either across the Atlantic Ocean or on a mobile connection). The error could have been caused by something other than a race condition triggered by the fast connection, but that is the working theory. This needs to be resolved before the next tools k8s upgrade.
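
If the race-condition theory holds, one possible mitigation is to poll the reported version for a short while instead of failing on the first mismatch. The following is only a rough sketch and not the script's actual code: `wait_for_version`, its parameters, and the `get_version` callable (standing in for however the script reads the node's version) are all hypothetical names.

```python
import time
from typing import Callable


def wait_for_version(
    get_version: Callable[[], str],
    expected: str,
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_version() until it returns `expected` or `timeout` seconds pass.

    get_version is a hypothetical callable wrapping whatever check the
    script already performs against the node config.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first, so a node that is already up to date returns at once.
        if get_version() == expected:
            return True
        # Give up once the deadline has passed; the caller decides what to do.
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

With something like this, the script could uncordon the node when the function returns `True`, and print a clear message (leaving the node cordoned for manual inspection) when it returns `False`. The timeout and interval defaults above are arbitrary placeholders.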