
Update maintain-kubeusers to occasionally (once per day?) verify the integrity of existing credentials
Closed, Declined (Public)

Description

We have a handful of bug reports like T176027#4988213 where an existing tool account has been only partially set up for use with Kubernetes. The root causes for the provisioning failures are currently unknown, but the "easy" fix is to:

  • Remove the tool's $HOME/.kube directory
  • Stop maintain-kubeusers on tools-k8s-master-01
  • Edit /etc/kubernetes/tokenauth to remove any token already issued for the tool
  • Start maintain-kubeusers on tools-k8s-master-01
  • Monitor the logs to confirm that the tool's namespace and credentials are recreated
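The manual steps above could be sketched as a small script. This is only an illustration of the procedure, not the project's actual tooling: the /data/project home root, the systemd unit name, and the token file format are all assumptions. The function returns the commands as a dry-run list rather than executing anything.

```python
#!/usr/bin/env python3
"""Sketch of the manual recovery procedure for a partially provisioned tool.

Assumptions (not confirmed by the task): tool homes live under
/data/project/<tool>, maintain-kubeusers runs as a systemd unit on
tools-k8s-master-01, and tokenauth lines contain the tool name.
"""


def recovery_commands(tool: str) -> list[str]:
    """Return the recovery steps for `tool` as shell commands (dry run)."""
    home = f"/data/project/{tool}"
    return [
        # 1. Drop the tool's generated kubeconfig
        f"rm -rf {home}/.kube",
        # 2. Stop the provisioning service (on tools-k8s-master-01)
        "systemctl stop maintain-kubeusers",
        # 3. Remove any token line already issued for the tool
        #    (line format in tokenauth is an assumption)
        f"sed -i '/{tool}/d' /etc/kubernetes/tokenauth",
        # 4. Restart the service so it re-provisions the tool
        "systemctl start maintain-kubeusers",
        # 5. Watch the logs for the namespace/credentials being recreated
        "journalctl -fu maintain-kubeusers",
    ]


if __name__ == "__main__":
    import sys

    for cmd in recovery_commands(sys.argv[1]):
        print(cmd)
```

Printing instead of executing keeps the sketch safe to run and makes the intended ordering of the steps explicit.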

It would be nice to have an automatic (or, honestly, even manual) verification command that would reconcile the issued tokens and created namespaces against the generated config files for each tool. This would involve a lot of filesystem reads across NFS, so we don't want to run it very often, but once a week should be survivable. The implementation could also apply some waits between file reads to be even more mindful of NFS server load.
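The reconciliation described above could look something like the sketch below. It is a hypothetical illustration, not maintain-kubeusers code: it assumes the token file is CSV-like with the tool name in the second field and that each tool's kubeconfig lives at /data/project/&lt;tool&gt;/.kube/config, and it throttles the per-tool stat() calls to keep NFS load down, as the task suggests.

```python
#!/usr/bin/env python3
"""Sketch of the proposed token/kubeconfig reconciliation check.

Assumptions (not from the task): /etc/kubernetes/tokenauth is CSV with the
tool name in field 2, and tool homes live under /data/project.
"""
import csv
import time
from pathlib import Path


def find_inconsistencies(tokenauth_path: str,
                         home_root: str = "/data/project",
                         pause: float = 0.5) -> list[str]:
    """Return tools that have an issued token but no kubeconfig on disk."""
    missing = []
    with open(tokenauth_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank/malformed lines
            tool = row[1]
            # One throttled stat() per tool to be gentle on the NFS server
            if not (Path(home_root) / tool / ".kube" / "config").is_file():
                missing.append(tool)
            time.sleep(pause)
    return missing


if __name__ == "__main__":
    for tool in find_inconsistencies("/etc/kubernetes/tokenauth"):
        print(f"token issued but no kubeconfig: {tool}")
```

A real version would also check the reverse direction (kubeconfigs without tokens) and the Kubernetes namespaces themselves, but this shows the basic shape of a weekly, NFS-friendly sweep.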

Event Timeline

@Bstorm I think this is solved in the 2020 Kubernetes cluster, but I thought I should check with you before declining.

The root cause was likely failures within the code itself. The 2020 cluster uses precious little of that old code at this point (though it was initially adapted from it). In the new setup, the fix, if this did happen, is actually to remove the tool's namespace in Kubernetes. That said, the old code touched every single config and still didn't do this.

Central logging for admin stuff and auditing would help us quickly find issues with the new system, but this doesn't seem like the task for it. I'll definitely call the fix "shutting down the old system" :)