The newly refactored maintain-kubeusers takes a long time to loop over and reconcile all the accounts (about 3.5k).
I believe this is because the state configmap is queried for each account. However, that data is mostly read, so caching it may be worth exploring.
One consequence is that the daemon misses the liveness probe deadline and gets restarted often.
Approaches to explore:
- introduce some caching for state configmaps
- parallelization, i.e., check multiple accounts in different async tasks
- some combination of both
- minimize amount of filesystem checks (NFS-induced latency)
- move the liveness probe check response inside the reconciliation loop
- drop sleep(1) in certificate generation logic (can't be done: cert generation reliably fails without this delay)
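
As a rough illustration of how the first three approaches and the liveness idea could fit together, here is a minimal sketch using only the Python standard library. It is not the maintain-kubeusers code: `StateCache`, `reconcile_all`, and the `fetch`, `reconcile_one`, and `heartbeat` callables are hypothetical names. It assumes that the state for one account can be fetched by a single function call, and that liveness can be signalled by a callable (for example one that touches a file the probe checks). It uses a thread pool rather than async tasks, on the assumption that the per-account work is blocking I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


class StateCache:
    """Read-through cache with a TTL for per-account state (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float = 300.0):
        self._fetch = fetch  # assumed: performs the real configmap read
        self._ttl = ttl      # seconds before a cached entry is considered stale
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, account: str) -> dict:
        now = time.monotonic()
        hit = self._entries.get(account)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: no API call needed
        state = self._fetch(account)
        self._entries[account] = (now, state)
        return state

    def invalidate(self, account: str) -> None:
        # Call after writing new state so the next read fetches it again.
        self._entries.pop(account, None)


def reconcile_all(
    accounts: Iterable[str],
    cache: StateCache,
    reconcile_one: Callable[[str, dict], bool],
    heartbeat: Callable[[], None],
    workers: int = 8,
) -> int:
    """Reconcile accounts concurrently; return how many were changed.

    Each account is handled by exactly one task, so two workers never
    touch the same cache key during a pass.
    """

    def task(account: str) -> bool:
        state = cache.get(account)
        changed = reconcile_one(account, state)
        if changed:
            cache.invalidate(account)
        return changed

    changed_count = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for changed in pool.map(task, accounts):
            # Signal liveness from inside the loop, so a long pass does not
            # look like a hung daemon to the probe.
            heartbeat()
            changed_count += bool(changed)
    return changed_count
```

In this sketch, a second pass over unchanged accounts performs no fetches until the TTL expires, and the heartbeat fires once per processed account. Whether a TTL is safe depends on whether anything other than the daemon writes the state configmaps, which is not established here.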