
K3s startup errors: monitoring and/or fixing
Open, Needs Triage, Public

Description

During the Patch Demo incident on 2026-01-27 we had a hard time restarting k3s for two reasons:

  • /var/lib/rancher/k3s/server/cred/passwd newer than datastore and could cause a cluster outage. Remove the file(s) from disk and restart to be recreated from datastore.
    • Workaround: rm /var/lib/rancher/k3s/server/cred/passwd
  • http: TLS handshake error from 127.0.0.1:56198: remote error: tls: bad certificate

We should make it so that restarting k3s does not require manual workarounds, or at least monitor for these startup errors so we notice them early.
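
For the monitoring side, here is a minimal sketch of a log check that recognizes the two errors quoted above. The regular expressions are derived only from those two messages; the function names and labels (`classify_k3s_log_line`, `count_startup_errors`, `stale_passwd`, `tls_bad_certificate`) are hypothetical and not part of k3s or any existing tooling.

```python
import re

# Patterns built from the two error messages quoted in this task.
# The labels are made up for illustration.
KNOWN_STARTUP_ERRORS = {
    "stale_passwd": re.compile(
        r"\S+ newer than datastore and could cause a cluster outage"
    ),
    "tls_bad_certificate": re.compile(
        r"TLS handshake error from \S+: remote error: tls: bad certificate"
    ),
}


def classify_k3s_log_line(line):
    """Return the label of the first known startup error found in line, or None."""
    for label, pattern in KNOWN_STARTUP_ERRORS.items():
        if pattern.search(line):
            return label
    return None


def count_startup_errors(lines):
    """Count occurrences of each known startup error across an iterable of log lines."""
    counts = {label: 0 for label in KNOWN_STARTUP_ERRORS}
    for line in lines:
        label = classify_k3s_log_line(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Assuming k3s runs as a systemd unit named k3s, the output of `journalctl -u k3s` could be piped into a small script that calls `count_startup_errors(sys.stdin)` and alerts when any count is non-zero. This only detects the errors; it does not apply the workaround.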