The cadvisor rollout (T108027) seems to break Kubernetes nodes (the kubelet, specifically). When it was rolled out to ml-staging2001, the kubelet stopped responding and the node was marked unreachable. Because kubelet.service carries Alias=cadvisor.service, the Puppet-managed drop-in for cadvisor.service is applied to the kubelet unit, which now starts the cadvisor binary instead:
$ systemctl cat kubelet
# /lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/kubernetes/kubernetes
Documentation=man:kubelet
After=network.target
After=docker.service
Requires=docker.service
Conflicts=cadvisor.service

[Service]
EnvironmentFile=-/etc/default/%p
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/bin/kubelet $DAEMON_ARGS
Restart=on-failure

[Install]
WantedBy=multi-user.target
Alias=cadvisor.service

# /etc/systemd/system/cadvisor.service.d/puppet-override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/cadvisor --listen_ip=10.192.0.201 --port=4194 --enable_metrics=accelerator,app,cpu,disk,diskIO,memory,network,oom_event,perf_event
In contrast, on an unaffected machine:
$ systemctl cat kubelet
# /lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/kubernetes/kubernetes
Documentation=man:kubelet
After=network.target
After=docker.service
Requires=docker.service
Conflicts=cadvisor.service

[Service]
EnvironmentFile=-/etc/default/%p
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/bin/kubelet $DAEMON_ARGS
Restart=on-failure

[Install]
WantedBy=multi-user.target
Alias=cadvisor.service
Note that this unit also has Alias=cadvisor.service, but with no cadvisor drop-in present it still runs the kubelet binary.
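
To make the difference between the two hosts concrete, here is a minimal Python sketch of how the ExecStart lines in `systemctl cat` output combine: an empty `ExecStart=` clears everything set so far, and later assignments are appended. This is a simplification for illustration, not systemd's actual parser (it ignores line continuations, for instance). The function name `effective_exec_start` and the two abbreviated sample strings are hypothetical, trimmed from the outputs above.

```python
def effective_exec_start(systemctl_cat_output: str) -> list[str]:
    """Return the ExecStart commands that remain after applying resets.

    Simplified model: only the [Service] section is considered, an empty
    "ExecStart=" clears earlier entries, and non-empty ones are appended.
    """
    commands: list[str] = []
    section = None
    for raw in systemctl_cat_output.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (systemctl cat prints file paths as "# ...").
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Service" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() != "ExecStart":
            continue
        value = value.strip()
        if value == "":
            commands.clear()  # an empty assignment resets the list
        else:
            commands.append(value)
    return commands


# Abbreviated samples based on the outputs in this task (illustrative only).
UNAFFECTED = """\
# /lib/systemd/system/kubelet.service
[Service]
ExecStart=/usr/bin/kubelet $DAEMON_ARGS
Restart=on-failure

[Install]
Alias=cadvisor.service
"""

AFFECTED = UNAFFECTED + """\

# /etc/systemd/system/cadvisor.service.d/puppet-override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/cadvisor --listen_ip=10.192.0.201 --port=4194
"""
```

Calling `effective_exec_start(UNAFFECTED)` yields only the kubelet command, while `effective_exec_start(AFFECTED)` yields only the cadvisor command, which matches the behaviour seen on ml-staging2001.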