tools-worker-1012 unhealthy?
Closed, Resolved, Public


While looking at T115231, I found that this node had problems scheduling pods and terminating old ones. All pods on it were unhealthy, so I cordoned the node and deleted the pods to get them rescheduled on other nodes. The node looks like it is under heavy load, and docker ps just hangs without returning anything.
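
For reference, the cordon-and-delete steps described above could be sketched roughly as follows. This is only an illustration, not the exact commands that were run: the helper name drain_commands and the namespace and pod names are hypothetical, and the script only prints the kubectl invocations rather than executing them.

```python
import shlex

NODE = "tools-worker-1012"  # node name taken from this task


def drain_commands(node, pods):
    """Build kubectl invocations (as argv lists) that cordon `node`,
    list the pods scheduled on it, and delete the given pods.

    `pods` is a list of (namespace, pod_name) pairs.
    """
    cmds = [
        # Mark the node unschedulable so no new pods land on it.
        ["kubectl", "cordon", node],
        # List every pod currently assigned to the node.
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "wide",
         "--field-selector", f"spec.nodeName={node}"],
    ]
    for namespace, pod in pods:
        # Deleting a managed pod lets its controller recreate it elsewhere.
        cmds.append(["kubectl", "delete", "pod", pod, "--namespace", namespace])
    return cmds


if __name__ == "__main__":
    # Hypothetical namespace and pod name, for illustration only.
    for argv in drain_commands(NODE, [("tool-example", "example-pod-abc123")]):
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try on a machine without cluster access. Once the node is healthy again, kubectl uncordon with the node name makes it schedulable again.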

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2019-12-29T01:38:46Z] <Krenair> Cordoned tools-worker-1012 and deleted pods associated with dplbot and dewikigreetbot as well as my own testing one, host seems to be under heavy load - T241523

Mentioned in SAL (#wikimedia-cloud) [2019-12-30T05:02:14Z] <andrewbogott> moving tools-worker-1012 to cloudvirt1024 for T241523

Andrew claimed this task.
Andrew subscribed.

I moved this node to a different cloudvirt and uncordoned it.