While looking at T115231 I found that this node (tools-worker-1012) had problems scheduling pods and terminating old ones. All pods on it were unhealthy, so I cordoned the node and deleted the pods so that they would be rescheduled on other nodes. The node appears to be under heavy load: docker ps just hangs without returning anything.
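For anyone wanting to reproduce the check, here is a minimal sketch of how the "docker ps never returns" symptom could be detected with a timeout instead of waiting on a hung terminal. It was not part of the original investigation. The helper names `probe` and `cordon_commands`, the 10-second timeout, and the choice of the `--field-selector` query for listing the node's pods are all assumptions made for illustration. Only `kubectl cordon` corresponds to an action described above.

```python
import subprocess
import sys


def probe(cmd, timeout_s=10.0):
    """Run cmd and return its exit code, or None if it did not finish in time.

    Hypothetical helper: a None result is what a hung `docker ps` would yield.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return done.returncode


def cordon_commands(node):
    """Build (but do not run) kubectl commands for taking a node out of rotation.

    Assumption: pods are listed by node via a field selector before deletion.
    """
    return [
        ["kubectl", "cordon", node],
        ["kubectl", "get", "pods", "--all-namespaces",
         "--field-selector", f"spec.nodeName={node}", "-o", "wide"],
    ]


if __name__ == "__main__":
    # Requires docker to be installed on the host being checked.
    status = probe(["docker", "ps"])
    if status is None:
        print("docker ps did not return within the timeout")
        for cmd in cordon_commands("tools-worker-1012"):
            print("suggested:", " ".join(cmd))
    sys.exit(0 if status == 0 else 1)
```

The commands are only printed, not executed, so whoever runs this still decides whether cordoning is appropriate.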
Description
Related Objects
- Mentioned In
- T115231: dplbot webservice on Toolforge repeatedly have its dynamicproxy entry removed (because qsub schedules tasks to webgrid queues, causing portreleaser to run as job epilogue)
- Mentioned Here
- T115231: dplbot webservice on Toolforge repeatedly have its dynamicproxy entry removed (because qsub schedules tasks to webgrid queues, causing portreleaser to run as job epilogue)
Event Timeline
Mentioned in SAL (#wikimedia-cloud) [2019-12-29T01:38:46Z] <Krenair> Cordoned tools-worker-1012 and deleted pods associated with dplbot and dewikigreetbot as well as my own testing one, host seems to be under heavy load - T241523
Mentioned in SAL (#wikimedia-cloud) [2019-12-30T05:02:14Z] <andrewbogott> moving tools-worker-1012 to cloudvirt1024 for T241523