Appears to be stuck and needs a prod to get it started again.
@Bstorm apparently no longer works for WMF, so looking for someone else in the Cloud Services Team who can help.
Yesterday evening (UK time), my attempts to restart the API service using 'webservice restart' didn't help, so I pinged @TheresNoTime on IRC. She quickly jumped on the problem and resolved it by deleting and redeploying the service. I gather she'll provide a write-up in due course.
So minus the running about trying to figure out what was going on, what resolved it was:
On refill
- kubectl get pods
- kubectl delete pods {name of pod}
On refill-api
- kubectl get pods
- kubectl delete pods {name of pod}
They will then auto-recreate.
I also deleted the deployment and re-deployed
- kubectl get deployments
- kubectl delete deployments {name of deployment}
- kubectl apply -f worker-deployment.yml
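For anyone repeating this later, the steps above could be wrapped in a couple of small shell helpers. This is only a sketch: the helper names (`run`, `restart_pod`, `redeploy`) and the `DRY_RUN` switch are made up for illustration, and the pod and deployment names still have to be looked up first with `kubectl get pods` and `kubectl get deployments`.

```shell
#!/bin/sh
# Sketch of the recovery steps above. Helper names and DRY_RUN are
# illustrative, not part of any existing tooling.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Delete one pod by name; it is then recreated automatically.
restart_pod() {
    run kubectl delete pods "$1"
}

# Delete a deployment and re-create it from its manifest file.
redeploy() {
    run kubectl delete deployments "$1"
    run kubectl apply -f "$2"
}
```

With `DRY_RUN=1` set, `redeploy {name of deployment} worker-deployment.yml` only prints the two kubectl commands, which is a cheap way to check them before running anything for real.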
I want to figure something out with T309103: Set up monitoring for refill so we don't end up relying on user reports 😄
If the worker is using a custom k8s deployment, consider configuring liveness/readiness probes so that Kubernetes restarts the container when it gets stuck.
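As a rough illustration of the probe idea, the sketch below builds a strategic-merge patch that adds a liveness probe to one container. It is the liveness probe that triggers a restart; a readiness probe only takes the pod out of service. Everything specific here is an assumption: the heartbeat file `/tmp/worker-heartbeat` (the worker would have to touch it regularly while healthy), the timings, and the `liveness_patch` helper name. None of it comes from the actual refill setup.

```shell
#!/bin/sh
# Print a JSON patch adding a liveness probe to the named container.
# ASSUMPTION: the worker touches /tmp/worker-heartbeat while healthy.
# The path, the 5-minute window, and the timings are hypothetical.
liveness_patch() {
    container="$1"
    cat <<EOF
{"spec":{"template":{"spec":{"containers":[{
  "name":"$container",
  "livenessProbe":{
    "exec":{"command":["sh","-c","test -n \"\$(find /tmp/worker-heartbeat -mmin -5)\""]},
    "initialDelaySeconds":60,
    "periodSeconds":60,
    "failureThreshold":3
  }
}]}}}}
EOF
}
```

It could then be applied with something like `kubectl patch deployment {name of deployment} --patch "$(liveness_patch {name of container})"`, or the same `livenessProbe` section could be added to worker-deployment.yml directly. The probe fails when the heartbeat file is missing or has not been modified in the last five minutes, and after three consecutive failures the container is restarted.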