Refill tool stuck "waiting for an available worker"
Closed, Resolved · Public

Authored By

	Keith_D
	May 22 2022, 8:03 PM

Description

The tool appears to be stuck and needs a prod to get it started again.

Related Objects

Mentioned In: T310754: Recurrent API worker failures
T310753: Set up liveliness/readiness probes
T309103: Set up monitoring for refill
Mentioned Here: T309103: Set up monitoring for refill

Event Timeline

Keith_D created this task. May 22 2022, 8:03 PM

Restricted Application added a subscriber: Aklapper. May 22 2022, 8:03 PM

Still down after 24 hours.

Curb_Safe_Charmer assigned this task to Bstorm. May 23 2022, 4:49 PM

@Bstorm apparently no longer works for WMF, so looking for someone else in the Cloud Services Team who can help.

I have run `webservice restart`, but the tool is still showing "waiting for an available worker".

Appears to be working again.

Yesterday evening (UK time), my attempts to restart the API service using 'webservice restart' didn't help, so I pinged @TheresNoTime on IRC and she quickly jumped on the problem, resolving it by deleting and redeploying it. I gather she'll provide a write up in due course.

Curb_Safe_Charmer closed this task as Resolved. May 24 2022, 11:50 AM

Curb_Safe_Charmer triaged this task as Medium priority.

Curb_Safe_Charmer removed subscribers: Bstorm, cloud-services-team.

TheresNoTime mentioned this in T309103: Set up monitoring for refill. May 24 2022, 2:27 PM

In T308989#7952725, @Curb_Safe_Charmer wrote:

Yesterday evening (UK time), my attempts to restart the API service using 'webservice restart' didn't help, so I pinged @TheresNoTime on IRC and she quickly jumped on the problem, resolving it by deleting and redeploying it. I gather she'll provide a write up in due course.

So minus the running about trying to figure out what was going on, what resolved it was:

On `refill`

kubectl get pods
kubectl delete pods {name of pod}

On `refill-api`

kubectl get pods
kubectl delete pods {name of pod}

They will then auto-recreate.

I also deleted the deployment and re-deployed it:

kubectl get deployments
kubectl delete deployments {name of deployment}
kubectl apply -f worker-deployment.yml
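
For anyone repeating this later, the steps above could be collected into a small helper script. This is only a sketch and not an official Toolforge procedure: the `KUBECTL` override and the `delete_pod` and `redeploy` function names are hypothetical, the pod and deployment names are whatever `kubectl get` reports, and it assumes the script is sourced from the relevant tool account with `worker-deployment.yml` in the current directory.

```shell
#!/bin/sh
# Sketch of the manual recovery steps described above (hypothetical helper names).
# Set KUBECTL=echo to preview the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# delete_pod NAME
# Delete a single pod; its deployment is expected to recreate it automatically.
delete_pod() {
    $KUBECTL delete pods "$1"
}

# redeploy NAME MANIFEST
# Delete a deployment, then re-create it from its manifest file.
redeploy() {
    $KUBECTL delete deployments "$1" && $KUBECTL apply -f "$2"
}
```

Usage would mirror the manual steps: run `kubectl get pods`, then `delete_pod {name of pod}` on both `refill` and `refill-api`, and `redeploy {name of deployment} worker-deployment.yml` if deleting the pods alone is not enough.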

I want to figure something out with T309103: Set up monitoring for refill so we don't end up relying on user reports 😄

If the worker is using a custom k8s deployment, consider configuring liveness/readiness probes so that Kubernetes restarts the container when it gets stuck.
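
To make the probe suggestion concrete: a liveness probe with an `exec` action runs a command inside the container, and Kubernetes restarts the container when that command keeps exiting non-zero (a readiness probe only takes the pod out of service and does not restart it). A minimal sketch of such a check follows. It assumes the worker touches a heartbeat file at regular intervals, which refill may not do today; the function name, the file path and the age limit are all hypothetical.

```shell
#!/bin/sh
# Hypothetical health check that a liveness probe's exec action could call.
# Assumption: the worker touches a heartbeat file while it is healthy.

# heartbeat_is_fresh FILE MAX_AGE_SECONDS
# Succeeds (exit 0) only if FILE exists and was modified within MAX_AGE_SECONDS.
heartbeat_is_fresh() {
    file="$1"
    max_age="$2"
    [ -f "$file" ] || return 1
    now=$(date +%s)
    modified=$(stat -c %Y "$file") || return 1   # GNU/BusyBox stat syntax
    [ $((now - modified)) -le "$max_age" ]
}
```

The probe command would then be something like `heartbeat_is_fresh /tmp/worker-heartbeat 300`, with the path and limit picked to match how often the worker actually writes its heartbeat.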

TheresNoTime mentioned this in T310753: Set up liveliness/readiness probes. Jun 15 2022, 10:48 PM

TheresNoTime mentioned this in T310754: Recurrent API worker failures.
