Cannot stop ahechtbot webservice on gridengine, stuck in "dr" state.
Closed, Resolved, Public

Description

I have moved the ahechtbot webservice from gridengine to kubernetes, but the gridengine webservice is stuck in the dr (deleting/running) state. I tried jstop, which says job 3652274 is already in deletion. I tried qdel -f 3652274, which counts up a bunch of dots before returning me to the command line, and qstat shows the job is still there. I then used qstat -xml to get the queue name, ssh'd into tools-sgeweblight-10-28.tools.eqiad1.wikimedia.cloud, ran ps ux, killed the running job, and exited, but the job still shows up under qstat. I have since waited a couple of days and repeated the above steps, but the grid engine job is still there.
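
For anyone retracing the qstat -xml step, here is a rough sketch of a filter that pulls the job number and queue name of every job in the dr state out of that output. The function name list_dr_jobs is made up for illustration. It assumes the usual qstat -xml layout, with one element per line and the elements JB_job_number, state, and queue_name inside each job_list; it is not a general XML parser.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -xml` output on stdin and print
# "<job number> <queue name>" for every job whose state is "dr".
# Assumption: each element sits on its own line, which is how
# qstat -xml normally prints it.
list_dr_jobs() {
    awk '
        # Return the text between the opening and closing tag on a line.
        function text(line) {
            sub(/^[^>]*>/, "", line)
            sub(/<.*$/, "", line)
            return line
        }
        /<job_list/       { num = ""; st = ""; queue = "" }  # new job record
        /<JB_job_number>/ { num = text($0) }
        /<state>/         { st = text($0) }
        /<queue_name>/    { queue = text($0) }
        /<\/job_list>/    { if (st == "dr") print num, queue }
    '
}
```

Running qstat -xml | list_dr_jobs on a bastion should then print one line per stuck job. The queue name has the form queue@host, so the part after the @ is the exec host to ssh into.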

Event Timeline

taavi claimed this task.
taavi subscribed.
taavi@tools-sgegrid-master:~ $ sudo qdel -f 3652274
root forced the deletion of job 3652274