The report at https://tools.wmflabs.org/grid-jobs/tool/dibot currently shows many duplicate jobs running for dibot:
| Job | Total seen | Active | Last seen (exit) |
| --- | --- | --- | --- |
| filemoves_replacer | 84 | 31 | Currently running |
| inc_check | 129 | 33 | Currently running |
| inc_image | 4 | 2 | Currently running |
| inc_main | 7 | 5 | Currently running |
| inc_mritog | 13 | 5 | Currently running |
| inc_redirect_deleter | 7 | 2 | Currently running |
| inc_remindbot | 3 | 0 | 2018-06-22 03:26 |
| lighttpd-dibot | 1 | 1 | Currently running |
| nullbot | 13 | 4 | Currently running |
| pats-gadget | 84 | 30 | Currently running |
| removeout | 1 | 0 | 2018-06-19 23:58 |
| statbot | 3 | 1 | Currently running |
The crontab for this tool passes the `-once` flag for all of these jobs except pats-gadget and filemoves_replacer. In theory this flag should have prevented multiple jobs with the same name from starting. In practice it obviously did not.
There are two issues to address here:
- Stopping extra jobs that are running to free up grid engine capacity for all tools
- Understanding what could have made jsub -once fail like this
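For the first item, here is a minimal sketch of how the extra jobs could be picked out. It assumes the running jobs can be listed as (job ID, job name) pairs (for example, parsed from `qstat` output) and that a lower Grid Engine job ID means an earlier submission. The function name and sample data are hypothetical. The script only prints a `qdel` command and does not run it, so the list can be reviewed first. This matters because pats-gadget and filemoves_replacer are started without `-once` and may be meant to run concurrently.

```python
from collections import defaultdict


def extra_job_ids(jobs):
    """Return the IDs of duplicate jobs, keeping the lowest (assumed oldest) ID per name.

    `jobs` is an iterable of (job_id, job_name) pairs. Assumption: job IDs
    increase with submission order, so the lowest ID is the original job.
    """
    by_name = defaultdict(list)
    for job_id, name in jobs:
        by_name[name].append(job_id)

    extras = []
    for ids in by_name.values():
        ids.sort()
        extras.extend(ids[1:])  # every job except the oldest one with this name
    return sorted(extras)


if __name__ == "__main__":
    # Hypothetical sample data; real IDs would come from the grid.
    sample = [(101, "inc_check"), (205, "inc_check"), (150, "nullbot"), (310, "inc_check")]
    extras = extra_job_ids(sample)
    # Print the command for review instead of executing it.
    print("qdel " + " ".join(str(i) for i in extras))
```

With the sample data this prints `qdel 205 310`, which keeps job 101 as the single inc_check instance and leaves nullbot untouched.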
This may be related to T194380: Identify bots using AES128-SHA maintainers running on toolforge and T195834: mono-based bot hangs after mono version upgrade, since all of these jobs pass a `-v MONO_TLS_PROVIDER=btls` flag in the crontab.