@Steinsplitter reported to me that a few webservices mysteriously started returning "503 No webservice" without anything having changed. I had thought that a webservice that exits would be restarted automatically. He pointed me to the tool commons-delinquent, so I looked:
tools.commons-delinquent@tools-sgebastion-08:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  60038 0.26217 lighttpd-c tools.common Rr    01/06/2020 22:59:07 webgrid-lighttpd@tools-sgewebg     1
 931131 0.40062 demon      tools.common r     11/04/2019 19:25:57 continuous@tools-sgeexec-0919.     1
2953225 0.31705 demon      tools.common Rr    12/22/2019 09:41:50 continuous@tools-sgeexec-0924.     1
So the lighttpd job is running on the grid, but the tool is not in the redis on tools-proxy-05.
So I checked how many tools are active on the grid but missing from redis:
webgrid-lighttpd:
12:15:08 0 ✓ zhuyifei1999@tools-sgebastion-08: ~$ comm -23 <(qstat -u \* -q webgrid-lighttpd -xml | grep JB_owner | grep -oP '(?<=<JB_owner>tools\.).+(?=</JB_owner>)' | sort) <(curl -s tools-proxy-05:8081/list | jq . | grep -oP '(?<=").+(?=": {)' | sort)
ato
blockyquery
botriconferme
catgraph
cgstat
cluebotng
commons-delinquent
convert
deadlinks
derivative
dewikinews-rss
dispenser
dow
fountain-test
freddy2001
gerakitools
germancontributioncounts
grantmetrics
gyan
igloo
inactiveadmins
ip-range-calc
jimmy
khanomalumat
linedwell
mediaviews
metaviews
mostlinkedmissing
mrmetadata
musikanimal
osmlint
patrolstats
periodibot
poiimport
portal
ptwikis
quarry
render-tests
rotbot
russbot
searchsbl
shrinitools
shuaib
shuaib-bot
sign-language-browser
slumpartikel
soweego
stockholm-mania
svgtranslate
tessdata
text2hash
timerelengteam
title-search
toolhub
toolschecker-ge-ws
tulsibot
urbanecmbot
validator
vvoters
wahldiagramm
wdmap
wikidata-timeline
wikiedudashboard-test
wikilinkbot
wptestblog2
wscontest
yemen
zhdeletionpedia
zhwiki-qualifications-check
webgrid-generic:
12:15:48 0 ✓ zhuyifei1999@tools-sgebastion-08: ~$ comm -23 <(qstat -u \* -q webgrid-generic -xml | grep JB_owner | grep -oP '(?<=<JB_owner>tools\.).+(?=</JB_owner>)' | sort) <(curl -s tools-proxy-05:8081/list | jq . | grep -oP '(?<=").+(?=": {)' | sort)
montage-dev
russbot
Many tools seem affected and T242166 is probably related. Not sure what happened. Shall I mass restart?
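For anyone who wants to repeat the check, the two comparisons above could be wrapped roughly as follows. This is only a sketch: `only_in_first` and `check_queue` are names made up here, and `jq -r 'keys[]'` assumes the proxy's `/list` endpoint returns a JSON object keyed by tool name, which is what the grep in the original one-liner implies but is not confirmed.

```shell
#!/bin/sh
# Print names that appear in file $1 but not in file $2 (one name per line).
# Both inputs are sorted here, so callers may pass them in any order.
only_in_first() {
    _tmp=$(mktemp) || return 1
    LC_ALL=C sort -u "$2" > "$_tmp"
    LC_ALL=C sort -u "$1" | LC_ALL=C comm -23 - "$_tmp"
    rm -f "$_tmp"
}

# Hypothetical wrapper: list tools that have a job in grid queue $1
# but no entry in the proxy's list.
# Assumption: /list returns a JSON object whose keys are tool names.
check_queue() {
    _dir=$(mktemp -d) || return 1
    qstat -u '*' -q "$1" -xml |
        grep -oP '(?<=<JB_owner>tools\.).+(?=</JB_owner>)' > "$_dir/grid"
    curl -s tools-proxy-05:8081/list | jq -r 'keys[]' > "$_dir/proxy"
    only_in_first "$_dir/grid" "$_dir/proxy"
    rm -rf "$_dir"
}
```

With these defined, `for q in webgrid-lighttpd webgrid-generic; do echo "$q:"; check_queue "$q"; done` should print the same two lists as above, assuming the endpoint behaves as described.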