@Steinsplitter reported to me that a few webservices had mysteriously started returning "503 No webservice" even though nothing had been changed, and I thought that a webservice that exits should be restarted automatically. He pointed me to the tool commons-delinquent, and I took a look:
  tools.commons-delinquent@tools-sgebastion-08:~$ qstat
  job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
  -----------------------------------------------------------------------------------------------------------------
    60038 0.26217 lighttpd-c tools.common Rr    01/06/2020 22:59:07 webgrid-lighttpd@tools-sgewebg     1
   931131 0.40062 demon      tools.common r     11/04/2019 19:25:57 continuous@tools-sgeexec-0919.     1
  2953225 0.31705 demon      tools.common Rr    12/22/2019 09:41:50 continuous@tools-sgeexec-0924.     1
But it has no entry in the Redis on tools-proxy-05.
So I checked how many tools have a job running on the grid but no entry in Redis:
webgrid-lighttpd:
  12:15:08 0 zhuyifei1999@tools-sgebastion-08: ~$ comm -23 <(qstat -u \* -q webgrid-lighttpd -xml | grep JB_owner | grep -oP '(?<=<JB_owner>tools\.).+(?=</JB_owner>)' | sort) <(curl -s tools-proxy-05:8081/list | jq . | grep -oP '(?<=").+(?=": {)' | sort)
  ato blockyquery botriconferme catgraph cgstat cluebotng commons-delinquent convert
  deadlinks derivative dewikinews-rss dispenser dow fountain-test freddy2001 gerakitools
  germancontributioncounts grantmetrics gyan igloo inactiveadmins ip-range-calc jimmy
  khanomalumat linedwell mediaviews metaviews mostlinkedmissing mrmetadata musikanimal
  osmlint patrolstats periodibot poiimport portal ptwikis quarry render-tests rotbot
  russbot searchsbl shrinitools shuaib shuaib-bot sign-language-browser slumpartikel
  soweego stockholm-mania svgtranslate tessdata text2hash timerelengteam title-search
  toolhub toolschecker-ge-ws tulsibot urbanecmbot validator vvoters wahldiagramm wdmap
  wikidata-timeline wikiedudashboard-test wikilinkbot wptestblog2 wscontest yemen
  zhdeletionpedia zhwiki-qualifications-check
webgrid-generic:
  12:15:48 0 zhuyifei1999@tools-sgebastion-08: ~$ comm -23 <(qstat -u \* -q webgrid-generic -xml | grep JB_owner | grep -oP '(?<=<JB_owner>tools\.).+(?=</JB_owner>)' | sort) <(curl -s tools-proxy-05:8081/list | jq . | grep -oP '(?<=").+(?=": {)' | sort)
  montage-dev
  russbot
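The two pipelines above differ only in the queue name. Below is a minimal POSIX sh sketch of the same set difference as reusable helpers that read saved output from files, so the check can be rerun or tested offline. The function names (`grid_tools`, `proxy_tools`, `missing_from_proxy`) are hypothetical, not existing Toolforge commands. As the original `grep` implies, it assumes `jq .` pretty-prints the proxy `/list` response as a JSON object keyed by tool name, with jq's default two-space indentation; unlike the original, it matches only top-level keys.

```shell
#!/bin/sh
# Hypothetical helpers: list tools that have a job in a grid queue
# but no entry in the proxy's /list output. Inputs are saved files.

# comm needs both inputs sorted with the same collation order.
LC_ALL=C
export LC_ALL

# Tool names from `qstat -xml` output on stdin:
#   <JB_owner>tools.NAME</JB_owner>  ->  NAME
# sort -u drops duplicates when one tool has several jobs in the queue.
grid_tools() {
  sed -n 's|.*<JB_owner>tools\.\(.*\)</JB_owner>.*|\1|p' | sort -u
}

# Top-level keys from `jq .` output on stdin (assumes two-space indent):
#   "NAME": {  ->  NAME
proxy_tools() {
  sed -n 's|^  "\(.*\)": {$|\1|p' | sort -u
}

# $1: file with `qstat -xml` output; $2: file with pretty-printed /list JSON.
# Prints tools that appear in $1 but not in $2.
missing_from_proxy() {
  proxy_list=$(mktemp) || return 1
  proxy_tools < "$2" > "$proxy_list"
  grid_tools < "$1" | comm -23 - "$proxy_list"
  rm -f "$proxy_list"
}
```

Example use, with the same hosts and port as in the commands above: save the output of `qstat -u \* -q webgrid-lighttpd -xml` to `grid.xml` and of `curl -s tools-proxy-05:8081/list | jq .` to `proxy.json`, then run `missing_from_proxy grid.xml proxy.json`. Using `sort -u` instead of plain `sort` means a tool with several jobs in the queue is listed once rather than once per job.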
Many tools seem to be affected, and T242166 is probably related. I'm not sure what happened. Shall I mass-restart them?