No jobs running on beta cluster
Closed, Resolved, Public

Description

Per T186993#4929163:

Event Timeline

EBernhardson created this task.

Mentioned in SAL (#wikimedia-cloud) [2019-02-05T20:07:22Z] <ebernhardson> jobrunner port 9006 is firewalled, revert to 9005 and created T215339 to fix job queue in beta cluste
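A quick way to confirm whether a port such as 9005 or 9006 is reachable from a given host is a plain TCP connect. The following is a minimal sketch, not the check that was actually run here; the function name `port_is_reachable` and the timeout value are assumptions made for illustration.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A firewalled port usually shows up as a timeout or a refused
    connection; both raise OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage, run from a host that is supposed to reach the jobrunner:
#   for port in (9005, 9006):
#       print(port, port_is_reachable("<jobrunner host>", port))
```

This only tests TCP reachability. It does not show that the jobrunner endpoint behind the port answers requests correctly.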

Just checking in to see if there's any movement on this ...

This is important to Growth-Team as well: we use the job queue in various places in GrowthExperiments and can't QA some features at the moment. Is there an ETA for when this might be resolved?

It seems to me that someone should be able to make profile::mediawiki::jobrunner_tls permit customisation of the hostname instead of hardcoding wmnet, generate the certificates using the deployment-puppetmaster CA and stick them in a labs/private commit on the puppetmaster, and move the inclusion of that profile outside the LVS check?

FWIW, I don't think we need the TLS configuration in beta. I can try to simplify things. Sorry for not noticing this bug earlier; tagging SRE, or better yet serviceops, could have helped bring it to my attention.

Change 505222 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::jobrunner: allow reaching the local endpoint

https://gerrit.wikimedia.org/r/505222

Change 505222 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::mediawiki::jobrunner: allow reaching the local endpoint

https://gerrit.wikimedia.org/r/505222

I fixed the configuration of cpjobqueue in deployment-prep, restarted the service, and verified requests are now getting through to the jobrunner:

2019-04-19T10:59:07	10234170	172.16.4.124	proxy:fcgi://127.0.0.1:9000/200	0	POST	http://deployment-jobrunner03.deployment-prep.eqiad.wmflabs:9006/rpc/RunSingleJob.php-text/html	-	-	ChangePropagation-JobQueue/WMF	-	-	-	-	172.16.4.124
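To check many such access log lines instead of reading them by eye, the relevant fields can be pulled out programmatically. The sketch below assumes whitespace-separated fields in the order seen in the single line above: timestamp first, the handler with a trailing `/status` fourth, the HTTP method sixth, and the URL seventh. The real log format may differ, and `summarize_access_line` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def summarize_access_line(line):
    """Extract timestamp, status, method, host and port from one log line.

    Field positions are an assumption based on a single sample line.
    """
    fields = line.split()
    # The fourth field looks like "proxy:fcgi://127.0.0.1:9000/200";
    # the part after the last "/" is treated as the HTTP status.
    _handler, _sep, status = fields[3].rpartition("/")
    url = urlsplit(fields[6])
    return {
        "timestamp": fields[0],
        "status": int(status),
        "method": fields[5],
        "host": url.hostname,
        "port": url.port,
    }


def reached_jobrunner(line, expected_port):
    """True if the line records a 200 POST to the expected jobrunner port."""
    info = summarize_access_line(line)
    return (
        info["status"] == 200
        and info["method"] == "POST"
        and info["port"] == expected_port
    )
```

Applied to the line above with `expected_port=9006`, `reached_jobrunner` returns `True`, which matches the reading that the request was handled by the jobrunner.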

I think we can consider this resolved.

@Joe I just tried to send a mass message on the beta cluster, and the message was "queued" but isn't actually being delivered. Is that part of this task?

I see you submitted a job and it got executed successfully:

MassMessageSubmitJob User:DannyS712/mms class=MediaWiki\MassMessage\MassMessageJob data={"comment":["DannyS712","enwiki","https://en.wikipedia.beta.wmflabs.org/w/index.php?title=User:DannyS712/mms\u0026oldid=391772"],"message":"hello","originWiki":"enwiki","spamlist":"User:DannyS712/mms","subject":"Testing","userId":32948} namespace=2 pages=[{"site":"en.wikipedia.beta.wmflabs.org","title":"User talk:DannyS712 test2","wiki":"enwiki"}] requestId=XLn4qawQBHcAABRZT90AAABB title=DannyS712/mms
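For anyone who wants to inspect such a job log entry more closely, the JSON-valued fields (`data=` and `pages=`) can be decoded directly from the line. This is a minimal sketch that assumes the `key=<JSON>` layout shown above holds in general; `extract_json_field` is a hypothetical helper, not part of MediaWiki or MassMessage.

```python
import json


def extract_json_field(line, key):
    """Decode the JSON value that follows "key=" in a job log line.

    Returns None if the key is absent. Uses raw_decode so that parsing
    stops at the end of the JSON value, even when more text follows it.
    Note: this matches the first occurrence of "key=" in the line.
    """
    marker = key + "="
    start = line.find(marker)
    if start == -1:
        return None
    value, _end = json.JSONDecoder().raw_decode(line, start + len(marker))
    return value


# Hypothetical usage, with `line` holding one log entry as a string:
#   data = extract_json_field(line, "data")
#   pages = extract_json_field(line, "pages")
#   print(data["subject"], [p["title"] for p in pages])
```

For the entry above, `pages` decodes to a single target, the page "User talk:DannyS712 test2" on enwiki. That is the page to check when confirming whether the message was delivered.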

So perhaps something else in Beta is not working if you don't see the delivery being done?