No jobs running on beta cluster
Closed, Resolved · Public

Description

Per T186993#4929163:

Event Timeline

EBernhardson triaged this task as High priority.Feb 5 2019, 8:02 PM
EBernhardson created this task.

Mentioned in SAL (#wikimedia-cloud) [2019-02-05T20:07:22Z] <ebernhardson> jobrunner port 9006 is firewalled, revert to 9005 and created T215339 to fix job queue in beta cluster
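
Whether a port such as 9006 is blocked can be checked from the calling host with a plain TCP connection attempt. The following is only a minimal sketch: the function name `port_is_reachable` and the timeout value are illustrative assumptions, not part of any tooling mentioned in this task, and the caller has to supply the jobrunner host name.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration. A firewall that silently drops packets
    usually shows up as a timeout, a closed port as a refused connection;
    both are reported as False here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False
```

Calling it once for port 9005 and once for port 9006 against the same host would show which of the two is reachable from where the script runs.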

Cparle added a subscriber: Cparle.Mar 12 2019, 10:23 AM
Restricted Application added a project: Wikidata.Mar 12 2019, 10:34 AM
This comment was removed by Krenair.

Just checking in to see if there's any movement on this ...

jijiki added a subscriber: jijiki.Apr 10 2019, 7:04 PM

This is important to Growth-Team as well, since we use the job queue in various places in GrowthExperiments and can't QA some features at the moment. Is there an ETA for when this might be resolved?

Krenair added a comment.Apr 11 2019, 4:20 PM

It seems to me that someone should be able to make profile::mediawiki::jobrunner_tls permit customisation of the hostname instead of hardcoding wmnet, generate the certificates using the deployment-puppetmaster CA and stick them in a labs/private commit on the puppetmaster, and move the inclusion of that profile outside the LVS check?

Joe added a comment.Apr 19 2019, 9:42 AM

FWIW, I don't think we need the TLS configuration in beta. I can try to simplify things. Sorry for not noticing this bug earlier, but adding Operations or, better, serviceops could have helped bring it to my attention.

Joe claimed this task.Apr 19 2019, 9:42 AM

Change 505222 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::jobrunner: allow reaching the local endpoint

https://gerrit.wikimedia.org/r/505222

Change 505222 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::mediawiki::jobrunner: allow reaching the local endpoint

https://gerrit.wikimedia.org/r/505222

Joe closed this task as Resolved.Apr 19 2019, 11:00 AM

I fixed the configuration of cpjobqueue in deployment-prep, restarted the service, and verified requests are now getting through to the jobrunner:

2019-04-19T10:59:07	10234170	172.16.4.124	proxy:fcgi://127.0.0.1:9000/200	0	POST	http://deployment-jobrunner03.deployment-prep.eqiad.wmflabs:9006/rpc/RunSingleJob.php-text/html	-	-	ChangePropagation-JobQueue/WMF	-	-	-	-	172.16.4.124

I think we can consider this resolved.
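
The check on the log line above can be sketched in a few lines of Python. This is a hedged illustration, not the procedure actually used: the field positions are inferred from that single sample line, the trailing `/200` on the fourth field is assumed to be the HTTP status, and the helper names `parse_hit` and `is_successful_job_post` are hypothetical. The sample is cut off after the URL field and kept exactly as it appears above.

```python
from urllib.parse import urlsplit

# Leading fields of the log line quoted above, tab-separated as shown there.
SAMPLE = (
    "2019-04-19T10:59:07\t10234170\t172.16.4.124\t"
    "proxy:fcgi://127.0.0.1:9000/200\t0\tPOST\t"
    "http://deployment-jobrunner03.deployment-prep.eqiad.wmflabs:9006"
    "/rpc/RunSingleJob.php-text/html"
)


def parse_hit(line):
    """Split one log line into the fields needed for the check (layout is assumed)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError("expected at least 7 tab-separated fields")
    # Assumption: the fourth field is "<handler>/<status>".
    handler, _, status = fields[3].rpartition("/")
    url = urlsplit(fields[6])
    return {
        "timestamp": fields[0],
        "handler": handler,
        "status": int(status),
        "method": fields[5],
        "host": url.hostname,
        "port": url.port,
        "path": url.path,
    }


def is_successful_job_post(hit, port=9006):
    """True if the hit is a POST to RunSingleJob.php on `port` that returned 200."""
    return (
        hit["method"] == "POST"
        and hit["status"] == 200
        and hit["port"] == port
        and hit["path"].startswith("/rpc/RunSingleJob.php")
    )


if __name__ == "__main__":
    print(is_successful_job_post(parse_hit(SAMPLE)))
```

Matching on the path prefix rather than the full path keeps the sketch from depending on how the remaining fields of the line are separated.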

@Joe I just tried to send a mass message on the beta cluster, and the message was "queued" but isn't actually being delivered. Is that part of this task?

I see you submitted a job and it got executed successfully:

MassMessageSubmitJob User:DannyS712/mms class=MediaWiki\MassMessage\MassMessageJob data={"comment":["DannyS712","enwiki","https://en.wikipedia.beta.wmflabs.org/w/index.php?title=User:DannyS712/mms\u0026oldid=391772"],"message":"hello","originWiki":"enwiki","spamlist":"User:DannyS712/mms","subject":"Testing","userId":32948} namespace=2 pages=[{"site":"en.wikipedia.beta.wmflabs.org","title":"User talk:DannyS712 test2","wiki":"enwiki"}] requestId=XLn4qawQBHcAABRZT90AAABB title=DannyS712/mms

So perhaps something else in Beta is not working if you don't see the delivery being done?