PHP Fatal Error: Class undefined: JobExecutor (jobrunners try to run labswiki jobs)
Open, Needs Triage, Public

Description

Error

Request ID: 68e83d397bb1068efff4a69f

message
PHP Fatal Error from line 65 of /srv/mediawiki/rpc/RunSingleJob.php: Class undefined: JobExecutor

server: mw1296
wiki: labswiki
shard: wikitech
trace
#0 /srv/mediawiki/rpc/RunSingleJob.php(65): NO_FUNCTION_GIVEN()
#1 {main}

Impact

Unknown. It looks like jobs may be queued the wrong way. If so, certain updates intended for Wikitech may not be applied (with no obvious means of recovery).

Notes

First seen 2018-10-18, but only a couple of times each day. The source of the jobs is unclear. The job parameters are not available in the logs because the process fails before the log context is established.

Krinkle created this task. Nov 7 2018, 12:52 AM
Change 474885 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/mediawiki-config@master] RunSingleJob: Check that JobExecutor has been loaded

https://gerrit.wikimedia.org/r/474885

mobrovac added a subscriber: mobrovac.

The above patch does not solve the problem, but it improves the situation: it logs the offending event when this happens, which allows us to begin investigating.
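
For readers following along, the general shape of such a guard can be sketched as below. This is a toy model in Python, not the contents of the actual patch to RunSingleJob.php: the function `run_single_job`, the `loaded_classes` mapping, the status codes and `FakeExecutor` are all hypothetical names chosen for illustration. The only point it makes is that checking for the class first turns a fatal error with no context into a logged event that can be inspected.

```python
import logging

logger = logging.getLogger("run_single_job_sketch")


def run_single_job(event, loaded_classes):
    """Run one job event, or log it and bail out if the executor is missing.

    `loaded_classes` stands in for "which classes the runtime has loaded";
    it is a hypothetical stand-in, not a MediaWiki structure.
    """
    executor_cls = loaded_classes.get("JobExecutor")
    if executor_cls is None:
        # Log the offending event so its parameters survive for debugging.
        logger.error("JobExecutor is not loaded; offending event: %r", event)
        return 500, "JobExecutor is not loaded"
    return 200, executor_cls().execute(event)


class FakeExecutor:
    """Hypothetical executor used only to exercise the happy path."""

    def execute(self, event):
        return "ran " + event["type"]
```

Calling `run_single_job({"type": "BounceHandlerJob"}, {})` takes the logging branch, while passing `{"JobExecutor": FakeExecutor}` runs the job normally.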

Actually, I believe there's an easier way to find out the events that cause it: https://logstash.wikimedia.org/goto/2519c6383201ddf651de7e3effed92b9

The search shows that the job in question is BounceHandlerJob. Will continue the investigation.

Ok, I understand what's happening here. The BounceHandler extension sends the job cross-wiki, in this particular case to the wikitech wiki queue. However, JobQueueGroup::singleton()->get() uses the global $wgJobTypeConf of the wiki that enqueues the job, and wikitech's configuration differs from that of all the other wikis. That's how the event ends up in Kafka, where it fails because wikitech doesn't support the Kafka job queue.
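
The mismatch described above can be illustrated with a small toy model. This is a Python sketch, not MediaWiki code: the `JOB_TYPE_CONF` table, the wiki name `examplewiki`, the backend label `non-kafka` and both function names are assumptions made up for the example. It only shows why picking the queue backend from the enqueueing wiki's configuration misroutes a job destined for a wiki that is configured differently.

```python
# Hypothetical per-wiki queue configuration. In the scenario above, every
# wiki except wikitech (labswiki) uses the Kafka-backed queue.
JOB_TYPE_CONF = {
    "examplewiki": {"backend": "kafka"},
    "labswiki": {"backend": "non-kafka"},
}


def backend_from_caller_config(current_wiki, target_wiki):
    """Models the bug: the backend comes from the wiki doing the enqueueing,
    so the target wiki's own configuration is never consulted."""
    return JOB_TYPE_CONF[current_wiki]["backend"]


def backend_from_target_config(current_wiki, target_wiki):
    """Models the intended behaviour: the backend comes from the wiki the
    job is destined for."""
    return JOB_TYPE_CONF[target_wiki]["backend"]


# A cross-wiki job enqueued from examplewiki for labswiki:
misrouted = backend_from_caller_config("examplewiki", "labswiki")
intended = backend_from_target_config("examplewiki", "labswiki")
print(misrouted, intended)
```

In this model the first lookup sends the job to Kafka even though the target wiki cannot consume from it, which matches the failure seen on the job runners.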

Thank you for the investigation, @Pchelolo! So once again we are hitting this global job conf problem.

Change 474885 merged by jenkins-bot:
[operations/mediawiki-config@master] RunSingleJob: Check that JobExecutor has been loaded

https://gerrit.wikimedia.org/r/474885

Mentioned in SAL (#wikimedia-operations) [2018-11-27T12:19:54Z] <mobrovac@deploy1001> Synchronized rpc/RunSingleJob.php: RunSingleJob: Check that JobExecutor has been loaded - T208922 (duration: 00m 47s)

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.

Thanks for looking into this. Moving to our radar to keep tabs on. Do let me know if there's anything I can help with :)