
JobQueue "runJobs" channel in Logstash should be enhanced to support filtering by job type
Closed, Resolved (Public)

Description

The MediaWiki Job class needs to be fixed to use a proper PSR-3 logger.

The mediawiki.runJobs errors are collected in Logstash, but the whole job description and the error are stored in a single field, message.

That makes it hard to figure out which jobs are most impacted or which error is trending. Example from Logstash:

Key      Value
message  ORESFetchScoreJob Q1104427 revid=378517804 extra_params={"precache":"true"} requestId=V@TQWQpAEDcAAMWtvY4AAAEP (uuid=ee43ad0481aa4c1ab67877606030894c,timestamp=1474613338,QueuePartition=rdb1-6380) t=15055 error=
url      /rpc/RunJobs.php?wiki=wikidatawiki&type=ORESFetchScoreJob&maxtime=60&maxmem=300M
type     mediawiki
channel  runJobs
wiki     wikidatawiki

/rpc/RunJobs.php invokes JobRunner::executeJob(), which crafts the message via the public (non-final) method Job::toString(). The default implementation gets the class and then appends a concatenation of all parameters.

We should get Logstash to extract useful values such as the job type, the duration, and the args and parameters.
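
To illustrate the kind of extraction meant here, below is a minimal Python sketch that splits a message of the shape shown in the example into separate fields. It is not the Logstash filter from any patch on this task. The regular expression, the function name parse_runjobs_message, and the output keys are assumptions made for illustration. The sample message uses the job type as it appears in the url field, and the sketch assumes parameter values contain no spaces.

```python
import re

# Sample message in the shape quoted in the task description.
SAMPLE = (
    'ORESFetchScoreJob Q1104427 revid=378517804 '
    'extra_params={"precache":"true"} requestId=V@TQWQpAEDcAAMWtvY4AAAEP '
    '(uuid=ee43ad0481aa4c1ab67877606030894c,timestamp=1474613338,'
    'QueuePartition=rdb1-6380) t=15055 error='
)

# Hypothetical pattern: "<type> <title> <k=v ...> (<k=v,...>) t=<n> error=<text>"
RUNJOBS_RE = re.compile(
    r"^(?P<job_type>\S+)\s+"
    r"(?P<title>\S+)"
    r"(?P<params>.*?)"
    r"(?:\s+\((?P<meta>[^()]*)\))?"
    r"\s+t=(?P<duration>\d+)"
    r"\s+error=(?P<error>.*)$",
    re.DOTALL,
)


def parse_runjobs_message(message):
    """Return a dict of extracted fields, or None if the message does not match."""
    m = RUNJOBS_RE.match(message)
    if m is None:
        return None
    # Space-separated key=value pairs following the title.
    params = dict(
        p.split("=", 1) for p in m.group("params").split() if "=" in p
    )
    # Comma-separated key=value pairs inside the parentheses, if present.
    meta = dict(
        p.split("=", 1) for p in (m.group("meta") or "").split(",") if "=" in p
    )
    return {
        "job_type": m.group("job_type"),
        "job_title": m.group("title"),
        # Value of t= as reported in the message; the task does not state its unit.
        "job_duration": int(m.group("duration")),
        "job_error": m.group("error").strip() or None,
        "params": params,
        "meta": meta,
    }


fields = parse_runjobs_message(SAMPLE)
```

With fields like these, a query could filter on job_type or aggregate on job_error instead of matching substrings of message. Parsing free text this way is fragile, since any subclass can override Job::toString().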

Event Timeline

Change 312504 had a related patch set uploaded (by Hashar):
logstash: parse runJobs messages

https://gerrit.wikimedia.org/r/312504

hashar triaged this task as Medium priority. Sep 26 2016, 3:24 PM

Change 313408 had a related patch set uploaded (by Hashar):
beta: log bucket 'runJobs' to info

https://gerrit.wikimedia.org/r/313408

Change 313408 merged by jenkins-bot:
beta: log bucket 'runJobs' to info

https://gerrit.wikimedia.org/r/313408

Change 313410 had a related patch set uploaded (by Hashar):
beta: 'runJobs' to info for logstash

https://gerrit.wikimedia.org/r/313410

Change 313410 merged by jenkins-bot:
beta: 'runJobs' to info for logstash

https://gerrit.wikimedia.org/r/313410

MediaWiki on beta now sends the runJobs log bucket to Logstash at INFO level (production uses WARNING).

Change 313625 had a related patch set uploaded (by Hashar):
(WIP) jobqueue: runJobs log now have context passed to them (WIP)

https://gerrit.wikimedia.org/r/313625

Change 312504 abandoned by Hashar:
logstash: parse runJobs messages

Reason:
Will be better done at the source (i.e. mediawiki itself). That is https://gerrit.wikimedia.org/r/#/c/313625/

https://gerrit.wikimedia.org/r/312504

Krinkle renamed this task from RunJobs logs in logstash should be enhanced to support filtering by job type to JobQueue "runJobs" channel in Logstash should be enhanced to support filtering by job type. Aug 4 2017, 3:24 AM
Krinkle edited projects, added MediaWiki-Core-JobQueue; removed WMF-JobQueue.

(Moved to the #MediaWiki-JobQueue tag; the #MediaWiki-JobRunner tag is about the standalone service, not the class in MediaWiki.)

Change 313625 merged by jenkins-bot:
[mediawiki/core@master] jobqueue: Add job_type to PSR logging context

https://gerrit.wikimedia.org/r/313625

Krinkle claimed this task.
Krinkle added a project: Performance-Team.

Confirmed. Log entries from Job runners with 1.30.0-wmf.14 have job_type, job_error, and job_duration fields in Logstash.
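
For readers unfamiliar with the approach, the idea of passing context at the source can be sketched in Python using the standard logging module's extra argument, which plays a role loosely comparable to a PSR-3 context array. This is only an analogy, not the MediaWiki change itself. The helper log_job_result, the ListHandler class, and the message text are hypothetical; only the field names job_type, job_error, and job_duration come from this task.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the structured fields can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def log_job_result(logger, job_type, duration, error=None):
    """Log one job run, attaching the fields as structured context."""
    context = {
        "job_type": job_type,
        "job_duration": duration,  # unit left unspecified in this sketch
        "job_error": error,
    }
    level = logging.ERROR if error else logging.INFO
    # Keys in `extra` become attributes on the LogRecord, so a handler or
    # formatter can ship them as separate fields instead of one string.
    logger.log(level, "%s finished", job_type, extra=context)


logger = logging.getLogger("runJobs")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

log_job_result(logger, "ORESFetchScoreJob", 15055)
record = handler.records[0]
```

Because the job type travels as its own field rather than as part of the message string, nothing downstream has to parse the output of Job::toString().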