Add jobrunner servers to Scap canary process
Open, Low, Public

Description

Follows-up:

  1. Ensure that (if not already) at least one job runner is included in the list of canary servers that Scap uses for deploying MediaWiki code. This alone will already be an improvement, as any hits for mediawiki/exception, mediawiki/error or hhvm that only happen in job runner context would then be caught early.
  2. Include ERROR (and higher) severity messages from the mediawiki/runJobs channel in the Logstash query for canary monitoring.
  3. Once the jobrunner and jobchron service logs are indexed by Logstash, include their ERROR (and higher) severity messages in the Logstash query as well.

Note that the jobrunner and jobchron services are independent PHP CLI programs (not MediaWiki CLI scripts), so their logs will have a different type and are not presently included anywhere else.
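
The three steps above could be sketched roughly as follows. This is only an illustration, not how Scap builds its query: it assumes an Elasticsearch-style query body, and the field names (`type`, `channel`, `level`, `host`, `@timestamp`), the reading of `mediawiki/runJobs` as type `mediawiki` plus channel `runJobs`, the PSR-3 level names, and the helper names `has_jobrunner_canary` and `build_canary_query` are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

# Severity names at ERROR and above. Assumes the "level" field holds
# PSR-3 style level names; adjust if the index stores them differently.
ERROR_AND_HIGHER = ["ERROR", "CRITICAL", "ALERT", "EMERGENCY"]


def has_jobrunner_canary(canaries: Sequence[str], jobrunners: Sequence[str]) -> bool:
    """Step 1: return True if at least one canary host is also a job runner."""
    return bool(set(canaries) & set(jobrunners))


def build_canary_query(
    canaries: Sequence[str],
    window_seconds: int = 60,
    service_types: Sequence[str] = (),
) -> Dict:
    """Build a query body matching error logs from canary hosts (hypothetical fields)."""
    should: List[Dict] = [
        # Channels named in the description: mediawiki/exception, mediawiki/error, hhvm.
        {"bool": {"filter": [
            {"term": {"type": "mediawiki"}},
            {"terms": {"channel": ["exception", "error"]}},
        ]}},
        {"term": {"type": "hhvm"}},
        # Step 2: mediawiki/runJobs, restricted to ERROR and higher.
        {"bool": {"filter": [
            {"term": {"type": "mediawiki"}},
            {"term": {"channel": "runJobs"}},
            {"terms": {"level": ERROR_AND_HIGHER}},
        ]}},
    ]
    if service_types:
        # Step 3: jobrunner/jobchron service logs use a different type,
        # so they get their own clause once they are indexed.
        should.append({"bool": {"filter": [
            {"terms": {"type": list(service_types)}},
            {"terms": {"level": ERROR_AND_HIGHER}},
        ]}})
    return {
        "query": {"bool": {
            # Only look at canary hosts, within the recent time window.
            "filter": [
                {"terms": {"host": list(canaries)}},
                {"range": {"@timestamp": {"gte": f"now-{window_seconds}s"}}},
            ],
            "should": should,
            "minimum_should_match": 1,
        }}
    }
```

Passing `service_types=["jobrunner", "jobchron"]` (placeholder type names) adds the step 3 clause; leaving it empty covers steps 1 and 2 only, which matches the order in which the steps can be rolled out.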

Event Timeline

Krinkle created this task. Aug 4 2017, 3:40 AM
Restricted Application added a subscriber: Aklapper. Aug 4 2017, 3:40 AM
greg triaged this task as Low priority. Sep 18 2017, 4:47 PM
greg added a subscriber: greg.

Adding our Release-Engineering-Team (Kanban) project as we would like to work on this in the coming quarter or two (no promises though; this is not a "goal", only "other hoped-for work").

mmodell edited projects, added Scap; removed Deployments. Apr 23 2018, 4:32 PM
mmodell added a subscriber: mmodell.
Krinkle renamed this task from Add jobrunners to Scap canary process to Add jobrunner servers to Scap canary process. Jul 12 2018, 3:59 AM
Krinkle added a project: WMF-JobQueue.
Krinkle moved this task from Untriaged to Meta on the WMF-JobQueue board.
Krinkle moved this task from Meta to Untriaged on the WMF-JobQueue board. Jul 12 2018, 10:54 PM
Krinkle moved this task from Untriaged to jobrunners on the WMF-JobQueue board. Aug 30 2018, 11:41 PM
Krinkle moved this task from jobrunners to Untriaged on the WMF-JobQueue board.

This task was referenced as actionable for a recent incident.

https://wikitech.wikimedia.org/wiki/Incident_documentation/20190417-Jobqueue