
FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus
Closed, Resolved (Public)

Description

For FY17/18 Q4, the Services team will:

  • Port the remaining jobs over to EventBus
  • Enable support for private wikis

Details

Repo                                                     Branch      Lines +/-
operations/mediawiki-config                              master      +1 -132
operations/puppet                                        production  +14 -72
operations/mediawiki-config                              master      +1 -3
mediawiki/services/change-propagation/jobqueue-deploy    master      +23 -2
operations/puppet                                        production  +6 -0
operations/mediawiki-config                              master      +5 -22
mediawiki/services/change-propagation/jobqueue-deploy    master      +3 -45
operations/mediawiki-config                              master      +5 -7
mediawiki/services/change-propagation/jobqueue-deploy    master      +2 -3
operations/mediawiki-config                              master      +8 -0
mediawiki/services/change-propagation/jobqueue-deploy    master      +7 -0
operations/mediawiki-config                              master      +22 -8
mediawiki/services/change-propagation/jobqueue-deploy    master      +15 -5
mediawiki/extensions/EventBus                            master      +37 -0
operations/mediawiki-config                              master      +9 -18
mediawiki/services/change-propagation/jobqueue-deploy    master      +7 -5
operations/mediawiki-config                              master      +26 -0
mediawiki/services/change-propagation/jobqueue-deploy    master      +47 -4
operations/mediawiki-config                              master      +6 -13
operations/mediawiki-config                              master      +12 -22
mediawiki/services/change-propagation/jobqueue-deploy    master      +5 -25
operations/mediawiki-config                              master      +22 -0
mediawiki/services/change-propagation/jobqueue-deploy    master      +23 -0

Related Objects

Status    Subtype           Assigned
Resolved                    Pchelolo
Resolved                    Pchelolo
Resolved                    Pchelolo
Resolved                    Pchelolo
Resolved                    Pchelolo
Resolved                    EBernhardson
Resolved                    EBernhardson
Resolved                    Pchelolo
Resolved                    Ottomata
Resolved                    Pchelolo
Resolved                    Pchelolo
Resolved                    SBisson
Resolved  PRODUCTION ERROR  SBisson
Resolved                    matthiasmullie
Resolved                    Ladsgroup
Resolved  PRODUCTION ERROR  mobrovac
Invalid                     None
Resolved                    Pchelolo
Resolved                    Nikerabbit
Resolved                    Nikerabbit
Resolved                    mobrovac
Resolved                    Pchelolo
Resolved                    fgiunchedi
Resolved                    Ladsgroup

Event Timeline


Change 434466 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Switch cross-wiki posting jobs for everything.

https://gerrit.wikimedia.org/r/434466

Change 434467 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [config] Switch cross-wiki posting jobs for everything.

https://gerrit.wikimedia.org/r/434467

Change 434467 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [config] Switch cross-wiki posting jobs for everything.

https://gerrit.wikimedia.org/r/434467

Change 434466 merged by jenkins-bot:
[operations/mediawiki-config@master] Switch cross-wiki posting jobs for everything.

https://gerrit.wikimedia.org/r/434466

Mentioned in SAL (#wikimedia-operations) [2018-05-22T10:32:03Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Switch cross-wiki posting jobs to EventBus - T190327 (duration: 01m 18s)

Change 429983 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch all jobs for everything except wikipedia, commons and wikidata.

https://gerrit.wikimedia.org/r/429983

Change 429980 merged by jenkins-bot:
[operations/mediawiki-config@master] Switch all jobs for everything except wikipedia, commons and wikidata.

https://gerrit.wikimedia.org/r/429980

Mentioned in SAL (#wikimedia-operations) [2018-05-24T09:37:25Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Switch all jobs to EventBus for everything except wikipedia, commons and wikidata, file 1/2 - T190327 (duration: 01m 09s)

Mentioned in SAL (#wikimedia-operations) [2018-05-24T09:53:42Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Switch all jobs to EventBus for everything except wikipedia, commons and wikidata, file 1/2, take #2 - T190327 (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2018-05-24T09:54:25Z] <ppchelko@tin> Started deploy [cpjobqueue/deploy@b537fa1]: Switch all non-special jobs for everything except wikipedia, commons and wikidata T190327

Mentioned in SAL (#wikimedia-operations) [2018-05-24T09:55:07Z] <ppchelko@tin> Finished deploy [cpjobqueue/deploy@b537fa1]: Switch all non-special jobs for everything except wikipedia, commons and wikidata T190327 (duration: 00m 42s)

Mentioned in SAL (#wikimedia-operations) [2018-05-24T09:55:32Z] <mobrovac@tin> Synchronized wmf-config/InitialiseSettings.php: Switch all jobs to EventBus for everything except wikipedia, commons and wikidata, file 2/2 - T190327 (duration: 01m 06s)

Change 434891 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Switch all job apart from exceptions for everything.

https://gerrit.wikimedia.org/r/434891

Change 434893 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable all jobs apart from exceptions for everything.

https://gerrit.wikimedia.org/r/434893

Change 434893 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable all jobs apart from exceptions for everything.

https://gerrit.wikimedia.org/r/434893

Change 434891 merged by jenkins-bot:
[operations/mediawiki-config@master] Switch all job apart from exceptions for everything.

https://gerrit.wikimedia.org/r/434891

Mentioned in SAL (#wikimedia-operations) [2018-05-29T10:04:48Z] <ppchelko@tin> Started deploy [cpjobqueue/deploy@c6dc83d]: Enable all jobs apart from exceptions for everything. T190327

Mentioned in SAL (#wikimedia-operations) [2018-05-29T10:05:46Z] <ppchelko@tin> Finished deploy [cpjobqueue/deploy@c6dc83d]: Enable all jobs apart from exceptions for everything. T190327 (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2018-05-29T10:06:26Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Switch all jobs to EventBus file 1/2 - T190327 T195500 (duration: 01m 39s)

Mentioned in SAL (#wikimedia-operations) [2018-05-29T10:09:33Z] <mobrovac@tin> Synchronized wmf-config/InitialiseSettings.php: Switch all jobs to EventBus file 2/2 - T190327 T195500 (duration: 01m 47s)
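The three rollout steps logged above (2018-05-22, 2018-05-24, 2018-05-29) follow a staged pattern: first one job family on every wiki, then all non-special jobs everywhere except the largest wiki groups, then all jobs apart from exceptions everywhere. The following is only a rough sketch of that pattern, not the actual wmf-config or cpjobqueue configuration. `RolloutStage`, `uses_new_queue`, `switched_after` and the placeholder job-type names are all hypothetical; the real exception list does not appear in this task.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class RolloutStage:
    """One step of a staged queue migration (hypothetical model)."""
    name: str
    # Job types moved in this stage; None means every job type.
    job_types: Optional[FrozenSet[str]] = None
    # Job types kept on the old queue during this stage.
    excluded_job_types: FrozenSet[str] = frozenset()
    # Wiki groups kept on the old queue during this stage.
    excluded_wiki_groups: FrozenSet[str] = frozenset()


def uses_new_queue(stage: RolloutStage, job_type: str, wiki_group: str) -> bool:
    """Return True if this single stage moves the job to the new queue."""
    if wiki_group in stage.excluded_wiki_groups:
        return False
    if job_type in stage.excluded_job_types:
        return False
    return stage.job_types is None or job_type in stage.job_types


def switched_after(stages: Sequence[RolloutStage], job_type: str, wiki_group: str) -> bool:
    """Stages are cumulative: a job is switched once any applied stage covers it."""
    return any(uses_new_queue(stage, job_type, wiki_group) for stage in stages)


# Placeholder names; the real job types and exceptions are assumptions.
EXCEPTIONS = frozenset({"placeholder_special_job"})
BIG_WIKIS = frozenset({"wikipedia", "commons", "wikidata"})

STAGES = [
    RolloutStage("2018-05-22", job_types=frozenset({"placeholder_cross_wiki_posting_job"})),
    RolloutStage("2018-05-24", excluded_job_types=EXCEPTIONS, excluded_wiki_groups=BIG_WIKIS),
    RolloutStage("2018-05-29", excluded_job_types=EXCEPTIONS),
]
```

For example, `switched_after(STAGES[:2], "other_job", "wikipedia")` is false because the 2018-05-24 step still excludes wikipedia, while the same call with all three stages is true.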

Did this break logging of jobs to runJobs.log on mwlog1001, and detection of jobs via maintenance/showJobs.php?

I was trying to look into T195397: {{PAGESINCATEGORY}} returns incorrect value on en-wiki Category:Candidates for speedy deletion and couldn't find any record of links update jobs running at all.

@Anomie

> Did this break logging of jobs to runJobs.log on mwlog1001

The runJobs.log contains the logs for the old queue. The kafka-based queue logs can be found either in JobExecutor.log file on mwlog, or there's a more comprehensive logstash dashboard that contains all the logs related to the new system.

> And detection of jobs via maintenance/showJobs.php?

That one still needs to be updated to fetch things from kafka in case the kafka queue is used.
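The task does not say how showJobs.php would be updated. As a hedged illustration only, one common way to estimate a backlog in a Kafka-based queue is consumer lag: for each partition, the latest offset minus the consumer group's committed offset. The sketch below assumes those offsets have already been fetched by some client. The function name and the topic names in the usage note are hypothetical, and treating a missing committed offset as 0 is a simplification.

```python
from typing import Dict, Tuple

# (topic name, partition number)
TopicPartition = Tuple[str, int]


def pending_per_topic(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[str, int]:
    """Sum (end offset - committed offset) over partitions, grouped by topic."""
    pending: Dict[str, int] = {}
    for (topic, partition), end in end_offsets.items():
        # Simplification: a partition without a committed offset counts from 0.
        committed = committed_offsets.get((topic, partition), 0)
        pending[topic] = pending.get(topic, 0) + max(end - committed, 0)
    return pending
```

With end offsets `{("jobs.placeholder_a", 0): 10, ("jobs.placeholder_a", 1): 7}` and committed offsets `{("jobs.placeholder_a", 0): 4, ("jobs.placeholder_a", 1): 7}`, the estimate for `jobs.placeholder_a` is 6. This counts messages rather than jobs, so deduplicated or delayed jobs would not be reflected accurately.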

> The kafka-based queue logs can be found either in JobExecutor.log file on mwlog,

That appears to contain only errors.

> or there's a more comprehensive logstash dashboard that contains all the logs related to the new system.

That link takes me to a dashboard that does no filtering at all, so it shows every log message whether job-related or not.

> That link takes me to a dashboard that does no filtering at all, so it shows every log message whether job-related or not.

That's weird, perhaps a copy-paste error or a logstash bug. Here's a new link: https://logstash.wikimedia.org/goto/ccd5e2517591489ad88bc66922f8311c

> That appears to contain only errors.

Created T195858 to make it DEBUG-log the same way the JobRunner did.

The new link works. Although I don't see any messages in there about jobs being run, just errors and messages about jobs being deduplicated. And an occasional "Processed event sample" with no indication of what that means.

> The new link works. Although I don't see any messages in there about jobs being run, just errors and messages about jobs being deduplicated. And an occasional "Processed event sample" with no indication of what that means.

Yeah, we don't send all the jobs into logstash; that would overwhelm it. I'm adding debug logging of jobs being run to mwlog1001. Will be done soon.
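The trade-off described here, keeping every error while not flooding the log store with every successful job, is often handled with sampled logging. The sketch below shows that general technique under stated assumptions. It is not the change-propagation implementation and does not claim to explain what "Processed event sample" means. `SampledJobLog` and its parameters are hypothetical names.

```python
import logging
import random
from typing import Callable


class SampledJobLog:
    """Log every failure, but only a random fraction of successful jobs."""

    def __init__(
        self,
        logger: logging.Logger,
        sample_rate: float,
        rand: Callable[[], float] = random.random,
    ) -> None:
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0.0 and 1.0")
        self._logger = logger
        self._sample_rate = sample_rate
        # Injectable random source, so the sampling decision can be tested.
        self._rand = rand

    def success(self, job_type: str) -> bool:
        """Log a successful job with probability sample_rate; return whether it was logged."""
        if self._rand() < self._sample_rate:
            self._logger.info("Sampled job execution: %s", job_type)
            return True
        return False

    def failure(self, job_type: str, error: str) -> None:
        """Failures are never sampled away."""
        self._logger.error("Job failed: %s (%s)", job_type, error)
```

A `sample_rate` of 0.0 suppresses all success messages and 1.0 logs every one, which would correspond to the full debug log requested for mwlog1001.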

Change 436574 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/puppet@production] Remove unused jobrunners.

https://gerrit.wikimedia.org/r/436574

Change 437281 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/puppet@production] Specify videoscalers uri in hiera/changeprop manifest.

https://gerrit.wikimedia.org/r/437281

Change 437285 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable videoscaler jobs in the new queue.

https://gerrit.wikimedia.org/r/437285

Change 437286 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Disable redis queue for videoscaler jobs.

https://gerrit.wikimedia.org/r/437286

Change 437281 merged by Giuseppe Lavagetto:
[operations/puppet@production] Specify videoscalers uri in hiera/changeprop manifest.

https://gerrit.wikimedia.org/r/437281

Change 437285 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable videoscaler jobs in the new queue.

https://gerrit.wikimedia.org/r/437285

Change 437286 merged by jenkins-bot:
[operations/mediawiki-config@master] Disable redis queue for videoscaler jobs.

https://gerrit.wikimedia.org/r/437286

Mentioned in SAL (#wikimedia-operations) [2018-06-05T08:51:13Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@63b30a6]: Enable videoscaler jobs in kafka T190327

Mentioned in SAL (#wikimedia-operations) [2018-06-05T08:52:02Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@63b30a6]: Enable videoscaler jobs in kafka T190327 (duration: 00m 49s)

Mentioned in SAL (#wikimedia-operations) [2018-06-05T08:52:07Z] <mobrovac@deploy1001> Synchronized wmf-config/jobqueue.php: Switch video scaling jobs to EventBus - T190327 (duration: 00m 52s)

Change 436574 merged by Giuseppe Lavagetto:
[operations/puppet@production] Remove unused jobrunners.

https://gerrit.wikimedia.org/r/436574

Mentioned in SAL (#wikimedia-operations) [2018-06-05T11:21:05Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@aa5e94b]: Enable cirrus jobs in kafka for everything except wikipedia, wikidata and commons T190327

Mentioned in SAL (#wikimedia-operations) [2018-06-05T11:21:46Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@aa5e94b]: Enable cirrus jobs in kafka for everything except wikipedia, wikidata and commons T190327 (duration: 00m 41s)

Mentioned in SAL (#wikimedia-operations) [2018-06-06T12:22:14Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@c8d62da]: Enable cirrus for everything T190327

Mentioned in SAL (#wikimedia-operations) [2018-06-06T12:23:07Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@c8d62da]: Enable cirrus for everything T190327 (duration: 00m 47s)

Change 437767 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Switch all jobs to the new queue and clean up the old queue configs.

https://gerrit.wikimedia.org/r/437767

Change 437767 merged by Mobrovac:
[operations/mediawiki-config@master] Switch all jobs to the new queue and clean up the old queue configs.

https://gerrit.wikimedia.org/r/437767

Mentioned in SAL (#wikimedia-operations) [2018-06-26T08:46:33Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@7d9a1aa]: Enable all jobs in kafka queue T190327

Mentioned in SAL (#wikimedia-operations) [2018-06-26T08:47:31Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@7d9a1aa]: Enable all jobs in kafka queue T190327 (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2018-06-26T08:47:33Z] <mobrovac@deploy1001> Synchronized wmf-config/CommonSettings.php: Switch the last remaining jobs to EventBus, only CommonSettings.php now - T190327 (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2018-06-26T08:48:07Z] <mobrovac@deploy1001> Started scap: Switch the last remaining jobs to EventBus, full scap sync for clean-up - T190327

Mentioned in SAL (#wikimedia-operations) [2018-06-26T09:03:12Z] <mobrovac@deploy1001> Started scap: Switch the last remaining jobs to EventBus, full scap sync, take #2 - T190327

Mentioned in SAL (#wikimedia-operations) [2018-06-26T09:11:47Z] <mobrovac@deploy1001> Finished scap: Switch the last remaining jobs to EventBus, full scap sync, take #2 - T190327 (duration: 08m 34s)

Mentioned in SAL (#wikimedia-operations) [2018-06-26T09:28:28Z] <mobrovac@deploy1001> Started scap: Switch the last remaining jobs to EventBus, full scap sync with HHVM restarts, take #3 - T190327

Mentioned in SAL (#wikimedia-operations) [2018-06-26T09:33:36Z] <mobrovac@deploy1001> Finished scap: Switch the last remaining jobs to EventBus, full scap sync with HHVM restarts, take #3 - T190327 (duration: 05m 07s)

mobrovac assigned this task to Pchelolo.

We are done, mesdames et messieurs. \o/ \o/ \o/