Select candidate jobs for transferring to the new infrastructure
Open, High, Public

Description

Out of all the job types that are run in production, we need to select candidates to be transferred first to the new EventBus infrastructure. Requirements:

  • Low volume
  • Idempotence - the job would initially be double-processed by old and new infra, so doing it twice shouldn't cause any trouble
  • Preferably low importance - if something goes wrong it should be either easily fixable or possible to ignore
  • As simple as possible - no delayed executions, no root/leaf job splitting, no recursion and no reliance on deduplication.
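
The requirements above can be sketched as a small filter. This is only an illustration: the `JobType` fields, the helper names and the choice to treat "low importance" as a sort preference rather than a hard requirement are assumptions. The flags in `EXAMPLE_JOBS` come from this task where it states them; the rest are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobType:
    """Hypothetical description of a job type, one flag per requirement."""
    name: str
    low_volume: bool
    idempotent: bool
    low_importance: bool
    delayed_execution: bool = False
    root_leaf_splitting: bool = False
    recursive: bool = False
    relies_on_deduplication: bool = False


def is_simple(job: JobType) -> bool:
    # "As simple as possible": none of the complicating features is used.
    return not (
        job.delayed_execution
        or job.root_leaf_splitting
        or job.recursive
        or job.relies_on_deduplication
    )


def first_wave_candidates(jobs: list[JobType]) -> list[JobType]:
    # Assumption: low volume, idempotence and simplicity are hard requirements,
    # while low importance is only a preference, so it affects ordering only.
    eligible = [j for j in jobs if j.low_volume and j.idempotent and is_simple(j)]
    # sorted() is stable; False sorts before True, so low-importance jobs come first.
    return sorted(eligible, key=lambda j: not j.low_importance)


# Placeholder data: flags not stated in this task are guesses for illustration.
EXAMPLE_JOBS = [
    JobType("cirrusSearchCheckerJob", low_volume=False, idempotent=True,
            low_importance=False, delayed_execution=True),
    JobType("cirrusSearchMassIndex", low_volume=True, idempotent=True,
            low_importance=False),
    JobType("updateBetaFeaturesUserCounts", low_volume=True, idempotent=True,
            low_importance=True),
]

if __name__ == "__main__":
    for job in first_wave_candidates(EXAMPLE_JOBS):
        print(job.name)
```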

For reference, here's the list of job types currently executed in production with some notes (full list available as P5964):

I've looked through the following jobs (struck-through jobs have been moved):

Pchelolo created this task. Sep 6 2017, 8:02 PM
Restricted Application added a subscriber: Aklapper. Sep 6 2017, 8:02 PM
mobrovac raised the priority of this task from Normal to High. Sep 7 2017, 9:14 AM
mobrovac edited projects, added EventBus, ChangeProp, MediaWiki-JobQueue; removed Goal, Epic.
mobrovac updated the task description.
mobrovac removed a subscriber: Aklapper.
Restricted Application added a project: Analytics. Sep 7 2017, 9:14 AM
mobrovac updated the task description. Sep 7 2017, 9:19 AM
Pchelolo updated the task description. Sep 7 2017, 8:53 PM
Pchelolo added a subscriber: EBernhardson.
Pchelolo updated the task description. Sep 7 2017, 9:38 PM
Pchelolo updated the task description. Sep 7 2017, 10:04 PM
mobrovac updated the task description. Sep 8 2017, 12:07 PM

cirrusSearchCheckerJob - basically idempotent. It verifies that the data in Elasticsearch matches MySQL and creates new jobs if they don't match. Uses delayed execution.
cirrusSearchDeleteArchive - idempotent. Checks the database to verify archive indexing is still appropriate when run.
cirrusSearchDeletePages - idempotent
cirrusSearchElasticaWrite - idempotent. Issued to retry failed write requests to Elasticsearch. Uses delayed execution.
cirrusSearchIncomingLinkCount - idempotent. Expensive, high volume duplicates
cirrusSearchLinksUpdate - idempotent, expensive
cirrusSearchLinksUpdatePrioritized - idempotent, expensive
cirrusSearchMassIndex - idempotent, expensive, low volume
cirrusSearchOtherIndex - can't use versioning, so out-of-order updates could be problematic

Pchelolo updated the task description. Sep 8 2017, 9:57 PM

Thank you @EBernhardson, updated the task with your info. Now we've got a complete list of jobs executed in production.

elukey moved this task from Backlog to Keep an eye on it on the User-Elukey board. Sep 11 2017, 2:47 PM

IMHO, updateBetaFeaturesUserCounts is the perfect candidate here. It's very lightweight (one SELECT, one UPDATE), it's idempotent and low-volume.

Sounds like a solid choice to me. Not terribly sexy, but straightforward.
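
To illustrate why a job of this shape (one SELECT, one UPDATE) tolerates being run by both the old and the new infrastructure, here is a minimal sketch using an in-memory SQLite database. The table names, column names and function names are hypothetical and are not the actual MediaWiki schema or job code. The point is only that recomputing a value from source rows and overwriting it gives the same result whether the job runs once or twice.

```python
import sqlite3


def update_feature_user_count(db: sqlite3.Connection, feature: str) -> int:
    """Recompute the user count from source rows and overwrite the stored value."""
    # One SELECT: count the users who currently have the feature enabled.
    (count,) = db.execute(
        "SELECT COUNT(*) FROM user_prefs WHERE feature = ?", (feature,)
    ).fetchone()
    # One UPDATE: overwrite (not increment) the stored count, so reruns are harmless.
    db.execute(
        "UPDATE feature_counts SET users = ? WHERE feature = ?", (count, feature)
    )
    db.commit()
    return count


def demo() -> int:
    db = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    db.execute("CREATE TABLE user_prefs (user_id INTEGER, feature TEXT)")
    db.execute("CREATE TABLE feature_counts (feature TEXT PRIMARY KEY, users INTEGER)")
    db.executemany(
        "INSERT INTO user_prefs VALUES (?, ?)",
        [(1, "demo-feature"), (2, "demo-feature"), (3, "other-feature")],
    )
    db.execute("INSERT INTO feature_counts VALUES ('demo-feature', 0)")

    # Simulate double production: the same job is delivered once per backend.
    for _backend in ("old-queue", "eventbus"):
        update_feature_user_count(db, "demo-feature")

    (users,) = db.execute(
        "SELECT users FROM feature_counts WHERE feature = ?", ("demo-feature",)
    ).fetchone()
    return users


if __name__ == "__main__":
    print(demo())
```

A job that did `users = users + 1` instead would double-count under the same conditions, which is the kind of job the idempotence requirement rules out.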

Change 377518 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable processing of the updateBetaFeaturesUserCounts job.

https://gerrit.wikimedia.org/r/377518

Change 377518 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable processing of the updateBetaFeaturesUserCounts job.

https://gerrit.wikimedia.org/r/377518

Mentioned in SAL (#wikimedia-operations) [2017-09-13T14:22:51Z] <mobrovac@tin> Started deploy [cpjobqueue/deploy@60d0a78]: Start using the EventBus infrastructure for the updateBetaFeaturesUserCounts job - T175210

Mentioned in SAL (#wikimedia-operations) [2017-09-13T14:23:24Z] <mobrovac@tin> Finished deploy [cpjobqueue/deploy@60d0a78]: Start using the EventBus infrastructure for the updateBetaFeaturesUserCounts job - T175210 (duration: 00m 33s)

mobrovac closed this task as Resolved.

The job is being double-produced now, so resolving.

Given the useful information we have in this task, I am proposing to widen the scope beyond the first job, towards generally coordinating the order of migrating individual jobs. @mobrovac, does that sound reasonable to you?

mobrovac reopened this task as Open. Sep 14 2017, 1:18 PM
mobrovac edited projects, added Services (doing); removed Services (done).

Sure.

I honestly don't have a strong preference between the other "hearted" tasks. Given that all of them are fairly low volume, would it make sense to just deploy all of the hearted ones in the next wave?

Good idea. Once we fully switch the first one to EB, there is no need to go one by one for low-risk and straightforward jobs.

mforns moved this task from Incoming to Radar on the Analytics board. Sep 28 2017, 3:41 PM

Mentioned in SAL (#wikimedia-operations) [2017-11-02T16:04:16Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Use only EventBus for processing updateBetaFeatureUserCount - T175210 (duration: 00m 51s)

Change 388139 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/mediawiki-config@master] JobQueue: Use EventBus for all "hearted" jobs

https://gerrit.wikimedia.org/r/388139

Change 388139 merged by jenkins-bot:
[operations/mediawiki-config@master] JobQueue: Use EventBus for all "hearted" jobs

https://gerrit.wikimedia.org/r/388139

Change 389491 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable more 'hearted' jobs

https://gerrit.wikimedia.org/r/389491

Change 389491 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable more 'hearted' jobs

https://gerrit.wikimedia.org/r/389491

Mentioned in SAL (#wikimedia-operations) [2017-11-06T14:49:23Z] <ppchelko@tin> Started deploy [cpjobqueue/deploy@e93feba]: Start processing all 'hearted' jobs T175210

Mentioned in SAL (#wikimedia-operations) [2017-11-06T14:50:07Z] <ppchelko@tin> Finished deploy [cpjobqueue/deploy@e93feba]: Start processing all 'hearted' jobs T175210 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2017-11-06T14:50:20Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Switch MessageIndexRebuildJob, flaggedrevs_CacheUpdate and deleteLinks jobs to the EventBus infrastructure - T175210 (duration: 00m 46s)

Change 389495 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Correct the regex for the consumed topics

https://gerrit.wikimedia.org/r/389495

Change 389495 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Correct the regex for the consumed topics

https://gerrit.wikimedia.org/r/389495

Change 389497 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Finally use correct regexes for matching the jobs

https://gerrit.wikimedia.org/r/389497

Change 389497 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Finally use correct regexes for matching the jobs

https://gerrit.wikimedia.org/r/389497

mobrovac updated the task description. Mon, Nov 6, 3:38 PM

Change 389669 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Remove the Host header from the request

https://gerrit.wikimedia.org/r/389669

Change 389669 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Remove the Host header from the request

https://gerrit.wikimedia.org/r/389669

Out of the IRC discussion we've got 3 candidates for the next migration:

  • wikibase-UpdateUsagesForPage - super high traffic, but well tested on beta and super easy. TODO: talk to Wikidata
  • ORESFetchScoresJob - low traffic, quite problematic
  • recentchangesupdate - decent traffic, very high user-visible effect.