⚓ T175210 Select candidate jobs for transferring to the new infrastructure

Subject	Repo	Branch	Lines +/-
[Config] Remove the Host header from the request	mediawiki/services/change-propagation/jobqueue-deploy	master	+0 -1
[Config] Finally use correct regexes for matching the jobs	mediawiki/services/change-propagation/jobqueue-deploy	master	+1 -1
[Config] Correct the regex for the consumed topics	mediawiki/services/change-propagation/jobqueue-deploy	master	+1 -1
[Config] Enable more 'hearted' jobs	mediawiki/services/change-propagation/jobqueue-deploy	master	+2 -2
JobQueue: Use EventBus for all "hearted" jobs	operations/mediawiki-config	master	+5 -3
[Config] Enable processing of the updateBetaFeaturesUserCounts job.	mediawiki/services/change-propagation/jobqueue-deploy	master	+9 -2

Status	Assigned	Task
Resolved	• Pchelolo	T157088 [EPIC] Develop a JobQueue backend based on EventBus
Resolved	• Pchelolo	T190327 FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus
Resolved	• Pchelolo	T175210 Select candidate jobs for transferring to the new infrastructure

Restricted Application added a subscriber: Aklapper. Sep 6 2017, 8:02 PM

• mobrovac raised the priority of this task from Medium to High. Sep 7 2017, 9:14 AM

• mobrovac edited projects, added Event-Platform, ChangeProp, MediaWiki-Core-JobQueue; removed Goal, Epic.

• mobrovac updated the task description.

• mobrovac removed a subscriber: Aklapper.

Restricted Application added a project: Analytics. Sep 7 2017, 9:14 AM

• mobrovac updated the task description. Sep 7 2017, 9:19 AM

• Pchelolo updated the task description. Sep 7 2017, 8:53 PM

• Pchelolo added a subscriber: EBernhardson.

• Pchelolo updated the task description. Sep 7 2017, 9:38 PM

• Pchelolo updated the task description. Sep 7 2017, 10:04 PM

• mobrovac updated the task description. Sep 8 2017, 12:07 PM

cirrusSearchCheckerJob - basically idempotent. It verifies data in elasticsearch matches mysql, creates new jobs if they don't match. Uses delayed execution.
cirrusSearchDeleteArchive - idempotent - checks database to verify archive indexing is still appropriate when run.
cirrusSearchDeletePages - idempotent
cirrusSearchElasticaWrite - idempotent. Issued to retry failed write requests to elasticsearch. uses delayed execution
cirrusSearchIncomingLinkCount - idempotent. expensive, high volume duplicates
cirrusSearchLinksUpdate - idempotent, expensive
cirrusSearchLinksUpdatePrioritized - idempotent, expensive
cirrusSearchMassIndex - idempotent, expensive, low volume
cirrusSearchOtherIndex - can't use versioning, so out-of-order updates could be problematic
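The notes above can be turned into a small lookup for shortlisting candidates. This is only a sketch: the `JobNotes` type, its field names, and the `easy_candidates` rule are hypothetical and not part of any MediaWiki or ChangeProp code. The flags are transcribed from the list, and a property the list does not mention is left at its default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobNotes:
    """Hypothetical record of what the notes above say about one job."""
    name: str
    idempotent: bool
    delayed: bool = False          # uses delayed execution
    expensive: bool = False
    order_sensitive: bool = False  # out-of-order updates could be problematic


# Transcribed from the list above; unmentioned properties default to False.
CIRRUS_JOBS = [
    JobNotes("cirrusSearchCheckerJob", idempotent=True, delayed=True),
    JobNotes("cirrusSearchDeleteArchive", idempotent=True),
    JobNotes("cirrusSearchDeletePages", idempotent=True),
    JobNotes("cirrusSearchElasticaWrite", idempotent=True, delayed=True),
    JobNotes("cirrusSearchIncomingLinkCount", idempotent=True, expensive=True),
    JobNotes("cirrusSearchLinksUpdate", idempotent=True, expensive=True),
    JobNotes("cirrusSearchLinksUpdatePrioritized", idempotent=True, expensive=True),
    JobNotes("cirrusSearchMassIndex", idempotent=True, expensive=True),
    # Not described as idempotent in the notes, so it is not assumed here.
    JobNotes("cirrusSearchOtherIndex", idempotent=False, order_sensitive=True),
]


def easy_candidates(jobs):
    """Assumed rule: idempotent, cheap, no delayed execution, order-insensitive."""
    return [
        j.name
        for j in jobs
        if j.idempotent and not (j.delayed or j.expensive or j.order_sensitive)
    ]


print(easy_candidates(CIRRUS_JOBS))
```

Under this assumed rule only the two delete jobs come out as straightforward; the rest need either delayed execution or capacity planning.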

Thank you @EBernhardson, updated the task with your info. Now we've got a complete list of jobs executed in production.

• mobrovac mentioned this in T175281: Separate off ChangePropagation for JobQueue as a new deployment. Sep 11 2017, 9:46 AM

elukey moved this task from Backlog to Keep an eye on it on the User-Elukey board. Sep 11 2017, 2:47 PM

IMHO, updateBetaFeaturesUserCounts is the perfect candidate here. It's very lightweight (one SELECT, one UPDATE), it's idempotent and low-volume.

In T175210#3597099, @mobrovac wrote:

IMHO, updateBetaFeaturesUserCounts is the perfect candidate here. It's very lightweight (one SELECT, one UPDATE), it's idempotent and low-volume.

Sounds like a solid choice to me. Not terribly sexy, but straightforward.

• GWicke mentioned this in T175637: End of September milestone: Migrate first production use case. Sep 11 2017, 9:59 PM

Change 377518 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable processing of the updateBetaFeaturesUserCounts job.

https://gerrit.wikimedia.org/r/377518

gerritbot added a project: Patch-For-Review. Sep 12 2017, 6:04 PM

Change 377518 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable processing of the updateBetaFeaturesUserCounts job.

https://gerrit.wikimedia.org/r/377518

Mentioned in SAL (#wikimedia-operations) [2017-09-13T14:22:51Z] <mobrovac@tin> Started deploy [cpjobqueue/deploy@60d0a78]: Start using the EventBus infrastructure for the updateBetaFeaturesUserCounts job - T175210

Mentioned in SAL (#wikimedia-operations) [2017-09-13T14:23:24Z] <mobrovac@tin> Finished deploy [cpjobqueue/deploy@60d0a78]: Start using the EventBus infrastructure for the updateBetaFeaturesUserCounts job - T175210 (duration: 00m 33s)

The job is being double-produced now, so resolving.

Given the useful information we have in this task, I am proposing to widen the scope beyond the first job, towards generally coordinating the order of migrating individual jobs. @mobrovac, does that sound reasonable to you?

Sure.

I honestly don't have a strong preference between the other "hearted" tasks. Given that all of them are fairly low volume, would it make sense to just deploy all of the hearted ones in the next wave?

In T175210#3618572, @GWicke wrote:

I honestly don't have a strong preference between the other "hearted" tasks. Given that all of them are fairly low volume, would it make sense to just deploy all of the hearted ones in the next wave?

Good idea. Once we fully switch the first one to EB, there is no need to go one by one for low-risk and straightforward jobs.

mforns moved this task from Incoming to Radar on the Analytics board. Sep 28 2017, 3:41 PM

Mentioned in SAL (#wikimedia-operations) [2017-11-02T16:04:16Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Use only EventBus for processing updateBetaFeatureUserCount - T175210 (duration: 00m 51s)

Change 388139 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/mediawiki-config@master] JobQueue: Use EventBus for all "hearted" jobs

https://gerrit.wikimedia.org/r/388139

gerritbot added a project: Patch-For-Review. Nov 2 2017, 6:43 PM

Change 388139 merged by jenkins-bot:
[operations/mediawiki-config@master] JobQueue: Use EventBus for all "hearted" jobs

https://gerrit.wikimedia.org/r/388139

Change 389491 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable more 'hearted' jobs

https://gerrit.wikimedia.org/r/389491

Change 389491 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable more 'hearted' jobs

https://gerrit.wikimedia.org/r/389491

Mentioned in SAL (#wikimedia-operations) [2017-11-06T14:49:23Z] <ppchelko@tin> Started deploy [cpjobqueue/deploy@e93feba]: Start processing all 'hearted' jobs T175210

Mentioned in SAL (#wikimedia-operations) [2017-11-06T14:50:07Z] <ppchelko@tin> Finished deploy [cpjobqueue/deploy@e93feba]: Start processing all 'hearted' jobs T175210 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2017-11-06T14:50:20Z] <mobrovac@tin> Synchronized wmf-config/jobqueue.php: Switch MessageIndexRebuildJob, flaggedrevs_CacheUpdate and deleteLinks jobs to the EventBus infrastructure - T175210 (duration: 00m 46s)

Change 389495 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Correct the regex for the consumed topics

https://gerrit.wikimedia.org/r/389495

Change 389495 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Correct the regex for the consumed topics

https://gerrit.wikimedia.org/r/389495

Change 389497 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Finally use correct regexes for matching the jobs

https://gerrit.wikimedia.org/r/389497

Change 389497 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Finally use correct regexes for matching the jobs

https://gerrit.wikimedia.org/r/389497

• mobrovac updated the task description. Nov 6 2017, 3:38 PM

Change 389669 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Remove the Host header from the request

https://gerrit.wikimedia.org/r/389669

Change 389669 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Remove the Host header from the request

https://gerrit.wikimedia.org/r/389669

Out of the IRC discussion we've got 3 candidates for the next migration:

wikibase-UpdateUsagesForPage - super high traffic, well tested on beta, but super easy. TODO talk to Wikidata
ORESFetchScoreJob - low traffic, quite problematic
recentchangesupdate - decent traffic, very high user-visible effect.

The wikibase-UpdateUsagesForPage job sounds like a perfect candidate to be the next one. It has averaged ~220 jobs/s over the last month, it was well tested in beta labs, and it seems idempotent. It also doesn't seem to use any of the advanced JobQueue features like root job deduplication or delayed execution.

Additionally, it's the biggest user of EnqueueJob, so in reality each execution creates 2 jobs: the actual job and one EnqueueJob. This job therefore accounts for 440 jobs/s, which is 40% of all the jobs in the queue.

With the new Kafka-based queue, EnqueueJob is not needed any more (see T181216), so transferring this job will move a very significant portion of the load out of the Redis queue.
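The load estimate above can be checked with a few lines of arithmetic. A minimal sketch using only the figures quoted in this comment; the variable names are made up for illustration, and the overall queue rate is derived from the 40% figure rather than measured.

```python
# Figures quoted in the comment above.
avg_rate = 220           # wikibase-UpdateUsagesForPage executions per second
jobs_per_execution = 2   # the actual job plus one EnqueueJob
share_percent = 40       # stated share of all jobs in the queue

# Combined rate attributed to this job on the Redis queue.
combined_rate = avg_rate * jobs_per_execution

# Overall queue rate implied by the 40% figure (derived, not measured).
implied_total_rate = combined_rate * 100 // share_percent

print(combined_rate)       # 440
print(implied_total_rate)  # 1100
```

So the quoted numbers imply roughly 1100 jobs/s across the whole queue, of which about 440 jobs/s would leave Redis once this job is migrated.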

@daniel what do you think about moving the wikibase-UpdateUsagesForPage job to the Kafka-based queue? Am I correct in thinking that this job is idempotent?

• Pchelolo updated the task description. Nov 24 2017, 10:37 AM

• mobrovac added a parent task: T175212: Services Q2 2017/18 goal: Migrate a subset of jobs to multi-DC enabled event processing infrastructure. Nov 28 2017, 10:33 AM

• mobrovac updated the task description. Dec 4 2017, 5:21 PM

• Pchelolo created subtask T182023: Migrate htmlCacheUpdate job to Kafka. Dec 4 2017, 7:25 PM

• mobrovac removed a subtask: T182023: Migrate htmlCacheUpdate job to Kafka. Dec 5 2017, 9:42 AM

• mobrovac added a parent task: T183744: FY17/18 Q3 Program 8 Services Goal: Migrate two high-traffic jobs over to EventBus. Dec 28 2017, 12:44 PM

• Pchelolo updated the task description. Jan 31 2018, 12:11 AM

EBernhardson updated the task description. Jan 31 2018, 4:56 AM

• Pchelolo removed a project: Patch-For-Review. Feb 28 2018, 8:21 PM

• Pchelolo updated the task description.

• mobrovac removed a subtask: T188540: Switch cdnPurge to Kafka. Mar 5 2018, 4:53 PM

• Pchelolo updated the task description. Mar 5 2018, 6:58 PM

• Pchelolo updated the task description. Mar 9 2018, 2:57 PM

• mobrovac added a parent task: T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus. Mar 21 2018, 7:18 PM

While the CirrusSearch issues are being resolved, the next batch of jobs can be switched. Here's what I propose:

recentChangesUpdate - 28/s
categoryMembershipChange - 12/s
EchoNotificationDeleteJob - 7/s
ORESFetchScoreJob - 6/s
wikibase-InjectRCRecords - 3/s

These are among the highest-frequency jobs left on the old queue. In total that's 56/s, which is 1/3 of all the jobs left on the old queue. All of them seem idempotent and all of them seem to have simple parameters.
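The total quoted above follows directly from the per-job rates. A minimal sketch using only the numbers in this comment; the dictionary name is made up for illustration, and the remaining-queue figure is derived from the "1/3" estimate rather than measured.

```python
# Per-job rates (jobs/s) as listed in the proposal above.
proposed_rates = {
    "recentChangesUpdate": 28,
    "categoryMembershipChange": 12,
    "EchoNotificationDeleteJob": 7,
    "ORESFetchScoreJob": 6,
    "wikibase-InjectRCRecords": 3,
}

# Combined rate of the proposed batch.
batch_rate = sum(proposed_rates.values())

# If the batch is 1/3 of what is left on the old queue, the old queue
# currently handles about three times the batch rate (derived figure).
implied_remaining_rate = batch_rate * 3

print(batch_rate)              # 56
print(implied_remaining_rate)  # 168
```

By the same estimate, roughly 112 jobs/s would remain on the old queue after this batch is switched.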

I propose to switch test wikis first as always and then go with a bulk switch for all the wikis.

Change 423486 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Switch remaining high traffic jobs for test wikis.

https://gerrit.wikimedia.org/r/423486

gerritbot added a project: Patch-For-Review. Apr 2 2018, 3:42 PM

Change 423487 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch remaining high traffic jobs for test wikis.

https://gerrit.wikimedia.org/r/423487

• mobrovac removed a project: Patch-For-Review. Apr 2 2018, 6:04 PM

• Pchelolo updated the task description. Apr 2 2018, 6:49 PM

• mobrovac removed a subtask: T189137: Migrate CirrusSearch jobs to Kafka queue. Apr 4 2018, 7:25 PM

• mobrovac removed parent tasks: T183744: FY17/18 Q3 Program 8 Services Goal: Migrate two high-traffic jobs over to EventBus, T175212: Services Q2 2017/18 goal: Migrate a subset of jobs to multi-DC enabled event processing infrastructure., T169937: Services Q1 2017/18 goal: Begin migrating job queue processing to multi-DC enabled eventbus infrastructure.

• mobrovac updated the task description. Apr 16 2018, 7:31 PM

• Pchelolo updated the task description. Apr 16 2018, 9:36 PM

• Pchelolo updated the task description. May 22 2018, 10:19 AM

Mentioned in SAL (#wikimedia-operations) [2018-05-22T10:30:45Z] <ppchelko@tin> Started deploy [cpjobqueue/deploy@b45cd3b]: Switch cross-wiki posting jobs for everything T175210

Mentioned in SAL (#wikimedia-operations) [2018-05-22T10:31:48Z] <ppchelko@tin> Finished deploy [cpjobqueue/deploy@b45cd3b]: Switch cross-wiki posting jobs for everything T175210 (duration: 01m 03s)

• Pchelolo updated the task description. May 29 2018, 10:10 AM

• Pchelolo updated the task description. Jun 5 2018, 9:22 AM

We have switched all jobs except certain outstanding problematic ones and we have tickets for all of them, so this ticket has served its purpose. Resolving.

• Pchelolo mentioned this in T219148: Use PHP7 to run all async jobs. Mar 29 2019, 5:31 PM

Aklapper edited projects, added Analytics-Radar; removed Analytics. Jun 10 2020, 6:44 AM
