Select candidate jobs for transferring to the new infrastructure
Open, High · Public

Description

Of all the job types run in production, we need to select candidates to be the first transferred to the new EventBus infrastructure. Requirements:

  • Low volume
  • Idempotence - the job would initially be double-processed by old and new infra, so doing it twice shouldn't cause any trouble
  • Preferably low importance - if something goes wrong it should be either easily fixable or possible to ignore
  • As simple as possible - no delayed execution, no root/leaf job splitting, no recursion, and no reliance on deduplication.
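
The requirements above can be read as a filter over the job list. Below is a minimal sketch of that reading, not the actual selection procedure: the `JobType` fields, the helper names, and the example job names are all hypothetical and only mirror the four bullet points. Low importance is treated as a preference (it affects ordering) rather than a hard requirement.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobType:
    """Hypothetical description of one job type, mirroring the requirements."""
    name: str
    low_volume: bool
    idempotent: bool
    low_importance: bool
    uses_delay: bool = False
    splits_root_leaf: bool = False
    recursive: bool = False
    needs_dedup: bool = False


def is_candidate(job: JobType) -> bool:
    """Hard requirements: low volume, idempotent, and as simple as possible."""
    simple = not (
        job.uses_delay or job.splits_root_leaf or job.recursive or job.needs_dedup
    )
    return job.low_volume and job.idempotent and simple


def rank_candidates(jobs: List[JobType]) -> List[JobType]:
    """Keep eligible jobs and list the low-importance ones first."""
    eligible = [job for job in jobs if is_candidate(job)]
    # sorted() is stable, so the input order is kept within each group.
    return sorted(eligible, key=lambda job: not job.low_importance)


# Made-up entries for illustration only; these are not real production jobs.
EXAMPLE_JOBS = [
    JobType("exampleImportantJob", True, True, False),
    JobType("exampleSimpleJob", True, True, True),
    JobType("exampleDelayedJob", True, True, True, uses_delay=True),
    JobType("exampleHighVolumeJob", False, True, True),
]

if __name__ == "__main__":
    for job in rank_candidates(EXAMPLE_JOBS):
        print(job.name)
```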

For reference, here's the list of job types currently executed in production, with some notes (full list available as P5964):

I've looked through the following jobs:

Pchelolo created this task. Wed, Sep 6, 8:02 PM
Restricted Application added a subscriber: Aklapper. Wed, Sep 6, 8:02 PM
mobrovac raised the priority of this task from Normal to High. Thu, Sep 7, 9:14 AM
mobrovac edited projects, added EventBus, ChangeProp, MediaWiki-JobQueue; removed Goal, Epic.
mobrovac updated the task description.
mobrovac removed a subscriber: Aklapper.
Restricted Application added a project: Analytics. Thu, Sep 7, 9:14 AM
mobrovac updated the task description. Thu, Sep 7, 9:19 AM
Pchelolo updated the task description. Thu, Sep 7, 8:53 PM
Pchelolo added a subscriber: EBernhardson.
Pchelolo updated the task description. Thu, Sep 7, 9:38 PM
Pchelolo updated the task description. Thu, Sep 7, 10:04 PM
mobrovac updated the task description. Fri, Sep 8, 12:07 PM

cirrusSearchCheckerJob - basically idempotent. It verifies that data in Elasticsearch matches MySQL and creates new jobs if they don't match. Uses delayed execution.
cirrusSearchDeleteArchive - idempotent. Checks the database to verify that archive indexing is still appropriate when run.
cirrusSearchDeletePages - idempotent
cirrusSearchElasticaWrite - idempotent. Issued to retry failed write requests to Elasticsearch. Uses delayed execution.
cirrusSearchIncomingLinkCount - idempotent. Expensive, high volume duplicates
cirrusSearchLinksUpdate - idempotent, expensive
cirrusSearchLinksUpdatePrioritized - idempotent, expensive
cirrusSearchMassIndex - idempotent, expensive, low volume
cirrusSearchOtherIndex - can't use versioning, so out-of-order updates could be problematic

Pchelolo updated the task description. Fri, Sep 8, 9:57 PM

Thank you @EBernhardson, updated the task with your info. Now we've got a complete list of jobs executed in production.

elukey moved this task from Backlog to Keep an eye on it on the User-Elukey board. Mon, Sep 11, 2:47 PM

IMHO, updateBetaFeaturesUserCounts is the perfect candidate here. It's very lightweight (one SELECT, one UPDATE), it's idempotent and low-volume.
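
To illustrate why a job made of one SELECT and one UPDATE tolerates being processed twice, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and the `update_user_counts` helper are hypothetical and do not reflect the real job's schema or code. It assumes the job recomputes the count and overwrites the stored value, rather than incrementing it, so running it from both the old and the new infrastructure leaves the same result.

```python
import sqlite3


def update_user_counts(conn: sqlite3.Connection, feature: str) -> int:
    """Recompute the number of users for a feature and store the absolute value."""
    # One SELECT: count users from the source-of-truth table.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM user_prefs WHERE feature = ?", (feature,)
    ).fetchone()
    # One UPDATE: overwrite the stored count (no increment, so repeats are harmless).
    conn.execute(
        "UPDATE feature_counts SET number = ? WHERE feature = ?", (count, feature)
    )
    conn.commit()
    return count


def demo() -> int:
    """Run the job twice, as during double-processing, and return the stored count."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user_prefs (user_id INTEGER, feature TEXT);
        CREATE TABLE feature_counts (feature TEXT PRIMARY KEY, number INTEGER);
        INSERT INTO user_prefs VALUES (1, 'demo-feature'), (2, 'demo-feature');
        INSERT INTO feature_counts VALUES ('demo-feature', 0);
        """
    )
    update_user_counts(conn, "demo-feature")  # processed by the old infrastructure
    update_user_counts(conn, "demo-feature")  # processed again by the new one
    (number,) = conn.execute(
        "SELECT number FROM feature_counts WHERE feature = ?", ("demo-feature",)
    ).fetchone()
    conn.close()
    return number


if __name__ == "__main__":
    print(demo())
```

Had the job used an increment such as `number = number + 1`, the second run would have changed the result, which is the kind of job the idempotence requirement rules out.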

Sounds like a solid choice to me. Not terribly sexy, but straightforward.

Change 377518 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable processing of the updateBetaFeaturesUserCounts job.

https://gerrit.wikimedia.org/r/377518

Change 377518 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] [Config] Enable processing of the updateBetaFeaturesUserCounts job.

https://gerrit.wikimedia.org/r/377518

Mentioned in SAL (#wikimedia-operations) [2017-09-13T14:22:51Z] <mobrovac@tin> Started deploy [cpjobqueue/deploy@60d0a78]: Start using the EventBus infrastructure for the updateBetaFeaturesUserCounts job - T175210

Mentioned in SAL (#wikimedia-operations) [2017-09-13T14:23:24Z] <mobrovac@tin> Finished deploy [cpjobqueue/deploy@60d0a78]: Start using the EventBus infrastructure for the updateBetaFeaturesUserCounts job - T175210 (duration: 00m 33s)

mobrovac closed this task as Resolved.

The job is being double-produced now, so resolving.

Given the useful information we have in this task, I am proposing to widen the scope beyond the first job, towards generally coordinating the order of migrating individual jobs. @mobrovac, does that sound reasonable to you?

mobrovac reopened this task as Open. Thu, Sep 14, 1:18 PM
mobrovac edited projects, added Services (doing); removed Services (done).

Sure.

I honestly don't have a strong preference between the other "hearted" tasks. Given that all of them are fairly low volume, would it make sense to just deploy all of the hearted ones in the next wave?

Good idea. Once we fully switch the first one to EB, there is no need to go one by one for low-risk and straightforward jobs.