Out of all the job types that run in production, we need to select candidates to be the first ones transferred to the new Event-Platform infrastructure. Requirements:
- Low volume
- Idempotence - the job would initially be double-processed by both the old and the new infra, so running it twice shouldn't cause any trouble
- Preferably low importance - if something goes wrong it should be either easily fixable or possible to ignore
- As simple as possible - no delayed execution, no root/leaf job splitting, no recursion and no reliance on deduplication.
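The idempotence requirement can be illustrated with a minimal sketch of what double-processing does to a job. All names here (`recount_users`, `send_email`, `double_process`, `is_idempotent`) are hypothetical and only model the idea; they are not part of the JobQueue or Event-Platform code.

```python
import copy

def recount_users(state: dict) -> None:
    # Overwrites a derived value, so running it again changes nothing.
    state["user_count"] = len(state["users"])

def send_email(state: dict) -> None:
    # Has a cumulative side effect: every run adds another message.
    state["outbox"].append("notification")

def double_process(job, state: dict) -> dict:
    # Models the transition period: the same job runs on both infrastructures.
    job(state)  # old infra
    job(state)  # new infra
    return state

def is_idempotent(job, initial: dict) -> bool:
    # A job is safe to double-process if one run and two runs end in the same state.
    once = copy.deepcopy(initial)
    job(once)
    twice = double_process(job, copy.deepcopy(initial))
    return once == twice

initial_state = {"users": ["a", "b"], "user_count": 0, "outbox": []}
print(is_idempotent(recount_users, initial_state))
print(is_idempotent(send_email, initial_state))
```

A job like the hypothetical `recount_users` passes the check, while anything that sends messages or appends records does not, which is why such jobs are ruled out as first candidates.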
For reference, here's the list of job types currently executed in production, with notes on the ones I've looked through (the full list is available as P5964; struck-through jobs have been moved):
- AssembleUploadChunks - not idempotent
- BounceHandlerJob - not idempotent
- BounceHandlerNotificationJob - not idempotent
- categoryMembershipChange - not idempotent
- cdnPurge - uses delayed execution
- CentralAuthCreateLocalAccountJob - not idempotent
- ChangeNotification - too high rate
- cirrusSearchCheckerJob - basically idempotent. It verifies data in elasticsearch matches mysql, creates new jobs if they don't match. Uses delayed execution. Tricky. It runs from a cron script scheduling bulk jobs with a set of pageIds and uses delay 1,2,3,4... to scatter the jobs in time. Really this is abusing the delayed job functionality, and what it really needs is a job scheduler that can insert jobs in the future.
- cirrusSearchDeleteArchive - idempotent - checks database to verify archive indexing is still appropriate when run.
- cirrusSearchDeletePages - idempotent
- cirrusSearchElasticaWrite - idempotent. Issued to retry failed write requests to elasticsearch. uses delayed execution
- cirrusSearchIncomingLinkCount - idempotent. expensive, high volume duplicates
- cirrusSearchLinksUpdate - idempotent, expensive
- cirrusSearchLinksUpdatePrioritized - idempotent, expensive
- cirrusSearchMassIndex - idempotent, expensive, low volume
- cirrusSearchOtherIndex - can't use versioning, so out-of-order updates could be problematic
- CognateCacheUpdateJob - basically a wrapper over HTMLCacheUpdateJob
- CognateLocalJobSubmitJob - basically submits a job to a bunch of other sites
- constraintsTableUpdate - some Wikidata job, not clear
- deleteLinks - a very good candidate, low volume (<1/s), idempotent
- EchoNotificationDeleteJob - it's probably idempotent as it just reduces the number of notifications to a specified maximum, but it does unfold when it contains more than one userId
- EchoNotificationJob - not idempotent T192945
- enotifNotify - sends emails, definitely not idempotent
- enqueue - enqueues other jobs, pretty important to begin with (removed by T181216: Get rid of pointless EnqueueJob usage)
- flaggedrevs_CacheUpdate - idempotent, low volume
- globalUsageCachePurge - inserts HTMLCacheUpdate jobs for local wikis
- GlobalUserPageLocalJobSubmitJob - just submits other jobs
- gwtoolsetGWTFileBackendCleanupJob
- gwtoolsetUploadMediafileJob T192946
- gwtoolsetUploadMetadataJob
- htmlCacheUpdate - recursive
- LocalGlobalUserPageCacheUpdateJob - idempotent, but enqueues other jobs and HTMLCacheUpdateJob
- LocalPageMoveJob - not idempotent
- LocalRenameUserJob - not idempotent
- LoginNotifyChecks - didn't quite understand what that does.
- MassMessageJob - sends a message to the user, obviously not idempotent
- MassMessageSubmitJob - enqueues other jobs
- MessageGroupStatesUpdaterJob -
- MessageIndexRebuildJob - rebuilds some indexes, so should be idempotent. ❤️
- ORESFetchScoreJob - should not be duplicated as it uses ORES
- PublishStashedFile - uploads files, shouldn't be duplicated
- recentChangesUpdate - too much traffic in this one
- RecordLintJob - stores lint errors in DB, shouldn't be duplicated.
- refreshLinks - too much traffic
- refreshLinksPrioritized - same as previous
- renameUser - renames a user, obviously not idempotent
- ThumbnailRender - renders thumbnails, kinda idempotent, but duplicating will severely increase the load
- TranslatablePageMoveJob - moves pages, obviously not idempotent
- TranslateDeleteJob - deletes stuff. Kinda idempotent
- TranslateRenderJob - Job for updating translation pages when translation or template changes.
- TranslationsUpdateJob - Job for updating translation units and translation pages when a translatable page is marked for translation.
- TTMServerMessageUpdateJob - This one retries itself and disables JobQueue retry service. TODO Need to add support for this possibility
- updateBetaFeaturesUserCounts - just updates the user count, idempotent, low volume
- UpdateRepoOnDelete - Provides logic to update the repo after page deletes in the client.
- UpdateRepoOnMove - Provides logic to update the repo after page moves in the client.
- webVideoTranscode - T188947
- webVideoTranscodePrioritized - shouldn't be duplicated as it generates a lot of load T188947
- wikibase-addUsagesForPage - not sure what it does
- wikibase-InjectRCRecords - not idempotent
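As a rough sketch, the requirements can be applied to the notes above mechanically. The `JobNotes` structure and `is_candidate` helper are hypothetical, and only a handful of jobs are transcribed; `None` means the notes don't state that property, and such jobs are conservatively excluded.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobNotes:
    name: str
    idempotent: Optional[bool] = None   # None: not stated in the notes
    low_volume: Optional[bool] = None   # None: not stated in the notes
    delayed: bool = False               # uses delayed execution
    recursive: bool = False             # recursive or root/leaf splitting

# A few entries transcribed from the list above (not the full list).
NOTES = [
    JobNotes("cdnPurge", delayed=True),
    JobNotes("cirrusSearchElasticaWrite", idempotent=True, delayed=True),
    JobNotes("deleteLinks", idempotent=True, low_volume=True),
    JobNotes("enotifNotify", idempotent=False),
    JobNotes("flaggedrevs_CacheUpdate", idempotent=True, low_volume=True),
    JobNotes("htmlCacheUpdate", recursive=True),
    JobNotes("updateBetaFeaturesUserCounts", idempotent=True, low_volume=True),
]

def is_candidate(job: JobNotes) -> bool:
    # Require explicit confirmation of idempotence and low volume,
    # and reject anything using delayed execution or recursion.
    return (
        job.idempotent is True
        and job.low_volume is True
        and not job.delayed
        and not job.recursive
    )

candidates = [job.name for job in NOTES if is_candidate(job)]
print(candidates)
```

The "low importance" requirement is left out of the sketch because the notes above don't record it per job; it would still need a manual judgment call.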