
Use PHP7 to run all async jobs
Open, Normal, Public

Description

We want to migrate the async jobs to use PHP7, and to be able to do so job-by-job. In order to do this, we need to:

  • Pick a couple of jobs to test first and change their configuration in changeprop to send the PHP_ENGINE=php7 cookie
  • Once we're convinced by both the latencies and the overall performance, switch the other jobs progressively
  • Check that logs/errors are collected
  • Check that metrics are collected
  • Amend the apache configuration to remove the need for the cookie
  • Revert the addition of the cookie to changeprop

Job list; we'll migrate roughly in this order:

  • updateBetaFeaturesUserCounts (510703)
  • RecordLintJob (511436)
  • htmlCacheUpdate (511649)
  • MessageIndexRebuildJob
  • flaggedrevs_CacheUpdate
  • deleteLinks
  • wikibase-addUsagesForPage
  • cdnPurge
  • refreshLinks (too much traffic)
  • RefreshLinks (too much traffic)
  • refreshLinksPrioritized (too much traffic)
  • recentChangesUpdate (too much traffic, very high user-visible effect)
  • EchoNotificationDeleteJob
  • wikibase-InjectRCRecords
  • categoryMembershipChange
  • ORESFetchScoresJob (low traffic, quite problematic)
  • AssembleUploadChunks
  • BounceHandlerJob
  • CentralAuthCreateLocalAccountJob
  • enotifNotify
  • gwtoolsetGWTFileBackendCleanupJob
  • LocalPageMoveJob
  • LocalRenameUserJob
  • LoginNotifyChecks
  • MassMessageJob
  • MassMessageSubmitJob
  • MessageGroupStatesUpdaterJob
  • PublishStashedFile
  • GlobalUserPageLocalJobSubmitJob
  • renameUser
  • ThumbnailRender (?)
  • TranslationsUpdateJob
  • TranslateRenderJob
  • TranslatablePageMoveJob
  • TranslateDeleteJob
  • UpdateRepoOnDelete
  • UpdateRepoOnMove
  • webVideoTranscode
  • webVideoTranscodePrioritized
  • wikibase-UpdateUsagesForPage (super high traffic)
  • ChangeNotification (high rate)
  • CognateCacheUpdateJob (basically a wrapper over HTMLCacheUpdateJob)
  • EchoNotificationJob
  • cirrusSearchCheckerJob (Tricky. It runs from a cron script scheduling bulk jobs with a set of pageIds and uses delay 1,2,3,4... to scatter the jobs in time)
  • cirrusSearchMassIndex
  • sendMail
  • cirrusSearchDeleteArchive
  • deletePage
  • cirrusSearchOtherIndex
  • refreshLinksDynamic
  • LocalSharedHelpPageCacheUpdateJob
  • cirrusSearchJobChecker
  • constraintsTableUpdate
  • synchroniseThreadArticleData
  • cirrusSearchLinksUpdatePrioritized
  • cirrusSearchElasticaWrite
  • compileArticleMetadata
  • clearUserWatchlist
  • BounceHandlerNotificationJob
  • cirrusSearchIncomingLinkCount
  • MessageGroupStatsRebuildJob
  • cpjobqueue.error
  • gwtoolsetUploadMetadataJob
  • CognateLocalJobSubmitJob
  • cirrusSearchDeletePages
  • TTMServerMessageUpdateJob
  • LocalRenameUserJob
  • userGroupExpiry
  • crosswikiSuppressUser
  • securePollPopulateVoterList
  • CentralAuthUnattachUserJob
  • --domain
  • htmlCacheUpdate
  • LocalGlobalUserPageCacheUpdateJob
  • globalUsageCachePurge
  • constraintsRunCheck
  • MassMessageServerSideJob
  • activityUpdateJob
  • translationNotificationJob
  • cirrusSearchLinksUpdate

mediawiki/includes/jobqueue/jobs/

1 LoginNotifyChecks
2 RecordLintJob
3 EchoNotificationDeleteJob
8 cirrusSearchIncomingLinkCount
8 enotifNotify
10 activityUpdateJob
58 cirrusSearchLinksUpdatePrioritized
63 recentChangesUpdate
64 refreshLinks
72 categoryMembershipChange
81 cirrusSearchLinksUpdate
165 htmlCacheUpdate
2000 cirrusSearchCheckerJob

Event Timeline

Joe created this task. Mar 25 2019, 12:46 PM
jijiki added a subscriber: jijiki. Mar 26 2019, 3:35 PM
jijiki updated the task description. Mar 29 2019, 5:09 PM
jijiki updated the task description. Mar 29 2019, 5:23 PM

If I understand correctly, all we need to do to switch a particular job's execution to PHP7 is add a Cookie: PHP_ENGINE=php7 header to the request.

The requests are templated in the cpjobqueue config so all we need is to add that header to the template.

However, there's a bit of a complication here. We don't dedicate a rule to each job: many of the low-traffic jobs share a single low_traffic_jobs rule, and therefore share the template for the request to JobRunner. That will complicate switching the jobs one by one.

So, for the transition period, we should probably just create a new config stanza, job_php_version, and set (or not set) the cookie at runtime.
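A minimal sketch of what setting the cookie at runtime could look like, assuming the stanza boils down to a set of job types that should run on PHP7. The names `PHP7_JOBS` and `build_headers`, and the `Content-Type` header, are hypothetical and only for illustration; this is not the actual cpjobqueue config format or code.

```python
from typing import Dict, Set

# Hypothetical stand-in for the proposed job_php_version stanza:
# job types listed here get the cookie, everything else keeps the default engine.
PHP7_JOBS: Set[str] = {"updateBetaFeaturesUserCounts", "RecordLintJob"}


def build_headers(job_type: str, php7_jobs: Set[str] = PHP7_JOBS) -> Dict[str, str]:
    """Build the request headers for one job execution from a shared template."""
    # Headers common to every job that uses the shared rule (assumed).
    headers = {"Content-Type": "application/json"}
    # The cookie is decided per job at runtime, so jobs sharing one rule
    # can still be switched individually.
    if job_type in php7_jobs:
        headers["Cookie"] = "PHP_ENGINE=php7"
    return headers
```

With this shape, switching one more job is a one-line change to the set, and removing the cookie at the end of the migration means deleting the lookup.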

T175210 and friends contain some info on jobs that we gathered when we switched the jobs to Kafka; that might be useful.

Please ping me if you think a temporary config stanza is an OK solution, and I'll implement it.

jijiki triaged this task as Normal priority. Apr 3 2019, 6:38 AM
jijiki moved this task from Backlog/Radar to In Progress on the User-jijiki board. Apr 4 2019, 9:14 PM

Change 502840 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

Krinkle added a subscriber: Krinkle. Edited Apr 10 2019, 6:37 PM

@Pchelolo I think it may be better to hold off on actually switching prod jobs until T219279 and T218005 are resolved, given that, unlike a web request, a job doesn't offer a way to retry when it fails. A queued job is a kind of promise that we will run it eventually, and given the fatal nature of these errors, that promise is hard to fulfil.

Working on the logic for it is fine of course. Just the actual switch may be a bit too soon. Have we switched jobs in Beta already?

@Krinkle yeah, we will wait for sure; meanwhile, we are exploring :)

jijiki updated the task description. Apr 10 2019, 6:40 PM

A job doesn't offer a way to retry when it fails.

Actually, it does. We do retry jobs unless a job explicitly prohibits retries. As a preparation step, we can make retries remove the PHP7 cookie. That way, if a job has fataled on PHP7, it will be retried with HHVM.
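The fallback idea can be sketched as follows, assuming only the first attempt carries the cookie and every retry goes out without it. The `execute` callback, `run_with_fallback`, and `max_retries` are hypothetical names for illustration, not changeprop's actual retry API.

```python
from typing import Callable, Dict, Optional


def run_with_fallback(
    execute: Callable[[Dict[str, str]], bool],
    use_php7: bool,
    max_retries: int = 2,
) -> Optional[int]:
    """Run a job and return the attempt number that succeeded, or None."""
    for attempt in range(max_retries + 1):
        headers: Dict[str, str] = {}
        # Only the first attempt asks for PHP7; retries omit the cookie,
        # so they run on the default engine instead.
        if use_php7 and attempt == 0:
            headers["Cookie"] = "PHP_ENGINE=php7"
        if execute(headers):
            return attempt
    return None
```

A job that fatals only on PHP7 would then succeed on its first retry rather than being lost.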

That said, I agree that enabling jobs in production might be premature; we can probably start experimenting in the beta cluster. For that, we'd need to resolve T215339 ASAP.

jijiki updated the task description. Apr 10 2019, 7:03 PM

Change 508599 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

Change 502840 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:04:59Z] <mobrovac@deploy1001> Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:06:29Z] <mobrovac@deploy1001> Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148 (duration: 01m 30s)

Change 508599 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

jijiki added a comment. Wed, May 8, 7:54 PM

It looks like deployment-prep has an older php7.2 version than production, which is something we should fix as well.

We have upgraded php7 on beta, so now it looks like async jobs are running. We will leave it as is until next week, when we will assess whether it worked out.

Change 510703 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Change 510703 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:26:33Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:27:40Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148 (duration: 01m 07s)

jijiki updated the task description. Thu, May 16, 3:26 PM

Change 511414 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob job to PHP7

https://gerrit.wikimedia.org/r/511414

Change 511436 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Change 511436 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:42:32Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:43:28Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148 (duration: 00m 55s)

jijiki updated the task description. Mon, May 20, 8:28 PM

Change 511649 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Change 511414 abandoned by Effie Mouzeli:
Switch RecordLintJob job to PHP7

Reason:
Abandoned for 511436

https://gerrit.wikimedia.org/r/511414

Change 511649 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:07:10Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:08:04Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148 (duration: 00m 54s)

jijiki updated the task description. Tue, May 21, 12:56 PM

Change 511913 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.

https://gerrit.wikimedia.org/r/511913