
Use PHP7 to run all async jobs
Open, Normal, Public

Description

We want to migrate the async jobs to use PHP7, and to be able to do so job-by-job. In order to do this, we need to:

  • Pick a couple of jobs to test first and change their configuration in changeprop to send the PHP_ENGINE=php7 cookie
  • Once we're convinced by both the latencies and the overall performance, switch the other jobs progressively
  • Check that logs/errors are collected
  • Check that metrics are collected
  • Amend the apache configuration to remove the need for the cookie
  • Revert the addition of the cookie to changeprop

Job list, roughly in the order we'll migrate them:

  • updateBetaFeaturesUserCounts (510703)
  • RecordLintJob (511436)
  • htmlCacheUpdate (511649)
  • wikibase-addUsagesForPage (511913)
  • ORESFetchScoresJob (512858)
  • RecentChangesUpdate (high traffic, user visible) (512872)
  • refreshLinks (too much traffic)
  • cirrusSearchCheckerJob (tricky: it runs from a cron script that schedules bulk jobs, each with a set of pageIds, and uses delays of 1, 2, 3, 4... to scatter the jobs in time)
  • cirrusSearchDeleteArchive
  • cirrusSearchDeletePages
  • cirrusSearchElasticaWrite
  • cirrusSearchIncomingLinkCount
  • cirrusSearchLinksUpdate
  • cirrusSearchLinksUpdatePrioritized
  • cirrusSearchOtherIndex
  • cdnPurge
  • categoryMembershipChange
  • ThumbnailRender
  • constraintsRunCheck
  • webVideoTranscode
  • webVideoTranscodePrioritized
  • refreshLinksPrioritized (too much traffic)
  • TranslationsUpdateJob
  • TranslateRenderJob
  • TranslatablePageMoveJob
  • TranslateDeleteJob
  • translationNotificationJob
  • wikibase-UpdateUsagesForPage (super high traffic)
  • ChangeNotification (high rate)
  • CognateCacheUpdateJob (basically a wrapper over HTMLCacheUpdateJob)
  • flaggedrevs_CacheUpdate
  • deleteLinks
  • EchoNotificationDeleteJob
  • wikibase-InjectRCRecords
  • AssembleUploadChunks
  • BounceHandlerJob
  • CentralAuthCreateLocalAccountJob
  • enotifNotify
  • gwtoolsetGWTFileBackendCleanupJob
  • LocalPageMoveJob
  • LocalRenameUserJob
  • LoginNotifyChecks
  • MassMessageJob
  • MassMessageSubmitJob
  • MassMessageServerSideJob
  • MessageGroupStatesUpdaterJob
  • MessageIndexRebuildJob
  • PublishStashedFile
  • GlobalUserPageLocalJobSubmitJob
  • renameUser
  • UpdateRepoOnDelete
  • UpdateRepoOnMove
  • EchoNotificationJob
  • cirrusSearchMassIndex
  • sendMail
  • deletePage
  • refreshLinksDynamic
  • LocalSharedHelpPageCacheUpdateJob
  • cirrusSearchJobChecker
  • constraintsTableUpdate
  • synchroniseThreadArticleData
  • compileArticleMetadata
  • clearUserWatchlist
  • BounceHandlerNotificationJob
  • MessageGroupStatsRebuildJob
  • cpjobqueue.error
  • gwtoolsetUploadMetadataJob
  • CognateLocalJobSubmitJob
  • TTMServerMessageUpdateJob
  • userGroupExpiry
  • crosswikiSuppressUser
  • securePollPopulateVoterList
  • CentralAuthUnattachUserJob
  • LocalGlobalUserPageCacheUpdateJob
  • globalUsageCachePurge
  • activityUpdateJob
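The cirrusSearchCheckerJob entry above describes jobs being scattered in time by giving successive bulk jobs increasing delays. The sketch below illustrates only that scattering pattern; the batch size, the `CheckerJob` type, the function name and the assumption that the delay is measured in seconds are all hypothetical and not taken from CirrusSearch.

```python
from dataclasses import dataclass


@dataclass
class CheckerJob:
    page_ids: list[int]
    delay: int  # hypothetical unit: seconds before the job becomes runnable


def schedule_checker_jobs(page_ids, batch_size):
    """Split page_ids into batches and give batch i a delay of i + 1,
    so the bulk jobs are spread out instead of all running at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    jobs = []
    for start in range(0, len(page_ids), batch_size):
        batch_index = start // batch_size
        jobs.append(CheckerJob(page_ids=page_ids[start:start + batch_size],
                               delay=batch_index + 1))
    return jobs
```

For ten page IDs and a batch size of 4, this yields three jobs with delays 1, 2 and 3.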

mediawiki/includes/jobqueue/jobs/

1 LoginNotifyChecks
2 RecordLintJob
3 EchoNotificationDeleteJob
8 cirrusSearchIncomingLinkCount
8 enotifNotify
10 activityUpdateJob
58 cirrusSearchLinksUpdatePrioritized
63 recentChangesUpdate
64 refreshLinks
72 categoryMembershipChange
81 cirrusSearchLinksUpdate
165 htmlCacheUpdate
2000 cirrusSearchCheckerJob


Event Timeline

Joe created this task. (Mar 25 2019, 12:46 PM)
jijiki added a subscriber: jijiki. (Mar 26 2019, 3:35 PM)
jijiki updated the task description. (Mar 29 2019, 5:09 PM)
jijiki updated the task description. (Mar 29 2019, 5:23 PM)

If I understand correctly, to switch the execution of a particular job to PHP7, all we need to do is add the Cookie: PHP_ENGINE=php7 header to the request.

The requests are templated in the cpjobqueue config so all we need is to add that header to the template.

However, there's a complication here. We don't dedicate a rule to each job: many of the low-traffic jobs share a low_traffic_jobs rule, and those jobs also share the request template for the JobRunner. That will make switching the jobs one by one harder.

So, for the transition period, we should probably just create a new config stanza, job_php_version, and set (or not set) the cookie at runtime depending on the job type.
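A minimal sketch of the runtime decision proposed above, written in Python purely for illustration; it is not changeprop's template syntax. The mapping `JOB_PHP_VERSION` stands in for the job_php_version stanza, and its shape, the `request_headers` function and the placeholder base header are all hypothetical. Only the cookie name and value come from this task.

```python
# Hypothetical shape of the temporary job_php_version stanza:
# job type -> engine. Jobs not listed keep the default engine.
JOB_PHP_VERSION = {
    "updateBetaFeaturesUserCounts": "php7",
    "RecordLintJob": "php7",
}

# Placeholder for whatever the shared low_traffic_jobs template sends.
BASE_HEADERS = {"content-type": "application/json"}


def request_headers(job_type, job_php_version=JOB_PHP_VERSION):
    """Build the JobRunner request headers for one job.

    All jobs in the shared rule start from the same template; the
    PHP7 cookie is added per job type at runtime instead of being
    baked into the template.
    """
    headers = dict(BASE_HEADERS)  # copy so the shared template is untouched
    if job_php_version.get(job_type) == "php7":
        headers["cookie"] = "PHP_ENGINE=php7"
    return headers
```

With this design, moving another job to PHP7 is a one-line config change rather than splitting it out of the shared rule.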

T175210 and related tasks contain information about jobs that we gathered when we switched the jobs to Kafka; it might be useful here.

Please ping me if you think a temporary config stanza is an acceptable solution, and I'll implement it.

jijiki triaged this task as Normal priority. (Apr 3 2019, 6:38 AM)
jijiki moved this task from Backlog/Radar to In Progress on the User-jijiki board. (Apr 4 2019, 9:14 PM)

Change 502840 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

Krinkle added a subscriber: Krinkle. (edited Apr 10 2019, 6:37 PM)

@Pchelolo I think it may be better to wait before actually switching production jobs until T219279 and T218005 are resolved, given that, unlike web requests, a job doesn't offer a way to retry when it fails. A queued job is a kind of promise that we will run it eventually, and given the fatal nature of these errors, that promise is hard to keep.

Working on the logic for it is fine, of course; it's only the actual switch that may be premature. Have we switched jobs in Beta already?

@Krinkle yes, we will definitely wait; meanwhile, we are exploring :)

jijiki updated the task description. (Apr 10 2019, 6:40 PM)

> A job doesn't offer a way to retry when it fails.

Actually, it does. We do retry jobs unless the job explicitly prohibits retries. As a preparation step, we could make retries remove the PHP7 cookie. That way, if a job fatals on PHP7, it will be retried on HHVM.
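A sketch of the retry fallback described above, assuming the retry path can rewrite request headers per attempt. This is not changeprop's actual retry code: `strip_php_engine`, `headers_for_attempt` and `run_with_retries` are hypothetical names, and HHVM is assumed to be the engine used when no PHP_ENGINE cookie is sent, as the comment implies.

```python
def strip_php_engine(cookie_header):
    """Remove the PHP_ENGINE cookie from a Cookie header, keeping any others."""
    parts = [p.strip() for p in cookie_header.split(";") if p.strip()]
    kept = [p for p in parts if not p.startswith("PHP_ENGINE=")]
    return "; ".join(kept)


def headers_for_attempt(headers, attempt):
    """Headers for attempt number `attempt` (0 is the first try).

    Retries drop PHP_ENGINE so a job that fataled on PHP7 is re-run
    on the default engine (assumed to be HHVM).
    """
    result = dict(headers)
    if attempt > 0 and "cookie" in result:
        remaining = strip_php_engine(result["cookie"])
        if remaining:
            result["cookie"] = remaining
        else:
            del result["cookie"]
    return result


def run_with_retries(run_job, headers, max_retries=2):
    """Call run_job(headers) until it succeeds or retries are exhausted."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return run_job(headers_for_attempt(headers, attempt))
        except Exception as err:  # stands in for a fatal on the JobRunner
            last_error = err
    raise last_error
```

Only the first attempt carries the cookie, so a job that fails deterministically on PHP7 still gets its remaining attempts on HHVM.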

That said, I agree that enabling PHP7 for jobs in production might be premature; we can probably start experimenting in the beta cluster. For that, however, we'd need to resolve T215339 ASAP.

jijiki updated the task description. (Apr 10 2019, 7:03 PM)

Change 508599 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

Change 502840 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:04:59Z] <mobrovac@deploy1001> Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:06:29Z] <mobrovac@deploy1001> Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148 (duration: 01m 30s)

Change 508599 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

jijiki added a comment. (May 8 2019, 7:54 PM)

It looks like deployment-prep runs an older php7.2 version than production, which we should fix as well.

We have upgraded php7 on beta, and async jobs now appear to be running. We will leave it as is until next week, when we will assess whether it worked out.

Change 510703 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Change 510703 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:26:33Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:27:40Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148 (duration: 01m 07s)

jijiki updated the task description. (May 16 2019, 3:26 PM)

Change 511414 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob job to PHP7

https://gerrit.wikimedia.org/r/511414

Change 511436 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Change 511436 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:42:32Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:43:28Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148 (duration: 00m 55s)

jijiki updated the task description. (May 20 2019, 8:28 PM)

Change 511649 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Change 511414 abandoned by Effie Mouzeli:
Switch RecordLintJob job to PHP7

Reason:
Abandoned for 511436

https://gerrit.wikimedia.org/r/511414

Change 511649 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:07:10Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:08:04Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148 (duration: 00m 54s)

jijiki updated the task description. (May 21 2019, 12:56 PM)

Change 511913 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.

https://gerrit.wikimedia.org/r/511913

Change 511913 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.

https://gerrit.wikimedia.org/r/511913

Mentioned in SAL (#wikimedia-operations) [2019-05-27T09:58:20Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-27T09:59:29Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148 (duration: 01m 09s)

jijiki updated the task description. (May 28 2019, 7:07 AM)

Change 512858 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch ORESFetchScoresJob to PHP7.

https://gerrit.wikimedia.org/r/512858

Change 512858 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch ORESFetchScoresJob to PHP7.

https://gerrit.wikimedia.org/r/512858

Mentioned in SAL (#wikimedia-operations) [2019-05-28T08:52:45Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-28T08:54:06Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148 (duration: 01m 21s)

Change 512872 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecentChangesUpdate to PHP7

https://gerrit.wikimedia.org/r/512872

jijiki updated the task description. (May 28 2019, 9:18 AM)
akosiaris moved this task from Backlog to Next up on the serviceops board. (Fri, Jun 21, 8:59 AM)
jijiki updated the task description. (Tue, Jun 25, 5:35 PM)

Change 512872 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecentChangesUpdate to PHP7

https://gerrit.wikimedia.org/r/512872

Mentioned in SAL (#wikimedia-operations) [2019-06-25T17:49:47Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-06-25T17:51:24Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148 (duration: 01m 37s)

jijiki updated the task description. (Tue, Jun 25, 7:38 PM)

Change 521290 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch refreshLinks to PHP7.

https://gerrit.wikimedia.org/r/521290

Change 521290 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch refreshLinks to PHP7.

https://gerrit.wikimedia.org/r/521290

jijiki updated the task description. (Mon, Jul 8, 3:15 PM)

Change 521501 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch cirrusSearchCheckerJob to PHP7.

https://gerrit.wikimedia.org/r/521501

Pchelolo updated the task description. (Tue, Jul 9, 2:24 PM)

Change 521501 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch cirrus* jobs to PHP7.

https://gerrit.wikimedia.org/r/521501

jijiki updated the task description. (Tue, Jul 9, 2:55 PM)
jijiki updated the task description. (Wed, Jul 10, 1:54 PM)

Change 521880 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch more high traffic jobs to PHP7.

https://gerrit.wikimedia.org/r/521880

Change 521880 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie for all hightraffic jobs to PHP7.

https://gerrit.wikimedia.org/r/521880

jijiki updated the task description. (Wed, Jul 10, 7:15 PM)
jijiki moved this task from Next up to Doing on the serviceops board. (Fri, Jul 12, 10:45 AM)

Change 522408 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie to videoscaler jobs

https://gerrit.wikimedia.org/r/522408

Change 522472 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP: jobrunners: Enable php7_only feature flags

https://gerrit.wikimedia.org/r/522472

Change 522511 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] WIP: Revert all changes for switching jobs to PHP7

https://gerrit.wikimedia.org/r/522511

Change 522408 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie to videoscaling jobs

https://gerrit.wikimedia.org/r/522408

jijiki updated the task description. (Wed, Jul 17, 8:00 AM)

After discussing with @Pchelolo, we believe that, to migrate the remaining jobs, we could switch ~25% of the jobrunners in eqiad (6 servers) to serve only via PHP7. In this scenario, a job that fails on PHP7 still has a good chance of eventually running, since it will be retried a couple of times and a retry can land on one of the jobrunners that still serve HHVM.
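A back-of-the-envelope check of that argument, under loudly stated assumptions: each attempt is routed to a uniformly random jobrunner, independently of earlier attempts, and the job fails deterministically on PHP7 but succeeds on HHVM. Neither routing assumption is confirmed in this task, and the function name is hypothetical.

```python
def p_all_attempts_on_php7(php7_fraction, attempts):
    """Probability that every attempt lands on a PHP7-only jobrunner,
    assuming uniform, independent routing for each attempt."""
    if not 0.0 <= php7_fraction <= 1.0:
        raise ValueError("php7_fraction must be between 0 and 1")
    return php7_fraction ** attempts


php7_fraction = 0.25  # ~25% of eqiad jobrunners, per the comment above
for attempts in (1, 2, 3):
    print(attempts, p_all_attempts_on_php7(php7_fraction, attempts))
```

With one initial attempt plus two retries (a hypothetical retry count), such a job would never reach HHVM with probability 0.25³ ≈ 1.6%, before accounting for the PHP7 fixes themselves.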

Change 524336 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] profile::mediawiki::jobrunner: Configure php7_only flag

https://gerrit.wikimedia.org/r/524336
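The php7_only flag in the change above is what lets a subset of jobrunners ignore the cookie. The real switch lives in Puppet and the web server configuration, which are not shown in this task; the Python below only models the routing decision as it can be inferred from the flag's name and the "remove the need for the cookie" step in the description. `select_engine` is a hypothetical name, and HHVM as the default engine is an assumption that matches the state of the cluster at the time.

```python
from http.cookies import SimpleCookie


def select_engine(cookie_header, php7_only):
    """Model which engine serves a request on one jobrunner.

    php7_only=True: every request goes to PHP7, cookie or not.
    Otherwise the PHP_ENGINE cookie decides, defaulting to HHVM.
    """
    if php7_only:
        return "php7"
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")  # tolerate a missing Cookie header
    morsel = cookie.get("PHP_ENGINE")
    if morsel is not None and morsel.value == "php7":
        return "php7"
    return "hhvm"
```

On a php7_only host, the cookie that changeprop still sends becomes irrelevant, which is why reverting the cookie changes can follow once every jobrunner has the flag.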