Use PHP7 to run all async jobs
Closed, Resolved · Public

Description

We want to migrate the async jobs to use PHP7, and to be able to do so job-by-job. In order to do this, we need to:

  • Pick a couple of jobs to test first and change their configuration in changeprop so that they send the PHP_ENGINE=php7 cookie
  • Once we're convinced by both the latencies and the overall performance, switch the other jobs progressively
  • Check that logs/errors are collected
  • Check that metrics are collected
  • Amend the apache configuration to remove the need for the cookie
  • Revert the addition of the cookie to changeprop
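The per-job opt-in from the first step can be pictured with a minimal sketch. Only the cookie value PHP_ENGINE=php7 comes from this task; the function name, the set of opted-in jobs, and the header layout are hypothetical and do not reflect the real changeprop configuration format.

```python
# Hypothetical sketch: names and structure are illustrative assumptions,
# not the actual changeprop configuration.
PHP7_COOKIE = "PHP_ENGINE=php7"  # cookie value taken from the task description

# Assumed example set of jobs that have already been opted in.
PHP7_JOBS = {"updateBetaFeaturesUserCounts", "RecordLintJob"}


def request_headers(job_type, php7_jobs=PHP7_JOBS):
    """Build the HTTP headers used when submitting a job for execution.

    Jobs listed in php7_jobs get the PHP7 cookie; every other job is sent
    without it and keeps running on the default engine.
    """
    headers = {"Content-Type": "application/json"}
    if job_type in php7_jobs:
        headers["Cookie"] = PHP7_COOKIE
    return headers


if __name__ == "__main__":
    print(request_headers("RecordLintJob"))
    print(request_headers("refreshLinks"))
```

Keeping the opt-in as a per-job list is what makes the job-by-job rollout, and the final revert of the cookie, a small configuration change each time.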

Job list, roughly in the order we will migrate them:

  • updateBetaFeaturesUserCounts (510703)
  • RecordLintJob (511436)
  • htmlCacheUpdate (511649)
  • wikibase-addUsagesForPage (511913)
  • ORESFetchScoresJob (512858)
  • RecentChangesUpdate (high traffic, user visible) (512872)
  • refreshLinks (too much traffic)
  • cirrusSearchCheckerJob (tricky: it runs from a cron script that schedules bulk jobs with a set of pageIds and uses delays of 1, 2, 3, 4... to scatter the jobs in time)
  • cirrusSearchDeleteArchive
  • cirrusSearchDeletePages
  • cirrusSearchElasticaWrite
  • cirrusSearchIncomingLinkCount
  • cirrusSearchLinksUpdate
  • cirrusSearchLinksUpdatePrioritized
  • cirrusSearchOtherIndex
  • cdnPurge
  • categoryMembershipChange
  • ThumbnailRender
  • constraintsRunCheck
  • webVideoTranscode
  • webVideoTranscodePrioritized

The following jobs will be migrated as we are migrating jobrunners to serve via PHP7 only:

  • refreshLinksPrioritized (too much traffic)
  • TranslationsUpdateJob
  • TranslateRenderJob
  • TranslatablePageMoveJob
  • TranslateDeleteJob
  • translationNotificationJob
  • wikibase-UpdateUsagesForPage (super high traffic)
  • ChangeNotification (high rate)
  • CognateCacheUpdateJob (basically a wrapper over HTMLCacheUpdatejob)
  • flaggedrevs_CacheUpdate
  • deleteLinks
  • EchoNotificationDeleteJob
  • wikibase-InjectRCRecords
  • AssembleUploadChunks
  • BounceHandlerJob
  • CentralAuthCreateLocalAccountJob
  • enotifNotify
  • gwtoolsetGWTFileBackendCleanupJob
  • LocalPageMoveJob
  • LocalRenameUserJob
  • LoginNotifyChecks
  • MassMessageJob
  • MassMessageSubmitJob
  • MassMessageServerSideJob
  • MessageGroupStatesUpdaterJob
  • MessageIndexRebuildJob
  • PublishStashedFile
  • GlobalUserPageLocalJobSubmitJob
  • renameUser
  • UpdateRepoOnDelete
  • UpdateRepoOnMove
  • EchoNotificationJob
  • cirrusSearchMassIndex
  • sendMail
  • deletePage
  • refreshLinksDynamic
  • LocalSharedHelpPageCacheUpdateJob
  • cirrusSearchJobChecker
  • constraintsTableUpdate
  • synchroniseThreadArticleData
  • compileArticleMetadata
  • clearUserWatchlist
  • BounceHandlerNotificationJob
  • MessageGroupStatsRebuildJob
  • cpjobqueue.error
  • gwtoolsetUploadMetadataJob
  • CognateLocalJobSubmitJob
  • TTMServerMessageUpdateJob
  • LocalRenameUserJob
  • userGroupExpiry
  • crosswikiSuppressUser
  • securePollPopulateVoterList
  • CentralAuthUnattachUserJob
  • LocalGlobalUserPageCacheUpdateJob

  • globalUsageCachePurge
  • activityUpdateJob

Notes:

mediawiki/includes/jobqueue/jobs/

1 LoginNotifyChecks
2 RecordLintJob
3 EchoNotificationDeleteJob
8 cirrusSearchIncomingLinkCount
8 enotifNotify
10 activityUpdateJob
58 cirrusSearchLinksUpdatePrioritized
63 recentChangesUpdate
64 refreshLinks
72 categoryMembershipChange
81 cirrusSearchLinksUpdate
165 htmlCacheUpdate
2000 cirrusSearchCheckerJob

Details

Related Gerrit Patches:
[operations/puppet@production] WIP: jobrunners: Make jobrunners PHP7 only by default
[mediawiki/services/change-propagation/jobqueue-deploy@master] Revert all changes for switching jobs to PHP7
[operations/puppet@production] jobrunners: Migrate all jobrunners to serve only via PHP7
[operations/puppet@production] jobrunners: Test php7_only on 6 jobrunners
[operations/puppet@production] profile::mediawiki::jobrunner: Configure php7_only flag
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie to videoscaling jobs
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie for all hightraffic jobs to PHP7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch cirrus* jobs to PHP7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch refreshLinks to PHP7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecentChangesUpdate to PHP7
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch ORESFetchScoresJob to PHP7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob job to PHP7
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

Event Timeline

Change 502840 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

Krinkle added a subscriber: Krinkle. (Edited) Apr 10 2019, 6:37 PM

@Pchelolo I think it may be better to hold off on actually switching prod jobs until T219279 and T218005 are resolved, given that, unlike a web request, a job doesn't offer a way to retry when it fails. A queued job is a kind of promise that we will run it eventually, and given the fatal nature of these errors, that promise is hard to fulfil.

Working on the logic for it is fine of course. Just the actual switch may be a bit too soon. Have we switched jobs in Beta already?

@Krinkle yeah, we will wait for sure; meanwhile, we are exploring :)

jijiki updated the task description. Apr 10 2019, 6:40 PM

A job doesn't offer a way to retry when it fails.

Actually, it does. We retry jobs unless the job explicitly prohibits retries. As a preparation step, we can make retries remove the PHP7 cookie. That way, if a job has fataled on PHP7, it will be retried with HHVM.

That said, I agree that enabling jobs in production might be premature; we can probably start experimenting in the beta cluster. We'd need to resolve T215339 ASAP, though.
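The retry fallback proposed in this comment could look roughly like the sketch below. It is a hypothetical illustration, not changeprop's retry code: the function names, the `execute` callback, and the default of three attempts are all assumptions. Only the idea of dropping the PHP7 cookie on retries comes from the comment.

```python
# Hypothetical sketch of "retries remove the PHP7 cookie"; not changeprop code.
PHP7_COOKIE = "PHP_ENGINE=php7"


def headers_for_attempt(job_type, attempt, php7_jobs):
    """Send the PHP7 cookie only on the first attempt of an opted-in job."""
    headers = {}
    if job_type in php7_jobs and attempt == 0:
        headers["Cookie"] = PHP7_COOKIE
    return headers


def run_with_fallback(job_type, execute, php7_jobs, max_attempts=3):
    """Run a job, retrying without the cookie after a failed first attempt.

    `execute(job_type, headers)` is an assumed callback that returns True on
    success. Returns the index of the successful attempt, or None if every
    attempt failed.
    """
    for attempt in range(max_attempts):
        headers = headers_for_attempt(job_type, attempt, php7_jobs)
        if execute(job_type, headers):
            return attempt
    return None


def fails_on_php7(job_type, headers):
    """Stand-in executor: the job fatals whenever the PHP7 cookie is sent."""
    return headers.get("Cookie") != PHP7_COOKIE


if __name__ == "__main__":
    # The first attempt (index 0) hits PHP7 and fails; the retry succeeds.
    print(run_with_fallback("RecordLintJob", fails_on_php7, {"RecordLintJob"}))
```

With this shape, a job that only breaks on PHP7 costs one wasted attempt instead of being lost.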

jijiki updated the task description. Apr 10 2019, 7:03 PM

Change 508599 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

Change 502840 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:04:59Z] <mobrovac@deploy1001> Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:06:29Z] <mobrovac@deploy1001> Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148 (duration: 01m 30s)

Change 508599 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

jijiki added a comment. May 8 2019, 7:54 PM

It looks like deployment-prep has an older php7.2 version than production, which is something we should fix as well.

We have upgraded php7 on beta, so now it looks like async jobs are running. We will leave it as is until next week, when we will assess whether it worked out.

Change 510703 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Change 510703 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:26:33Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:27:40Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148 (duration: 01m 07s)

jijiki updated the task description. May 16 2019, 3:26 PM

Change 511414 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob job to PHP7

https://gerrit.wikimedia.org/r/511414

Change 511436 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Change 511436 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:42:32Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:43:28Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148 (duration: 00m 55s)

jijiki updated the task description. May 20 2019, 8:28 PM

Change 511649 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Change 511414 abandoned by Effie Mouzeli:
Switch RecordLintJob job to PHP7

Reason:
Abandoned for 511436

https://gerrit.wikimedia.org/r/511414

Change 511649 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:07:10Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:08:04Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148 (duration: 00m 54s)

jijiki updated the task description. May 21 2019, 12:56 PM

Change 511913 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.

https://gerrit.wikimedia.org/r/511913

Change 511913 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.

https://gerrit.wikimedia.org/r/511913

Mentioned in SAL (#wikimedia-operations) [2019-05-27T09:58:20Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-27T09:59:29Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148 (duration: 01m 09s)

jijiki updated the task description. May 28 2019, 7:07 AM

Change 512858 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch ORESFetchScoresJob to PHP7.

https://gerrit.wikimedia.org/r/512858

Change 512858 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch ORESFetchScoresJob to PHP7.

https://gerrit.wikimedia.org/r/512858

Mentioned in SAL (#wikimedia-operations) [2019-05-28T08:52:45Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-28T08:54:06Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148 (duration: 01m 21s)

Change 512872 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecentChangesUpdate to PHP7

https://gerrit.wikimedia.org/r/512872

jijiki updated the task description. May 28 2019, 9:18 AM
akosiaris moved this task from Backlog to Next up on the serviceops board. Jun 21 2019, 8:59 AM
jijiki updated the task description. Jun 25 2019, 5:35 PM

Change 512872 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecentChangesUpdate to PHP7

https://gerrit.wikimedia.org/r/512872

Mentioned in SAL (#wikimedia-operations) [2019-06-25T17:49:47Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-06-25T17:51:24Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148 (duration: 01m 37s)

jijiki updated the task description. Jun 25 2019, 7:38 PM

Change 521290 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch refreshLinks to PHP7.

https://gerrit.wikimedia.org/r/521290

Change 521290 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch refreshLinks to PHP7.

https://gerrit.wikimedia.org/r/521290

jijiki updated the task description. Jul 8 2019, 3:15 PM

Change 521501 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch cirrusSearchCheckerJob to PHP7.

https://gerrit.wikimedia.org/r/521501

Pchelolo updated the task description. Jul 9 2019, 2:24 PM

Change 521501 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch cirrus* jobs to PHP7.

https://gerrit.wikimedia.org/r/521501

jijiki updated the task description. Jul 9 2019, 2:55 PM
jijiki updated the task description. Jul 10 2019, 1:54 PM

Change 521880 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch more high traffic jobs to PHP7.

https://gerrit.wikimedia.org/r/521880

Change 521880 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie for all hightraffic jobs to PHP7.

https://gerrit.wikimedia.org/r/521880

jijiki updated the task description. Jul 10 2019, 7:15 PM
jijiki moved this task from Next up to Doing on the serviceops board. Jul 12 2019, 10:45 AM

Change 522408 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie to videoscaler jobs

https://gerrit.wikimedia.org/r/522408

Change 522472 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP: jobrunners: Enable php7_only feature flags

https://gerrit.wikimedia.org/r/522472

Change 522511 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] WIP: Revert all changes for switching jobs to PHP7

https://gerrit.wikimedia.org/r/522511

Change 522408 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie to videoscaling jobs

https://gerrit.wikimedia.org/r/522408

jijiki updated the task description. Jul 17 2019, 8:00 AM

After discussing with @Pchelolo, we believe that in order to migrate the rest, we could migrate ~25% of jobrunners in eqiad (6 servers) to serve only via PHP7. In this scenario, a job that fails on PHP7 still has a good chance of eventually running, since it will be retried a couple of times.
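As a rough back-of-the-envelope check of that "good chance", the sketch below assumes that each attempt lands on a jobrunner picked uniformly and independently, and that the job always fails on a PHP7-only host and always succeeds elsewhere. Neither the load-balancing behaviour nor the exact retry count is stated in this task, so the figures are illustrative only.

```python
def chance_of_eventually_running(php7_only_fraction, attempts):
    """Probability that at least one attempt avoids the PHP7-only hosts.

    Assumes independent, uniformly distributed attempts (an assumption,
    not something documented in this task).
    """
    return 1 - php7_only_fraction ** attempts


if __name__ == "__main__":
    # ~25% of jobrunners are PHP7-only, as proposed in the comment above.
    for attempts in (1, 2, 3):
        print(attempts, chance_of_eventually_running(0.25, attempts))
    # prints:
    # 1 0.75
    # 2 0.9375
    # 3 0.984375
```

Under these assumptions, a job that is broken on PHP7 still completes about 98% of the time within three attempts, which is why a partial rollout surfaces failures without dropping much work.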

Change 524336 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] profile::mediawiki::jobrunner: Configure php7_only flag

https://gerrit.wikimedia.org/r/524336

Mentioned in SAL (#wikimedia-operations) [2019-07-22T10:23:44Z] <jijiki> Disable puppet on jobrunners for 524336 - T219148

Change 524336 merged by Effie Mouzeli:
[operations/puppet@production] profile::mediawiki::jobrunner: Configure php7_only flag

https://gerrit.wikimedia.org/r/524336

Change 522472 merged by Effie Mouzeli:
[operations/puppet@production] jobrunners: Test php7_only on 6 jobrunners

https://gerrit.wikimedia.org/r/522472

Mentioned in SAL (#wikimedia-operations) [2019-07-22T15:49:51Z] <jijiki> Rolling depool and pool of mw1293, mw1294, mw1295, mw1296, mw1299 - T219148

jijiki updated the task description. Jul 23 2019, 12:55 PM

Change 525306 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] jobrunners: Convert all jobrunners to server PHP7 only

https://gerrit.wikimedia.org/r/525306

Change 525306 merged by Effie Mouzeli:
[operations/puppet@production] jobrunners: Migrate all jobrunners to serve only via PHP7

https://gerrit.wikimedia.org/r/525306

jijiki updated the task description. Jul 25 2019, 9:29 AM

All async jobs now run on PHP7. We will keep an eye on them for about a week, and then clean up code leftovers:

  • change-propagation snippets and files
  • puppet - Remove php7_only feature flag (and make it default)

We will decide in the following weeks whether or not to completely remove HHVM from jobrunners.

jijiki moved this task from Doing to Next up on the serviceops board. Jul 25 2019, 1:46 PM

Change 522511 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Revert all changes for switching jobs to PHP7

https://gerrit.wikimedia.org/r/522511

Change 526132 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP: jobrunners: Make jobrunners PHP7 only by default

https://gerrit.wikimedia.org/r/526132

jijiki added a comment. Aug 5 2019, 2:28 PM

Removing HHVM and any leftovers is now part of T229792, so we are marking this as resolved 💃

jijiki closed this task as Resolved. Aug 5 2019, 2:29 PM
jijiki claimed this task.

Change 526132 abandoned by Effie Mouzeli:
WIP: jobrunners: Make jobrunners PHP7 only by default

Reason:
needs rewrite

https://gerrit.wikimedia.org/r/526132