Use PHP7 to run all async jobs
Closed, Resolved (Public)

Description

We want to migrate the async jobs to use PHP7, and to be able to do so job-by-job. In order to do this, we need to:

  • Pick a couple of jobs to test first and change their configuration in changeprop to send the PHP_ENGINE=php7 cookie
  • Once we're convinced by both the latencies and the overall performance, switch the other jobs progressively
  • Check that logs/errors are collected
  • Check that metrics are collected
  • Amend the apache configuration to remove the need for the cookie
  • Revert the addition of the cookie to changeprop
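
The per-job switch in the steps above could be sketched roughly as follows. This is a minimal illustration in Python, not the actual changeprop configuration: the `PHP7_JOBS` set, the `build_job_headers` function, and the header layout are all assumptions made for the example. Only the `PHP_ENGINE=php7` cookie and the job names come from this task.

```python
# Hypothetical sketch: decide per job type whether to send the PHP7 cookie.
# PHP7_JOBS and build_job_headers are illustrative names, not changeprop APIs.

# Job types that have already been switched (assumed example values).
PHP7_JOBS = {"updateBetaFeaturesUserCounts", "RecordLintJob"}


def build_job_headers(job_type: str) -> dict[str, str]:
    """Return the HTTP headers to send to the jobrunner for this job type."""
    headers = {"Content-Type": "application/json"}
    if job_type in PHP7_JOBS:
        # The cookie selects the PHP7 engine on the receiving side.
        headers["Cookie"] = "PHP_ENGINE=php7"
    return headers
```

With a layout like this, migrating one more job is a one-line change to the set, and reverting the cookie once the Apache configuration no longer needs it amounts to deleting the conditional.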

Job list, in roughly the order we'll migrate them:

  • updateBetaFeaturesUserCounts (510703)
  • RecordLintJob (511436)
  • htmlCacheUpdate (511649)
  • wikibase-addUsagesForPage (511913)
  • ORESFetchScoresJob (512858)
  • RecentChangesUpdate() (high traffic, user visible) (512872)
  • refreshLinks (too much traffic)
  • cirrusSearchCheckerJob (Tricky: it runs from a cron script that schedules bulk jobs with a set of pageIds and uses delays of 1, 2, 3, 4... to scatter the jobs in time)
  • cirrusSearchDeleteArchive
  • cirrusSearchDeletePages
  • cirrusSearchElasticaWrite
  • cirrusSearchIncomingLinkCount
  • cirrusSearchLinksUpdate
  • cirrusSearchLinksUpdatePrioritized
  • cirrusSearchOtherIndex
  • cdnPurge
  • categoryMembershipChange
  • ThumbnailRender
  • constraintsRunCheck
  • webVideoTranscode
  • webVideoTranscodePrioritized

The following jobs will be migrated as we are migrating jobrunners to serve via PHP7 only:

  • refreshLinksPrioritized (too much traffic)
  • TranslationsUpdateJob
  • TranslateRenderJob
  • TranslatablePageMoveJob
  • TranslateDeleteJob
  • translationNotificationJob
  • wikibase-UpdateUsagesForPage (super high traffic)
  • ChangeNotification (high rate)
  • CognateCacheUpdateJob (basically a wrapper over HTMLCacheUpdatejob)
  • flaggedrevs_CacheUpdate
  • deleteLinks
  • EchoNotificationDeleteJob
  • wikibase-InjectRCRecords
  • AssembleUploadChunks
  • BounceHandlerJob
  • CentralAuthCreateLocalAccountJob
  • enotifNotify
  • gwtoolsetGWTFileBackendCleanupJob
  • LocalPageMoveJob
  • LocalRenameUserJob
  • LoginNotifyChecks
  • MassMessageJob
  • MassMessageSubmitJob
  • MassMessageServerSideJob
  • MessageGroupStatesUpdaterJob
  • MessageIndexRebuildJob
  • PublishStashedFile
  • GlobalUserPageLocalJobSubmitJob
  • renameUser
  • UpdateRepoOnDelete
  • UpdateRepoOnMove
  • EchoNotificationJob
  • cirrusSearchMassIndex
  • sendMail
  • deletePage
  • refreshLinksDynamic
  • LocalSharedHelpPageCacheUpdateJob
  • cirrusSearchJobChecker
  • constraintsTableUpdate
  • synchroniseThreadArticleData
  • compileArticleMetadata
  • clearUserWatchlist
  • BounceHandlerNotificationJob
  • MessageGroupStatsRebuildJob
  • cpjobqueue.error
  • gwtoolsetUploadMetadataJob
  • CognateLocalJobSubmitJob
  • TTMServerMessageUpdateJob
  • LocalRenameUserJob
  • userGroupExpiry
  • crosswikiSuppressUser
  • securePollPopulateVoterList
  • CentralAuthUnattachUserJob
  • LocalGlobalUserPageCacheUpdateJob

  • [x] globalUsageCachePurge
  • activityUpdateJob

Notes:

mediawiki/includes/jobqueue/jobs/

1 LoginNotifyChecks
2 RecordLintJob
3 EchoNotificationDeleteJob
8 cirrusSearchIncomingLinkCount
8 enotifNotify
10 activityUpdateJob
58 cirrusSearchLinksUpdatePrioritized
63 recentChangesUpdate
64 refreshLinks
72 categoryMembershipChange
81 cirrusSearchLinksUpdate
165 htmlCacheUpdate
2000 cirrusSearchCheckerJob

Details

Repo                                                   Branch      Lines +/-
operations/puppet                                      production  +8 -69
mediawiki/services/change-propagation/jobqueue-deploy  master      +8 -51
operations/puppet                                      production  +2 -13
operations/puppet                                      production  +12 -0
operations/puppet                                      production  +40 -3
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +0 -20
mediawiki/services/change-propagation/jobqueue-deploy  master      +8 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -2
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +4 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +7 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +3 -1
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -0
mediawiki/services/change-propagation/jobqueue-deploy  master      +46 -8

Event Timeline

Change 502840 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

@Pchelolo I think it may be better to wait with the actual switching of prod jobs until T219279 and T218005 are resolved, given that, unlike web requests, jobs don't offer a way to retry when they fail. A queued job is a kind of promise from us to run it eventually, and given the fatal nature of these errors, that's hard to fulfil.

Working on the logic for it is fine of course. Just the actual switch may be a bit too soon. Have we switched jobs in Beta already?

@Krinkle yeah, we will wait for sure; meanwhile, we are exploring :)

Jobs don't offer a way to retry when they fail.

Actually, they do. We retry jobs unless a job explicitly prohibits retries. As a preparation step, we can make retries remove the PHP7 cookie. That way, if a job has fataled on PHP7, it will be retried with HHVM.

However, I agree that enabling jobs in production might be premature; we can probably start experimenting in the beta cluster. For that, we'd need to resolve T215339 ASAP.
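
The retry fallback proposed in this comment could look roughly like the sketch below. It is an illustration under assumptions, not the change-propagation implementation: `headers_for_attempt` and its parameters are hypothetical names, and the sketch assumes that a request without the cookie is served by HHVM, as the comment implies.

```python
# Hypothetical sketch of the proposed fallback: only the first attempt
# carries the PHP7 cookie, so any retry is sent without it.

def headers_for_attempt(php7_enabled: bool, retry_count: int) -> dict[str, str]:
    """Return the extra headers for one execution attempt of a job.

    php7_enabled: whether this job type has been switched to PHP7.
    retry_count:  0 for the first attempt, 1+ for retries.
    """
    headers: dict[str, str] = {}
    if php7_enabled and retry_count == 0:
        headers["Cookie"] = "PHP_ENGINE=php7"
    # On retries the cookie is omitted (assumed to mean: run on HHVM).
    return headers
```

The point of the design is that a fatal error specific to PHP7 costs one failed attempt instead of losing the job.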

Change 508599 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

Change 502840 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Allow enabling PHP7 per job for low_traffic jobs.

https://gerrit.wikimedia.org/r/502840

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:04:59Z] <mobrovac@deploy1001> Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-08T11:06:29Z] <mobrovac@deploy1001> Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148 (duration: 01m 30s)

Change 508599 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Enable PHP 7 for all jobs in beta cluster.

https://gerrit.wikimedia.org/r/508599

It looks like deployment-prep has an older php7.2 version than production, which is something we should fix as well.

We have upgraded php7 on beta, and it now looks like async jobs are running. We will leave it as is until next week, when we will assess whether it worked out.

Change 510703 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Change 510703 merged by Mobrovac:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch updateBetaFeaturesUserCounts job to PHP 7.

https://gerrit.wikimedia.org/r/510703

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:26:33Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-16T10:27:40Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148 (duration: 01m 07s)

Change 511414 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob job to PHP7

https://gerrit.wikimedia.org/r/511414

Change 511436 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Change 511436 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecordLintJob to php7.

https://gerrit.wikimedia.org/r/511436

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:42:32Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-20T14:43:28Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148 (duration: 00m 55s)

Change 511649 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Change 511414 abandoned by Effie Mouzeli:
Switch RecordLintJob job to PHP7

Reason:
Abandoned for 511436

https://gerrit.wikimedia.org/r/511414

Change 511649 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch htmlCacheUpdate to PHP7.

https://gerrit.wikimedia.org/r/511649

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:07:10Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-21T12:08:04Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148 (duration: 00m 54s)

Change 511913 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.

https://gerrit.wikimedia.org/r/511913

Change 511913 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch wikibase-addUsagesForPage to PHP7.

https://gerrit.wikimedia.org/r/511913

Mentioned in SAL (#wikimedia-operations) [2019-05-27T09:58:20Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-27T09:59:29Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148 (duration: 01m 09s)

Change 512858 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch ORESFetchScoresJob to PHP7.

https://gerrit.wikimedia.org/r/512858

Change 512858 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch ORESFetchScoresJob to PHP7.

https://gerrit.wikimedia.org/r/512858

Mentioned in SAL (#wikimedia-operations) [2019-05-28T08:52:45Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-05-28T08:54:06Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148 (duration: 01m 21s)

Change 512872 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecentChangesUpdate to PHP7

https://gerrit.wikimedia.org/r/512872

Change 512872 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch RecentChangesUpdate to PHP7

https://gerrit.wikimedia.org/r/512872

Mentioned in SAL (#wikimedia-operations) [2019-06-25T17:49:47Z] <jiji@deploy1001> Started deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148

Mentioned in SAL (#wikimedia-operations) [2019-06-25T17:51:24Z] <jiji@deploy1001> Finished deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148 (duration: 01m 37s)

Change 521290 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch refreshLinks to PHP7.

https://gerrit.wikimedia.org/r/521290

Change 521290 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch refreshLinks to PHP7.

https://gerrit.wikimedia.org/r/521290

Change 521501 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch cirrusSearchCheckerJob to PHP7.

https://gerrit.wikimedia.org/r/521501

Change 521501 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch cirrus* jobs to PHP7.

https://gerrit.wikimedia.org/r/521501

Change 521880 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Switch more high traffic jobs to PHP7.

https://gerrit.wikimedia.org/r/521880

Change 521880 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie for all hightraffic jobs to PHP7.

https://gerrit.wikimedia.org/r/521880

Change 522408 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie to videoscaler jobs

https://gerrit.wikimedia.org/r/522408

Change 522472 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP: jobrunners: Enable php7_only feature flags

https://gerrit.wikimedia.org/r/522472

Change 522511 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[mediawiki/services/change-propagation/jobqueue-deploy@master] WIP: Revert all changes for switching jobs to PHP7

https://gerrit.wikimedia.org/r/522511

Change 522408 merged by Effie Mouzeli:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add PHP7 cookie to videoscaling jobs

https://gerrit.wikimedia.org/r/522408

After discussing with @Pchelolo, we believe that, in order to migrate the rest, we could migrate ~25% of the jobrunners in eqiad (6 servers) to serve only via PHP7. In this scenario, a job that fails on PHP7 still has a good chance of eventually running, since it will be retried a couple of times.
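
The reasoning behind the 25% figure can be made concrete with a small calculation. The sketch below rests on assumptions that are not stated in this task: each attempt is routed to a jobrunner uniformly at random, attempts are independent, and a job that is broken on PHP7 always fails there and always succeeds elsewhere. The function name and the attempt counts are illustrative.

```python
# Hypothetical back-of-the-envelope model: chance that a PHP7-incompatible
# job never reaches a non-PHP7 jobrunner within a given number of attempts.

def prob_all_attempts_on_php7(php7_share: float, attempts: int) -> float:
    """Probability that every one of `attempts` tries lands on a PHP7-only host."""
    return php7_share ** attempts


if __name__ == "__main__":
    # 25% of jobrunners serve PHP7 only, as proposed above.
    for attempts in (1, 2, 3, 4):
        p = prob_all_attempts_on_php7(0.25, attempts)
        print(f"{attempts} attempt(s): {p:.4f} chance the job never runs")
```

Under these assumptions the chance of a job never running shrinks by a factor of four with each additional attempt, which is why a partial rollout combined with retries limits the damage from PHP7-specific failures.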

Change 524336 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] profile::mediawiki::jobrunner: Configure php7_only flag

https://gerrit.wikimedia.org/r/524336

Mentioned in SAL (#wikimedia-operations) [2019-07-22T10:23:44Z] <jijiki> Disable puppet on jobrunners for 524336 - T219148

Change 524336 merged by Effie Mouzeli:
[operations/puppet@production] profile::mediawiki::jobrunner: Configure php7_only flag

https://gerrit.wikimedia.org/r/524336

Change 522472 merged by Effie Mouzeli:
[operations/puppet@production] jobrunners: Test php7_only on 6 jobrunners

https://gerrit.wikimedia.org/r/522472

Mentioned in SAL (#wikimedia-operations) [2019-07-22T15:49:51Z] <jijiki> Rolling depool and pool of mw1293, mw1294, mw1295, mw1296, mw1299 - T219148

Change 525306 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] jobrunners: Convert all jobrunners to server PHP7 only

https://gerrit.wikimedia.org/r/525306

Change 525306 merged by Effie Mouzeli:
[operations/puppet@production] jobrunners: Migrate all jobrunners to serve only via PHP7

https://gerrit.wikimedia.org/r/525306

All async jobs now run on PHP7. We will keep an eye on them for about a week and then clean up the code leftovers:

  • change-propagation snippets and files
  • puppet - Remove php7_only feature flag (and make it default)

We will decide in the coming weeks whether to completely remove HHVM from the jobrunners.

Change 522511 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Revert all changes for switching jobs to PHP7

https://gerrit.wikimedia.org/r/522511

Change 526132 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP: jobrunners: Make jobrunners PHP7 only by default

https://gerrit.wikimedia.org/r/526132

Removing HHVM and any leftovers is now part of T229792, so we are marking this as resolved 💃

jijiki claimed this task.

Change 526132 abandoned by Effie Mouzeli:
WIP: jobrunners: Make jobrunners PHP7 only by default

Reason:
needs rewrite

https://gerrit.wikimedia.org/r/526132