Turn up PHP 8.1-flavored k8s deployments for all MediaWiki services
Closed, Resolved (Public)

Description

Specifically, there are two parts to this:

mw-web and mw-api-ext services: next releases

The two large external-facing services, mw-web and mw-api-ext, will need directly addressable (i.e., dedicated LVS services, discovery addresses, etc.) deployments in the same manner as the "next" release we've created for mw-debug in T372604.

This is to support cookie-driven migration of external traffic to 8.1. Ideally, we should batch these together in order to minimize the number of complex / disruptive operations (e.g., LVS service turnups).
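
A minimal sketch of what cookie-driven selection between a service and its "next" deployment could look like. The cookie name `php-next-opt-in`, its value, and the function name are hypothetical; the task does not specify them, and this only illustrates the decision itself, not where or how it is actually implemented.

```python
# Services that get a directly addressable "next" deployment per this task.
SERVICES_WITH_NEXT = {"mw-web", "mw-api-ext"}

# Hypothetical opt-in cookie; the real cookie name and value are not given here.
OPT_IN_COOKIE = "php-next-opt-in"
OPT_IN_VALUE = "1"


def pick_backend(service: str, cookies: dict[str, str]) -> str:
    """Return the backend name a request should be sent to.

    Requests carrying the opt-in cookie go to the "-next" deployment,
    but only for services that have one; everything else is unchanged.
    """
    if service in SERVICES_WITH_NEXT and cookies.get(OPT_IN_COOKIE) == OPT_IN_VALUE:
        return f"{service}-next"
    return service
```

Under these assumptions, a request to mw-web with the cookie set would be sent to mw-web-next, while cookie-less requests and requests to services without a "next" deployment keep their current backend.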

All large services: migration releases

All "large" services will need an additional "migration" deployment that is routed via the existing "main" release, similar to how the "canary" releases work.

This is to support a capacity-based fractional migration of internal and cookie-less external traffic (i.e., progressively shifting replica counts from the "main" release to the "migration" release, which route via the same Endpoints object). These "migration" releases will (and indeed must) be initially scaled to zero replicas.
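
As a rough illustration of the replica-shifting idea, the sketch below splits a fixed replica budget between the "main" and "migration" releases for a target traffic fraction. It assumes traffic is spread evenly across all pods behind the shared Endpoints object, which is a simplification; the function names and the example numbers are hypothetical and not taken from the actual deployment values.

```python
def split_replicas(total: int, migration_fraction: float) -> tuple[int, int]:
    """Return (main, migration) replica counts for a target traffic fraction.

    The total is held constant so overall capacity does not change
    while the fraction is ramped up.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not 0.0 <= migration_fraction <= 1.0:
        raise ValueError("migration_fraction must be between 0 and 1")
    migration = round(total * migration_fraction)
    return total - migration, migration


def traffic_share(main: int, migration: int) -> float:
    """Approximate share of traffic served by "migration" pods,
    assuming requests are balanced evenly across all pods."""
    total = main + migration
    if total == 0:
        raise ValueError("at least one replica is required")
    return migration / total
```

With a fraction of 0.0 the "migration" release gets zero replicas, matching the initial state described above; for example, `split_replicas(40, 0.25)` yields 30 "main" and 10 "migration" replicas.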

Services in scope: mw-api-int, mw-api-ext, mw-jobrunner, mw-parsoid, mw-web

Other lower-traffic services (e.g., mw-misc, mw-wikifunctions) or non-traffic-serving MediaWiki use cases (e.g., mw-script) are not in scope, and will be migrated as one-offs.

Details

Related Changes in Gerrit:
Repo                          Branch      Lines +/-
operations/puppet             production  +4 -4
operations/deployment-charts  master      +2 -2
operations/cookbooks          master      +4 -0
operations/deployment-charts  master      +6 -0
operations/puppet             production  +4 -2
operations/deployment-charts  master      +15 -0
operations/puppet             production  +10 -5
operations/deployment-charts  master      +4 -12
operations/puppet             production  +4 -0
operations/deployment-charts  master      +104 -16
operations/deployment-charts  master      +1 -3
operations/puppet             production  +1 -0
operations/deployment-charts  master      +26 -4
operations/deployment-charts  master      +7 -3
operations/deployment-charts  master      +2 -2
operations/deployment-charts  master      +5 -2
operations/dns                master      +8 -0
operations/dns                master      +8 -0
operations/puppet             production  +1 -1
operations/puppet             production  +1 -1
operations/puppet             production  +1 -1
operations/puppet             production  +1 -1
operations/dns                master      +8 -4
operations/puppet             production  +83 -1
operations/deployment-charts  master      +2 -6
operations/puppet             production  +2 -0
operations/deployment-charts  master      +30 -8

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change #1081452 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mw-api-int: remove "migration" release values overrides

https://gerrit.wikimedia.org/r/1081452

Change #1082050 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] service: move mw-api-ext-next to lvs_setup

https://gerrit.wikimedia.org/r/1082050

Change #1082051 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] service: move mw-api-ext-next to production

https://gerrit.wikimedia.org/r/1082051

Mentioned in SAL (#wikimedia-operations) [2024-10-21T17:59:02Z] <swfrench-wmf> ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040

Change #1082050 merged by Scott French:

[operations/puppet@production] service: move mw-api-ext-next to lvs_setup

https://gerrit.wikimedia.org/r/1082050

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:04:42Z] <swfrench-wmf> ran and enabled pupppet agent on 'A:lvs and A:eqiad' - T377040

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:05:15Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:06:02Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:09:53Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:15:41Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:19:32Z] <swfrench-wmf> ran and enabled pupppet agent on 'A:lvs and A:codfw' - T377040

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:19:57Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:20:35Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:22:42Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:23:28Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:32:05Z] <swfrench-wmf> ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040

Change #1080789 merged by Scott French:

[operations/puppet@production] service: move mw-web-next to lvs_setup

https://gerrit.wikimedia.org/r/1080789

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:36:11Z] <swfrench-wmf> ran and enabled puppet agent on 'A:lvs and A:eqiad' - T377040

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:37:11Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:43:11Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:52:27Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T18:57:59Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:01:46Z] <swfrench-wmf> ran and enabled puppet agent on 'A:lvs and A:codfw' - T377040

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:02:18Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:02:49Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:04:48Z] <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:10:47Z] <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:30:13Z] <swfrench@cumin2002> conftool action : set/pooled=true; selector: dnsdisc=mw-web-next,name=codfw [reason: preparing mw-web-next (a/p) for discovery - T377040]

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:31:06Z] <swfrench@cumin2002> conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next,name=codfw [reason: preparing mw-api-ext-next (a/p) for discovery - T377040]

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:35:40Z] <swfrench@cumin2002> conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=codfw [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:36:06Z] <swfrench@cumin2002> conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=eqiad [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:36:49Z] <swfrench@cumin2002> conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=codfw [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]

Mentioned in SAL (#wikimedia-operations) [2024-10-21T19:36:58Z] <swfrench@cumin2002> conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=eqiad [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]

Change #1080790 merged by Scott French:

[operations/puppet@production] service: move mw-web-next to production

https://gerrit.wikimedia.org/r/1080790

Change #1082051 merged by Scott French:

[operations/puppet@production] service: move mw-api-ext-next to production

https://gerrit.wikimedia.org/r/1082051

Change #1080779 merged by Scott French:

[operations/dns@master] wmnet: add DYNA records for mw-(web|api-ext)-next

https://gerrit.wikimedia.org/r/1080779

Change #1082090 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/dns@master] Revert^2 "wmnet: add DYNA records for mw-(web|api-ext)-next"

https://gerrit.wikimedia.org/r/1082090

Change #1082090 merged by Scott French:

[operations/dns@master] Revert^2 "wmnet: add DYNA records for mw-(web|api-ext)-next"

https://gerrit.wikimedia.org/r/1082090

Mentioned in SAL (#wikimedia-operations) [2024-10-21T21:16:33Z] <swfrench-wmf> ran authdns-update to pick up mw-(web|api-ext)-next discovery records - T377040

Aside from eventually enabling paging and httpbb checks, the mw-web-next and mw-api-ext-next services are up, along with all supporting bits (LVS services, discovery addresses, etc.).

Next up is the (far less involved) turn-up of the "migration" releases of all services - e.g., what https://gerrit.wikimedia.org/r/1081450 does for mw-api-int.

jijiki triaged this task as Medium priority. (Oct 23 2024, 12:07 PM)
jijiki moved this task from Incoming 🐫 to Doing 😎 on the serviceops board.

Change #1082863 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mediawiki: add remaining migration releases

https://gerrit.wikimedia.org/r/1082863

Change #1082864 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mediawiki: remove migration release overrides

https://gerrit.wikimedia.org/r/1082864

Change #1082865 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hieradata: add remaining "migration" releases

https://gerrit.wikimedia.org/r/1082865

Change #1071957 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mediawiki: parameterize PHP version via chart value

https://gerrit.wikimedia.org/r/1071957

Change #1071957 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: parameterize PHP version via chart value

https://gerrit.wikimedia.org/r/1071957

Mentioned in SAL (#wikimedia-operations) [2024-10-31T17:11:08Z] <swfrench@deploy2002> Started scap sync-world: Deployment to pick up PHP version parameterization - T372604 T377040

Mentioned in SAL (#wikimedia-operations) [2024-10-31T17:13:00Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to pick up PHP version parameterization - T372604 T377040 (duration: 01m 52s)

Change #1085491 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mediawiki: ensure default php.version is a string

https://gerrit.wikimedia.org/r/1085491

Change #1085491 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: ensure default php.version is a string

https://gerrit.wikimedia.org/r/1085491

Mentioned in SAL (#wikimedia-operations) [2024-10-31T23:35:38Z] <swfrench@deploy2002> Started scap sync-world: Deployment to clear noop chart diff from 1085491 - T372604 T377040

Mentioned in SAL (#wikimedia-operations) [2024-10-31T23:37:28Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to clear noop chart diff from 1085491 - T372604 T377040 (duration: 01m 49s)

Change #1081449 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: support for service.deployment: none

https://gerrit.wikimedia.org/r/1081449

Mentioned in SAL (#wikimedia-operations) [2024-12-04T18:11:24Z] <swfrench@deploy2002> Started scap sync-world: Deployment to clear noop chart diff from rANRE108144973f52 - T377040

Mentioned in SAL (#wikimedia-operations) [2024-12-04T18:13:31Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to clear noop chart diff from rANRE108144973f52 - T377040 (duration: 02m 07s)

Change #1100555 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mw-(web|api-ext)-next: php.version to 8.1

https://gerrit.wikimedia.org/r/1100555

Change #1100556 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hieradata: switch mw-(web|api-ext)-next to 8.1

https://gerrit.wikimedia.org/r/1100556

Change #1101121 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mw-(apt-ext|api-int|jobrunner|parsoid|web): set php.version to 8.1

https://gerrit.wikimedia.org/r/1101121

Change #1101122 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hieradata: switch all "migration" releases to 8.1

https://gerrit.wikimedia.org/r/1101122

Change #1101124 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] service::catalog: enable monitoring for mw-*-next

https://gerrit.wikimedia.org/r/1101124

Change #1081450 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: add migration release

https://gerrit.wikimedia.org/r/1081450

Change #1081451 merged by Scott French:

[operations/puppet@production] hieradata: add "migration" release of mw-api-int

https://gerrit.wikimedia.org/r/1081451

Mentioned in SAL (#wikimedia-operations) [2024-12-12T19:40:18Z] <swfrench@deploy2002> Started scap sync-world: Deployment to populate mw-api-int migration release files - T377040

Mentioned in SAL (#wikimedia-operations) [2024-12-12T19:42:32Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to populate mw-api-int migration release files - T377040 (duration: 02m 13s)

Change #1081452 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: remove "migration" release values overrides

https://gerrit.wikimedia.org/r/1081452

Change #1082863 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: add remaining migration releases

https://gerrit.wikimedia.org/r/1082863

Change #1082865 merged by Scott French:

[operations/puppet@production] hieradata: add remaining "migration" releases

https://gerrit.wikimedia.org/r/1082865

Mentioned in SAL (#wikimedia-operations) [2024-12-17T18:57:46Z] <swfrench@deploy2002> Started scap sync-world: Deployment to populate remaining migration release files - T377040

Mentioned in SAL (#wikimedia-operations) [2024-12-17T19:09:22Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to populate remaining migration release files - T377040 (duration: 11m 35s)

Change #1082864 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: remove migration release overrides

https://gerrit.wikimedia.org/r/1082864

That's everything we can do at this point, until we're ready to switch the "next" and "migration" deployments over to 8.1 and start the traffic ramp. The latter part won't happen until mid-January 2025.

Change #1101122 merged by Scott French:

[operations/puppet@production] hieradata: switch all "migration" releases to 8.1

https://gerrit.wikimedia.org/r/1101122

Change #1101121 merged by jenkins-bot:

[operations/deployment-charts@master] mw-(api-ext|api-int|jobrunner|parsoid|web): migration php.version to 8.1

https://gerrit.wikimedia.org/r/1101121

Mentioned in SAL (#wikimedia-operations) [2025-01-08T18:19:28Z] <swfrench@deploy2002> Started scap sync-world: Deployment to switch migration release files to 8.1 - T377040

Mentioned in SAL (#wikimedia-operations) [2025-01-08T18:33:26Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to switch migration release files to 8.1 - T377040 (duration: 13m 57s)

Change #1100556 merged by Scott French:

[operations/puppet@production] hieradata: switch mw-(web|api-ext)-next to 8.1

https://gerrit.wikimedia.org/r/1100556

Change #1100555 merged by jenkins-bot:

[operations/deployment-charts@master] mw-(web|api-ext)-next: php.version to 8.1

https://gerrit.wikimedia.org/r/1100555

Mentioned in SAL (#wikimedia-operations) [2025-01-16T18:24:10Z] <swfrench@deploy2002> Started scap sync-world: Deployment to switch next release files to 8.1 - T377040

Mentioned in SAL (#wikimedia-operations) [2025-01-16T18:28:00Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to switch next release files to 8.1 - T377040 (duration: 03m 50s)

Change #1112078 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mw-(web|api-ext)-next: bump replicas and update TODO

https://gerrit.wikimedia.org/r/1112078

Alright, there are two remaining prep changes to deploy before we can close this out and complete setup for T377042 (in turn making the -next deployments externally routable, though in practice they should receive no traffic until cookie-based enrollment starts):

  1. https://gerrit.wikimedia.org/r/1112078 - mw-(web|api-ext)-next: bump replicas and update TODO
  2. https://gerrit.wikimedia.org/r/1101124 - service::catalog: enable monitoring for mw-(web|api-ext)-next

Change #1112246 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/cookbooks@master] sre.switchdc.mediawiki: add -next services

https://gerrit.wikimedia.org/r/1112246

Change #1112246 merged by jenkins-bot:

[operations/cookbooks@master] sre.switchdc.mediawiki: add -next services

https://gerrit.wikimedia.org/r/1112246

Change #1112078 merged by jenkins-bot:

[operations/deployment-charts@master] mw-(web|api-ext)-next: bump replicas and update TODO

https://gerrit.wikimedia.org/r/1112078

Change #1101124 merged by Effie Mouzeli:

[operations/puppet@production] service::catalog: enable monitoring for mw-(web|api-ext)-next

https://gerrit.wikimedia.org/r/1101124

Many thanks to @jijiki for merging those last two patches.

Next steps will come in T383845 (pre-ramp capacity augments, etc.).