T362323 Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons)

Subject	Repo	Branch	Lines +/-
kubernetes: make 5 eqiad api appservers k8s workers	operations/puppet	production	+20 -15
mw-web, mw-api-ext: bump replicas in advance of traffic shift	operations/deployment-charts	master	+3 -3
trafficserver: move k8s traffic shift to 90%	operations/puppet	production	+1 -1
trafficserver: move 85% of traffic to mw-on-k8s	operations/puppet	production	+1 -1
mw-web, mw-api-ext: bump replicas	operations/deployment-charts	master	+2 -2
scap: make mw1407 a scap proxy	operations/puppet	production	+1 -1
k8s: move 5 eqiad appservers to kubernetes	operations/puppet	production	+21 -16
trafficserver: move 80% of traffic to mw on k8s	operations/puppet	production	+1 -1
mw-web, mw-api-ext: Raise replicas for 80% traffic	operations/deployment-charts	master	+3 -3
kubernetes: move 5 eqiad appservers to kubernetes	operations/puppet	production	+16 -11
trafficserver: move 75% of traffic to mw on k8s	operations/puppet	production	+1 -1
mw-web, mw-api-ext: Raise replicas for 75% traffic	operations/deployment-charts	master	+2 -2
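
The traffic-shift steps recorded in the patch table above can be pulled out mechanically. This is a minimal sketch, assuming the trafficserver subjects are copied verbatim as plain strings; the names `SUBJECTS` and `rollout_steps` are hypothetical and exist only for this illustration. It reads the percentages from commit subjects and says nothing about how trafficserver itself implements the split.

```python
import re

# Patch subjects copied from the table above (trafficserver rows only).
SUBJECTS = [
    "trafficserver: move k8s traffic shift to 90%",
    "trafficserver: move 85% of traffic to mw-on-k8s",
    "trafficserver: move 80% of traffic to mw on k8s",
    "trafficserver: move 75% of traffic to mw on k8s",
]


def rollout_steps(subjects):
    """Return the traffic percentages named in the subjects, in ascending order."""
    steps = []
    for subject in subjects:
        # A step is any integer immediately followed by a percent sign.
        match = re.search(r"(\d+)%", subject)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


print(rollout_steps(SUBJECTS))  # [75, 80, 85, 90]
```

Subjects without a percentage, such as the scap or replica patches, are ignored.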

Status	Assigned	Task
Stalled	None	T255792 Quibble runs core:unit tests twice!
Open	None	T328919 Upgrade to PHPUnit 10
Open	None	T338103 Micro-optimize ApiResult::isMetadataKey with str_starts_with once we support PHP8+
Open	None	T328921 Drop PHP 7.4 support from MediaWiki
Stalled	None	T334726 Use return type `never` in Wikibase
Open	None	T328922 Drop PHP 8.0 support from MediaWiki
Stalled	None	T319055 Upgrade to psr/container 2.x
Stalled	Krinkle	T319432 Migrate WMF production from PHP 7.4 to PHP 8.1
Open	None	T291916 Tracking task for Bullseye migrations in production
Stalled	None	T356293 Migrate MW appservers' base images to bullseye
Open	None	T290536 Serve production traffic via Kubernetes
Open	Clement_Goubert	T362323 Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons)

Clement_Goubert triaged this task as High priority. Thu, Apr 11, 12:14 PM

Clement_Goubert created this task.

hnowlan awarded a token. Thu, Apr 11, 12:16 PM

Ladsgroup awarded a token. Thu, Apr 11, 12:22 PM

jijiki awarded a token. Thu, Apr 11, 12:23 PM

Clement_Goubert mentioned this in T290536: Serve production traffic via Kubernetes. Thu, Apr 11, 12:30 PM

taavi awarded a token. Thu, Apr 11, 12:33 PM

Clement_Goubert updated the task description. Thu, Apr 11, 12:46 PM

Clement_Goubert updated the task description. Thu, Apr 11, 12:49 PM

jijiki updated the task description. Tue, Apr 16, 9:31 AM

jijiki added a project: MoveComms-Support. Tue, Apr 16, 9:34 AM

jijiki added a subscriber: Trizek-WMF.

Change #1021904 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mw-web, mw-api-ext: Raise replicas for 75% traffic

https://gerrit.wikimedia.org/r/1021904

gerritbot added a project: Patch-For-Review. Fri, Apr 19, 11:59 AM

Change #1021905 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] trafficserver: move 75% of traffic to mw on k8s

https://gerrit.wikimedia.org/r/1021905

Change #1021904 merged by jenkins-bot:

[operations/deployment-charts@master] mw-web, mw-api-ext: Raise replicas for 75% traffic

https://gerrit.wikimedia.org/r/1021904

Change #1021905 merged by Clément Goubert:

[operations/puppet@production] trafficserver: move 75% of traffic to mw on k8s

https://gerrit.wikimedia.org/r/1021905

Clement_Goubert updated the task description. Tue, Apr 23, 9:51 AM

Change #1023397 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] kubernetes: move 5 eqiad appservers to kubernetes

https://gerrit.wikimedia.org/r/1023397

Change #1023397 merged by Clément Goubert:

[operations/puppet@production] kubernetes: move 5 eqiad appservers to kubernetes

https://gerrit.wikimedia.org/r/1023397

Change #1023412 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mw-web, mw-api-ext: Raise replicas for 80% traffic

https://gerrit.wikimedia.org/r/1023412

Change #1023413 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] trafficserver: move 80% of traffic to mw on k8s

https://gerrit.wikimedia.org/r/1023413

Checking after MoveComms-Support was added to this task: what kind of support do you need, if any?

Change #1023412 merged by jenkins-bot:

[operations/deployment-charts@master] mw-web, mw-api-ext: Raise replicas for 80% traffic

https://gerrit.wikimedia.org/r/1023412

Change #1023413 merged by Clément Goubert:

[operations/puppet@production] trafficserver: move 80% of traffic to mw on k8s

https://gerrit.wikimedia.org/r/1023413

Clement_Goubert updated the task description. Wed, Apr 24, 9:15 AM

Mentioned in SAL (#wikimedia-operations) [2024-04-24T09:29:39Z] <claime> 80% of external traffic to mw-on-k8s - T362323

hnowlan subscribed. Fri, Apr 26, 4:00 PM

Maintenance_bot removed a project: Patch-For-Review. Fri, Apr 26, 6:31 PM

Change #1026159 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] trafficserver: move 80% of traffic to mw-on-k8s

https://gerrit.wikimedia.org/r/1026159

Change #1026160 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] mw-web, mw-api-ext: bump replicas

https://gerrit.wikimedia.org/r/1026160

Change #1026158 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] k8s: move 5 eqiad appservers to kubernetes

https://gerrit.wikimedia.org/r/1026158

Change #1026158 merged by Hnowlan:

[operations/puppet@production] k8s: move 5 eqiad appservers to kubernetes

https://gerrit.wikimedia.org/r/1026158

Change #1026520 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] scap: make mw1407 a scap proxy

https://gerrit.wikimedia.org/r/1026520

Change #1026520 merged by Hnowlan:

[operations/puppet@production] scap: make mw1407 a scap proxy

https://gerrit.wikimedia.org/r/1026520

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1371.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1409.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1435.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1399.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1405.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1371.eqiad.wmnet with OS bullseye completed:

mw1371 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405021132_hnowlan_3953702_mw1371.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1409.eqiad.wmnet with OS bullseye completed:

mw1409 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405021135_hnowlan_3953708_mw1409.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1405.eqiad.wmnet with OS bullseye completed:

mw1405 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405021137_hnowlan_3953738_mw1405.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1435.eqiad.wmnet with OS bullseye completed:

mw1435 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405021139_hnowlan_3953714_mw1435.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1399.eqiad.wmnet with OS bullseye completed:

mw1399 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405021143_hnowlan_3953743_mw1399.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
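
A quick way to confirm that every reimage started above also reported a result is to compare the "was started" lines against the "(PASS)" lines. This is a minimal sketch, assuming the timeline has been saved as plain text with one entry per line; the regular expressions are written against the wording seen in this task only, and `unfinished_hosts` is a hypothetical helper, not part of the cookbook or Spicerack.

```python
import re

# Patterns match the wording of the timeline entries in this task.
STARTED = re.compile(r"reimage was started by \S+ for host (\w+)\.")
PASSED = re.compile(r"^(\w+) \(PASS\)$")


def unfinished_hosts(lines):
    """Return hosts with a 'was started' entry but no '(PASS)' entry, sorted."""
    started, passed = set(), set()
    for line in lines:
        line = line.strip()
        match = STARTED.search(line)
        if match:
            started.add(match.group(1))
        match = PASSED.match(line)
        if match:
            passed.add(match.group(1))
    return sorted(started - passed)


sample = [
    "Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 "
    "for host mw1371.eqiad.wmnet with OS bullseye",
    "Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 "
    "for host mw1409.eqiad.wmnet with OS bullseye",
    "mw1371 (PASS)",
]
print(unfinished_hosts(sample))  # ['mw1409']
```

Run against the five hosts in this timeline (mw1371, mw1409, mw1405, mw1435, mw1399), the result would be an empty list, since each has a PASS entry.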

Change #1026160 merged by jenkins-bot:

[operations/deployment-charts@master] mw-web, mw-api-ext: bump replicas

https://gerrit.wikimedia.org/r/1026160

Change #1026159 merged by Hnowlan:

[operations/puppet@production] trafficserver: move 85% of traffic to mw-on-k8s

https://gerrit.wikimedia.org/r/1026159

Maintenance_bot removed a project: Patch-For-Review. Thu, May 2, 3:30 PM

Change #1028840 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] kubernetes: make 5 eqiad api appservers k8s workers

https://gerrit.wikimedia.org/r/1028840

gerritbot added a project: Patch-For-Review. Tue, May 7, 2:13 PM

Change #1028842 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] mw-web, mw-api-ext: bump replicas in advance of traffic shift

https://gerrit.wikimedia.org/r/1028842

Change #1028844 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] trafficserver: move k8s traffic shift to 90%

https://gerrit.wikimedia.org/r/1028844

Description

This is (almost) the final step!

What?

What we are not migrating to Wikikube yet

Progression

Notes
