Update wikikube eqiad to k8s 1.23
Closed, ResolvedPublic

Description

This is scheduled for March 7th, 09:00-16:00 UTC (the actual downtime of the cluster should be shorter than this window). Hopefully it will not overlap with T329073, which is being done that day because it is the last day that eqiad will be fully depooled. The following hosts will be affected by that task and should be done before 14:00 UTC:

  • kubetcd1005
  • kubemaster1001
  • kubernetes[1005,1007-1008,1017-1018]

Todos:

Services won't be repooled after the reimaging, since the eqiad datacenter is depooled anyway due to the switchover.

Detailed steps and commands can be found in T326340: Update staging-codfw to k8s 1.23

@akosiaris will be running point.

Issues

None yet

Impact

The only user-visible impact will be for Toolhub, which per T329319#8619246 and related discussions is expected and acceptable. Toolhub will be prioritized when deploying new services to the cluster. Adding @bd808 for their convenience.

Event Timeline

Change 894586 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] wikikube eqiad: Update cluster settings for k8s 1.23

https://gerrit.wikimedia.org/r/894586

Change 894591 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] admin_ng: Update wikikube-eqiad settings to k8s 1.23

https://gerrit.wikimedia.org/r/894591

Adding @Ottomata too in case we have the same issue as T329664#8638499

Icinga downtime and Alertmanager silence (ID=6354fd03-fd3c-49db-ac21-75e88be10633) set by akosiaris@cumin1001 for 1 day, 0:00:00 on 23 host(s) and their services with reason: Reinitialize eqiad with k8s 1.23

kubemaster[1001-1002].eqiad.wmnet,kubernetes[1005-1022].eqiad.wmnet,kubetcd[1004-1006].eqiad.wmnet
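As a sanity check on the "23 host(s)" figure, the bracketed host expressions above can be expanded with a short sketch. The `expand` and `expand_all` helpers are hypothetical (they are not part of cumin or any other tool mentioned in this task) and only handle the simple `prefix[a-b,c]suffix` form seen here.

```python
import re


def expand(expr: str) -> list[str]:
    """Expand one 'prefix[1,3-5]suffix' expression into hostnames."""
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # no bracket group: treat as a single host
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a bare number is a one-element range
        for n in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{n}{suffix}")
    return hosts


def expand_all(line: str) -> list[str]:
    """Expand a comma-separated list of expressions."""
    # Split only on commas that are outside a [...] group.
    exprs = re.split(r",(?![^\[]*\])", line)
    return [host for expr in exprs for host in expand(expr)]


downtimed = expand_all(
    "kubemaster[1001-1002].eqiad.wmnet,"
    "kubernetes[1005-1022].eqiad.wmnet,"
    "kubetcd[1004-1006].eqiad.wmnet"
)
print(len(downtimed))  # 23 (2 masters + 18 workers + 3 etcd hosts)
```

The same helper applied to `kubernetes[1005,1007-1008,1017-1018]` yields the five worker nodes listed in the description.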

Mentioned in SAL (#wikimedia-operations) [2023-03-07T08:51:31Z] <akosiaris> T331126 Scheduled 24H downtime for all wikikube eqiad hosts and all LVS services powered by the cluster

Change 894586 merged by Alexandros Kosiaris:

[operations/puppet@production] wikikube eqiad: Update cluster settings for k8s 1.23

https://gerrit.wikimedia.org/r/894586

Change 894591 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Update wikikube-eqiad settings to k8s 1.23

https://gerrit.wikimedia.org/r/894591

Change 895229 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] wikikube eqiad: Fix ippool to 10.67.128.0/18

https://gerrit.wikimedia.org/r/895229

Change 895229 merged by Alexandros Kosiaris:

[operations/deployment-charts@master] wikikube eqiad: Fix ippool to 10.67.128.0/18

https://gerrit.wikimedia.org/r/895229
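For reference, the size and bounds of the 10.67.128.0/18 pool named in the change above can be checked with Python's standard `ipaddress` module. This is only arithmetic on the CIDR notation; it assumes nothing about how the cluster actually allocates addresses from the pool.

```python
import ipaddress

# The pool from the commit subject; a /18 leaves 14 host bits.
pool = ipaddress.ip_network("10.67.128.0/18")

print(pool.num_addresses)  # 16384
print(pool[0], pool[-1])   # 10.67.128.0 10.67.191.255
```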

Mentioned in SAL (#wikimedia-operations) [2023-03-07T14:45:17Z] <akosiaris> uncordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet T331126

wdqs was repooled yesterday. The only things left are cleaning up some old IP ranges and adding the 2 new nodes to the cluster.

akosiaris updated the task description.

All tasks done. Resolving.