
Move kubernetes workers to bullseye and docker to overlayfs
Closed, Resolved, Public

Description

All kubernetes worker nodes should be running bullseye and docker (docker.io=20.10.5+dfsg1-1+deb11u1 from Debian upstream) with overlayfs (instead of devicemapper) as the storage driver:

  • Kubernetes master nodes already run docker (docker.io=18.09.1+dfsg1-7.1+deb10u3) with overlayfs (on buster)
  • Changes have been prepared and tested in the ml and staging clusters
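To spot-check a worker against the target state above, you can compare the OS codename with Docker's storage driver. The sketch below is illustrative only, and it makes a few assumptions:

  • The function names are hypothetical.
  • `/etc/os-release` provides `VERSION_CODENAME`.
  • Docker reports its overlayfs-based driver as `overlay2`, which is the name Docker uses rather than "overlayfs".

```shell
#!/bin/sh
# Hypothetical helper: check that a worker runs bullseye with Docker on overlay2.

# Pure check, kept separate so it can be tested without a real host.
# Args: $1 = OS codename, $2 = Docker storage driver.
node_matches_target() {
    [ "$1" = "bullseye" ] && [ "$2" = "overlay2" ]
}

# Gathers the facts from the local host (needs a running Docker daemon).
check_local_node() (
    # VERSION_CODENAME comes from /etc/os-release (e.g. "bullseye" on Debian 11).
    codename=$(. /etc/os-release && echo "${VERSION_CODENAME:-unknown}")
    # Docker reports its overlayfs-based storage driver as "overlay2".
    driver=$(docker info --format '{{.Driver}}' 2>/dev/null) || driver=unknown
    if node_matches_target "$codename" "$driver"; then
        echo "OK: $codename / $driver"
    else
        echo "NOT MIGRATED: $codename / $driver"
        exit 1    # the body is a subshell, so exit only ends this function
    fi
)
```

On a migrated worker, `check_local_node` should print an `OK:` line. The comparison lives in its own function so the logic can be checked without a live host.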

Todos:

  • Reimage ml-codfw workers
  • Reimage ml-eqiad workers
  • Reimage staging-codfw workers
  • Reimage staging-eqiad workers
  • Add kubernetes20[18-22] as updated nodes to wikikube-codfw - T302208
  • Cordon kubernetes200[1-4]
  • Add kubernetes10[18-22] as updated nodes to wikikube-eqiad - T293728
  • Cordon kubernetes100[1-4]
  • Reimage wikikube-codfw vms kubernetes20[05,06,15,16]
  • Reimage wikikube-codfw nodes kubernetes20[07-14,17]
  • Reimage wikikube-eqiad vms kubernetes10[05,06,15,16]
  • Reimage wikikube-eqiad nodes kubernetes10[07-14,17]
  • Decom kubernetes100[1-4] (they have been refreshed, T297157) - T303045
  • Decom kubernetes200[1-4] (they have been refreshed, T286585) - T303044
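The todo list uses a bracketed shorthand for host ranges, for example `kubernetes10[18-22]` or `kubernetes20[05,06,15,16]`. The sketch below expands that shorthand and prints the matching `kubectl cordon` commands for a dry run. It is a hypothetical helper, not the tooling used for this task, and it has some limits:

  • It handles only a single bracket group, with comma-separated items or ranges.
  • It assumes the Kubernetes node names equal the short hostnames. In practice they may be FQDNs.

```shell
#!/bin/sh
# Hypothetical helper: expand "prefix[a-b,c,...]suffix" host shorthand.
# Supports exactly one bracket group; a range keeps the digit width of its start.

# Strip leading zeros so values are treated as decimal, not octal.
strip_zeros() {
    _s=${1#"${1%%[!0]*}"}
    echo "${_s:-0}"
}

expand_hosts() (
    spec=$1
    case $spec in
        *"["*"]"*) ;;
        *) echo "$spec"; exit 0 ;;   # no bracket group: a single host
    esac
    prefix=${spec%%"["*}             # text before '['
    rest=${spec#*"["}
    body=${rest%%"]"*}               # text between '[' and ']'
    suffix=${rest#*"]"}              # text after ']'
    set -f                           # no globbing while splitting on commas
    IFS=,
    for item in $body; do
        case $item in
            *-*)
                start=${item%-*}
                n=$(strip_zeros "$start")
                end=$(strip_zeros "${item#*-}")
                while [ "$n" -le "$end" ]; do
                    printf "%s%0${#start}d%s\n" "$prefix" "$n" "$suffix"
                    n=$((n + 1))
                done
                ;;
            *) echo "$prefix$item$suffix" ;;
        esac
    done
)

# Dry run: print (do not execute) a cordon command for each host.
print_cordon_commands() {
    expand_hosts "$1" | while read -r host; do
        echo "kubectl cordon $host"
    done
}
```

For example, `print_cordon_commands 'kubernetes200[1-4]'` prints four `kubectl cordon` lines, one each for kubernetes2001 to kubernetes2004. Notations with two groups, such as `kubernetes10[01][56]`, are deliberately out of scope.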

Details

Project                  Branch                         Lines +/-
operations/puppet        production                     +48 -0
operations/puppet        production                     +54 -343
operations/puppet        production                     +7 -0
operations/puppet        production                     +7 -0
operations/puppet        production                     +7 -0
operations/puppet        production                     +1 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +7 -0
operations/puppet        production                     +7 -0
operations/puppet        production                     +7 -0
operations/puppet        production                     +7 -0
operations/puppet        production                     +5 -5
operations/puppet        production                     +70 -7
operations/puppet        production                     +7 -0
operations/puppet        production                     +2 -2
operations/puppet        production                     +8 -1
operations/puppet        production                     +1 -42
operations/puppet        production                     +9 -1
operations/puppet        production                     +28 -0
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -1
operations/puppet        production                     +8 -0
operations/puppet        production                     +7 -0
operations/puppet        production                     +7 -0
operations/puppet        production                     +1 -2
operations/puppet        production                     +1 -0
operations/puppet        production                     +2 -1
operations/puppet        production                     +6 -0
operations/puppet        production                     +1 -0
operations/puppet        production                     +12 -0
operations/homer/public  master                         +2 -0
operations/puppet        production                     +17 -6
operations/puppet        production                     +7 -1
operations/puppet        production                     +13 -0
operations/puppet        production                     +1 -1
operations/puppet        production                     +7 -2
operations/puppet        production                     +4 -2
operations/debs/rsyslog  debian/bullseye-wikimedia-k8s  +47 -0
operations/debs/rsyslog  debian/bullseye-wikimedia-k8s  +85 -26
operations/puppet        production                     +74 -23
operations/puppet        production                     +2 -0
operations/homer/public  master                         +1 -0
operations/puppet        production                     +3 -1
operations/puppet        production                     +9 -3
operations/puppet        production                     +6 -2
operations/puppet        production                     +3 -1
operations/puppet        production                     +1 -0
operations/puppet        production                     +62 -0

Event Timeline


Change 769085 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set Bullseye + overlayfs settings for kubernetes2005

https://gerrit.wikimedia.org/r/769085

Change 769085 merged by Elukey:

[operations/puppet@production] Set Bullseye + overlayfs settings for kubernetes2007

https://gerrit.wikimedia.org/r/769085

Change 769463 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Add bullseye + overlayfs settings to kubernetes2008

https://gerrit.wikimedia.org/r/769463

Change 769463 merged by Elukey:

[operations/puppet@production] Add bullseye + overlayfs settings to kubernetes2008

https://gerrit.wikimedia.org/r/769463

Change 769616 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2009

https://gerrit.wikimedia.org/r/769616

Change 769617 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2010

https://gerrit.wikimedia.org/r/769617

Change 769616 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2009

https://gerrit.wikimedia.org/r/769616

Change 769617 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2010

https://gerrit.wikimedia.org/r/769617

Change 769919 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2011

https://gerrit.wikimedia.org/r/769919

Change 769920 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2012

https://gerrit.wikimedia.org/r/769920

Change 769919 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2011

https://gerrit.wikimedia.org/r/769919

Change 769920 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2012

https://gerrit.wikimedia.org/r/769920

Change 769978 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2013

https://gerrit.wikimedia.org/r/769978

Change 769979 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2014

https://gerrit.wikimedia.org/r/769979

Change 769978 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2013

https://gerrit.wikimedia.org/r/769978

Change 769979 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes2014

https://gerrit.wikimedia.org/r/769979

Change 770439 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye and overlayfs for kubernetes2017

https://gerrit.wikimedia.org/r/770439

Change 770440 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1007

https://gerrit.wikimedia.org/r/770440

Change 770439 merged by Elukey:

[operations/puppet@production] Set bullseye and overlayfs for kubernetes2017

https://gerrit.wikimedia.org/r/770439

There are some Ganeti VMs running as kubernetes nodes in both clusters. Each has two virtual disks: vda holds the boot and root partitions, and vdb is left unmanaged for devicemapper (used by Docker). Since we are moving to overlayfs, the plan for each VM is the following (using kubernetes2005 as an example):

Assumption: Ganeti vdisks are 0-indexed, so vda == 0 and vdb == 1

  • Drain kubernetes2005 via kubectl
  • Add downtime
  • Shut down the instance: gnt-instance shutdown kubernetes2005.codfw.wmnet
  • Remove /dev/vdb: gnt-instance modify --disk 1:remove kubernetes2005.codfw.wmnet
  • Grow vda by 10G: gnt-instance grow-disk kubernetes2005.codfw.wmnet 0 10g
  • Follow https://wikitech.wikimedia.org/wiki/Ganeti#Reinstall_/_Reimage_a_VM to reinstall the VM with Bullseye
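The per-VM plan above can be captured as a dry-run script, so the exact commands can be reviewed before anyone runs them. This is a sketch under assumptions, not the procedure actually used:

  • The function names and the `DRY_RUN` variable are hypothetical.
  • Downtime and the reinstall are only noted, because they depend on site tooling.
  • `kubectl drain` may need extra flags depending on the cluster version and workloads, for example for pods that use emptyDir volumes.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper around the manual per-VM steps listed above.
# By default it only prints commands; set DRY_RUN=0 to execute them.
# Downtime and the reinstall itself are site-specific and are only noted.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Args: $1 = short hostname (e.g. kubernetes2005), $2 = domain (e.g. codfw.wmnet),
#       $3 = amount to add to disk 0 (defaults to 10g, as in the plan above).
prepare_vm_for_overlay() (
    host=$1
    fqdn="$1.$2"
    extra=${3:-10g}
    # Assumption: the Kubernetes node name is the short hostname.
    run kubectl drain "$host" --ignore-daemonsets || exit 1
    echo "# add downtime for $fqdn (site-specific, not scripted)"
    run gnt-instance shutdown "$fqdn" || exit 1
    # Ganeti disk indexes are 0-based: disk 1 is vdb, the devicemapper disk.
    run gnt-instance modify --disk 1:remove "$fqdn" || exit 1
    # Grow disk 0 (vda); grow-disk adds the amount to the current size by default.
    run gnt-instance grow-disk "$fqdn" 0 "$extra" || exit 1
    echo "# reinstall $fqdn with bullseye (see the Ganeti reimage docs on wikitech)"
)
```

For example, `prepare_vm_for_overlay kubernetes2005 codfw.wmnet` prints the same commands as the plan above. Running it with `DRY_RUN=0` would execute the scriptable steps and stop at the first failure.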

Change 770459 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set overlayfs + bullseye for kubernetes2005

https://gerrit.wikimedia.org/r/770459

Change 770459 merged by Elukey:

[operations/puppet@production] Set overlayfs + bullseye for kubernetes2005

https://gerrit.wikimedia.org/r/770459

All good: kubernetes2005 was reimaged as planned. The only minor issue I encountered was a confirmation prompt during d-i asking whether to use the whole disk (20G) instead of half of it, as partman suggested. I think this is related to the current values of the recipe, which set 10G as the maximum capacity.

Change 770879 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2006

https://gerrit.wikimedia.org/r/770879

Change 770879 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2006

https://gerrit.wikimedia.org/r/770879

Change 770912 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set simpler partman recipe for kubernetes200[5,6]

https://gerrit.wikimedia.org/r/770912

Change 770912 merged by Elukey:

[operations/puppet@production] Set simpler partman recipe for kubernetes200[5,6]

https://gerrit.wikimedia.org/r/770912

Change 771319 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] install_server: improve the kubernetes-node-virtual-overlay recipe

https://gerrit.wikimedia.org/r/771319

Change 771319 merged by Elukey:

[operations/puppet@production] install_server: improve the kubernetes-node-virtual-overlay recipe

https://gerrit.wikimedia.org/r/771319

Change 771356 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] install_server: move kubernetes200[5,6] to the new flat-noswap recipe

https://gerrit.wikimedia.org/r/771356

Change 771356 merged by Elukey:

[operations/puppet@production] install_server: move kubernetes200[5,6] to the new flat-noswap recipe

https://gerrit.wikimedia.org/r/771356

Change 771422 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2015

https://gerrit.wikimedia.org/r/771422

Change 771423 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2016

https://gerrit.wikimedia.org/r/771423

Change 771422 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2015

https://gerrit.wikimedia.org/r/771422

Change 771552 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye in dhcp config for kubernetes201[5,6]

https://gerrit.wikimedia.org/r/771552

Change 771552 merged by Elukey:

[operations/puppet@production] Set bullseye in dhcp config for kubernetes201[5,6]

https://gerrit.wikimedia.org/r/771552

Change 771423 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes2016

https://gerrit.wikimedia.org/r/771423

Change 771598 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] Add kubernetes1018-1022

https://gerrit.wikimedia.org/r/771598

Change 771600 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlay settings for kubernetes10[01][56] nodes

https://gerrit.wikimedia.org/r/771600

Change 771601 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set overlay settings for kubernetes1005

https://gerrit.wikimedia.org/r/771601

Change 771602 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set overlay settings for kubernetes1006

https://gerrit.wikimedia.org/r/771602

Change 771603 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set overlay settings for kubernetes1015

https://gerrit.wikimedia.org/r/771603

Change 771604 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set overlay settings for kubernetes1016

https://gerrit.wikimedia.org/r/771604

Change 771598 merged by Alexandros Kosiaris:

[operations/puppet@production] Add kubernetes1018-1022

https://gerrit.wikimedia.org/r/771598

Change 771600 merged by Elukey:

[operations/puppet@production] Set bullseye + overlay settings for kubernetes10[01][56] nodes

https://gerrit.wikimedia.org/r/771600

Change 771601 merged by Elukey:

[operations/puppet@production] Set overlay settings for kubernetes1005

https://gerrit.wikimedia.org/r/771601

Change 771602 merged by Elukey:

[operations/puppet@production] Set overlay settings for kubernetes1006

https://gerrit.wikimedia.org/r/771602

Change 771603 merged by Elukey:

[operations/puppet@production] Set overlay settings for kubernetes1015

https://gerrit.wikimedia.org/r/771603

Change 771604 merged by Elukey:

[operations/puppet@production] Set overlay settings for kubernetes1016

https://gerrit.wikimedia.org/r/771604

Change 770440 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1007

https://gerrit.wikimedia.org/r/770440

Change 772686 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1008

https://gerrit.wikimedia.org/r/772686

Change 772686 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1008

https://gerrit.wikimedia.org/r/772686

Change 773181 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1009

https://gerrit.wikimedia.org/r/773181

Change 773181 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1009

https://gerrit.wikimedia.org/r/773181

Change 773193 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1010

https://gerrit.wikimedia.org/r/773193

Change 773193 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1010

https://gerrit.wikimedia.org/r/773193

Change 773278 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1011

https://gerrit.wikimedia.org/r/773278

Change 773278 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1011

https://gerrit.wikimedia.org/r/773278

Change 773389 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] install_server: update netboot settings for kubernetes nodes on Stretch

https://gerrit.wikimedia.org/r/773389

Change 773390 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes1012

https://gerrit.wikimedia.org/r/773390

Change 773389 merged by Elukey:

[operations/puppet@production] install_server: update netboot settings for kubernetes nodes on Stretch

https://gerrit.wikimedia.org/r/773389

Change 773390 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs settings for kubernetes1012

https://gerrit.wikimedia.org/r/773390

Change 773443 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1013

https://gerrit.wikimedia.org/r/773443

Change 773443 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1013

https://gerrit.wikimedia.org/r/773443

Change 773466 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1014

https://gerrit.wikimedia.org/r/773466

Change 773466 merged by Elukey:

[operations/puppet@production] Set bullseye + overlayfs for kubernetes1014

https://gerrit.wikimedia.org/r/773466

Change 773520 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] kubernetes: clean up extra netboot and host settings

https://gerrit.wikimedia.org/r/773520

Change 773520 merged by Elukey:

[operations/puppet@production] kubernetes: clean up extra netboot and host settings

https://gerrit.wikimedia.org/r/773520

Change 773751 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] kubernetes: apply devicemapper settings to kubernetes[12]00[1-4]

https://gerrit.wikimedia.org/r/773751

Change 773751 merged by Elukey:

[operations/puppet@production] kubernetes: apply devicemapper settings to kubernetes[12]00[1-4]

https://gerrit.wikimedia.org/r/773751

Removing myself as assignee of the task, since all the reimages are complete. The remaining steps are:

JMeybohm claimed this task.
JMeybohm updated the task description.

I think the SRE part of this is all done. Thanks everyone! ❤

@JMeybohm - something to note: when setting up the new dse-k8s-ctrl100[1-2] servers recently under T310172, I observed a race condition in which the devicemapper storage engine got initialized before overlayfs.

After a Puppet run, I had to do the following:

systemctl stop docker
# absolute path: with "cd /var/lib/docker" + "rm -rf *", a failed cd would wipe the current directory
rm -rf /var/lib/docker/*
systemctl start docker

After this the overlayfs storage engine was in use.
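The manual workaround above can be made a little safer. The sketch below differs from the manual steps in three ways:

  • It only wipes `/var/lib/docker` when Docker is not already on `overlay2`.
  • It uses an absolute path instead of relying on `cd`.
  • It refuses to run if the path variable is empty.

The function names are hypothetical. The sketch assumes the Docker daemon configuration managed by Puppet already requests overlay2. Like the original steps, it deletes all local images and containers.

```shell
#!/bin/sh
# Hypothetical, more defensive version of the manual reset above.
# WARNING: wiping the Docker root deletes all local images and containers.

DOCKER_ROOT=/var/lib/docker

# Pure decision helper. Arg: $1 = storage driver currently reported by Docker.
needs_reset() {
    [ "$1" != "overlay2" ]
}

reset_docker_storage() (
    driver=$(docker info --format '{{.Driver}}') || exit 1
    if ! needs_reset "$driver"; then
        echo "Docker already uses $driver; nothing to do"
        exit 0
    fi
    echo "Docker uses $driver; wiping $DOCKER_ROOT"
    systemctl stop docker || exit 1
    # ${DOCKER_ROOT:?} aborts instead of expanding to "" (which would mean "/*").
    rm -rf "${DOCKER_ROOT:?}"/*
    systemctl start docker || exit 1
    # Should now print overlay2, assuming the daemon config requests it.
    docker info --format '{{.Driver}}'
)
```

The reset is idempotent: running `reset_docker_storage` on a node that already uses overlay2 only prints a message.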


That may happen when the node is role(insetup), which means the overlayfs module is blacklisted, and docker is then installed via a role change. There is no guarantee that the overlayfs module can be loaded when docker starts. IIRC, setting the hiera key profile::docker::engine::force_default_docker_storage: true prevents this (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/759678).

Alternatively, do not include profile::docker::storage at all. Sorry, it seems we did not clean that up after moving from devicemapper to overlay.
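Since there is no guarantee that the overlay kernel module (named `overlay`) is loadable when docker starts, a pre-flight check could confirm it before (re)starting docker. This sketch is hypothetical and is not part of the Puppet profiles mentioned above. It relies on the kernel listing its registered filesystems in `/proc/filesystems`, where overlayfs appears as `overlay` once the module is loaded.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm the kernel provides overlayfs before
# docker starts, so it does not end up initialising devicemapper as described above.

# Arg: $1 = filesystems table to inspect (defaults to /proc/filesystems).
overlay_available() {
    # Registered filesystems are listed one per line, e.g. "nodev<TAB>overlay".
    awk '$NF == "overlay" { found = 1 } END { exit !found }' "${1:-/proc/filesystems}"
}

ensure_overlay() {
    overlay_available && return 0
    # modprobe can fail, or be turned into a no-op by modprobe.d rules,
    # so re-check the table afterwards instead of trusting its exit status.
    modprobe overlay
    overlay_available
}
```

A wrapper could, for example, run `ensure_overlay || exit 1` before `systemctl start docker`. With that guard, docker is never started while overlayfs is unavailable.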