Update to kernel 4.19 on kubernetes nodes
Closed, ResolvedPublic

Description

We should update our kubernetes clusters to kernel 4.19 which is in Debian Stretch now (prior to going all Buster).

https://packages.debian.org/stretch/linux-image-4.19-amd64

We could start with staging, of course, and if that looks okay, update selected nodes in eqiad and codfw to see how they behave under higher load.

There is now a hiera key that can be set to install kernel 4.19:

profile::base::linux419::enable

Event Timeline

JMeybohm triaged this task as Medium priority. Sep 10 2020, 11:40 AM
JMeybohm created this task.

We should probably create some linux419 base profile (or similar) which

  • pulls in linux-image-4.19-amd64 (the meta package for 4.19 on stretch)
  • installs a rasdaemon backport (/dev/mcelog was removed in 4.12, so mcelog no longer works; in buster we install rasdaemon by default for that reason, see T205396)

and then add that to the k8s worker role.

Sounds like a good plan to me.

Change 626592 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] Add ::profile::base::linux419 to set up kernel 4.19 on stretch

https://gerrit.wikimedia.org/r/626592

Change 626592 merged by JMeybohm:
[operations/puppet@production] Add ::profile::base::linux419 to set up kernel 4.19 on stretch

https://gerrit.wikimedia.org/r/626592

Change 627867 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] Use Kernel 4.19 on kubestage1002

https://gerrit.wikimedia.org/r/627867

Change 627868 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] Use Kernel 4.19 on staging cluster nodes

https://gerrit.wikimedia.org/r/627868

Change 627867 merged by JMeybohm:
[operations/puppet@production] Use Kernel 4.19 on kubestage1002

https://gerrit.wikimedia.org/r/627867

Mentioned in SAL (#wikimedia-operations) [2020-09-17T07:55:10Z] <jayme> cordoning kubestage1002 for kernel upgrade - T262527

Mentioned in SAL (#wikimedia-operations) [2020-09-17T07:55:26Z] <jayme> draining kubestage1002 for kernel upgrade - T262527

Mentioned in SAL (#wikimedia-operations) [2020-09-17T08:25:40Z] <jayme> reboot kubestage1002 for kernel upgrade - T262527

Mentioned in SAL (#wikimedia-operations) [2020-09-17T08:43:53Z] <jayme> uncordoned kubestage1002 after kernel upgrade - T262527

Mentioned in SAL (#wikimedia-operations) [2020-09-17T08:49:48Z] <jayme> deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - T262527
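The cordon, drain, reboot and uncordon steps logged above can be sketched as a small helper that only builds the command lines. This is a rough illustration, not the procedure actually used: the function name `node_upgrade_commands`, the `--ignore-daemonsets` flag and the `ssh ... sudo reboot` step are assumptions, since the log entries do not record the exact commands.

```python
from typing import List


def node_upgrade_commands(node: str) -> List[List[str]]:
    """Build the argv lists for one cordon/drain/reboot/uncordon cycle.

    Nothing is executed here; the caller decides how to run the commands.
    """
    return [
        # Mark the node unschedulable so no new pods land on it.
        ["kubectl", "cordon", node],
        # Evict running pods; DaemonSet-managed pods are skipped (assumed flag choice).
        ["kubectl", "drain", node, "--ignore-daemonsets"],
        # Hypothetical reboot step: the log does not say how the reboot was issued.
        ["ssh", node, "sudo", "reboot"],
        # Allow scheduling again once the node is back on the new kernel.
        ["kubectl", "uncordon", node],
    ]


if __name__ == "__main__":
    for argv in node_upgrade_commands("kubestage1002"):
        print(" ".join(argv))
```

Keeping the steps as data makes the order easy to review before anything touches a node. After uncordoning, existing pods do not move back on their own, which is why some pods were deleted afterwards to rebalance load.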

Mentioned in SAL (#wikimedia-operations) [2020-09-18T07:12:08Z] <jayme> draining kubestage1001 for kernel upgrade - T262527

Mentioned in SAL (#wikimedia-operations) [2020-09-18T08:25:28Z] <jayme> reboot kubestage1001 for clean state testing - T262527

Change 627868 merged by JMeybohm:
[operations/puppet@production] Use Kernel 4.19 on staging cluster nodes

https://gerrit.wikimedia.org/r/627868

Mentioned in SAL (#wikimedia-operations) [2020-09-18T08:43:20Z] <jayme> reboot kubestage1001 for kernel upgrade - T262527

Mentioned in SAL (#wikimedia-operations) [2020-09-18T08:56:14Z] <jayme> reboot kubestage1001 for clean state - T262527

Mentioned in SAL (#wikimedia-operations) [2020-09-18T09:47:08Z] <jayme> deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - T262527

I've updated the second node to kernel 4.19 as well, but the throttling values don't look as good as I had hoped.
I ran some https://github.com/indeedeng/fibtest runs on an empty kubestage1001 (40-core machine) on kernels 4.9 and 4.19, and they don't show the difference in "CPU Usage" I would have expected either. Feels like we're missing something here...
(See https://grafana.wikimedia.org/d/q-YRTAdGz/jayme-node-cfs-details?orgId=1 for a very broad overview of throttling on the nodes)

kernel  governor     fibtest threads  Iterations Completed (M)  Throttled for  CPU Usage (msecs)
4.9     performance  1                1191                      52             513
4.9     performance  10               1122                      51             503
4.9     performance  20               960                       52             506
4.9     performance  40               698                       52             501
4.9     powersave    1                455                       53             512
4.9     powersave    10               407                       52             513
4.9     powersave    20               351                       52             505
4.9     powersave    40               251                       53             506

kernel  governor     fibtest threads  Iterations Completed (M)  Throttled for  CPU Usage (msecs)
4.19    performance  1                1252                      52             516
4.19    performance  10               1063                      51             504
4.19    performance  20               923                       52             498
4.19    performance  40               654                       52             498
4.19    powersave    1                477                       52             511
4.19    powersave    10               400                       53             509
4.19    powersave    20               356                       52             510
4.19    powersave    40               237                       53             507
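To make the comparison concrete, here is a minimal sketch that computes the relative change in completed iterations between the two kernels for each governor and thread count. The numbers are copied from the fibtest measurements above; the names `ITERATIONS` and `relative_change` are made up for this illustration.

```python
# (kernel, governor, fibtest threads) -> Iterations Completed (M),
# copied from the fibtest runs on kubestage1001 listed above.
ITERATIONS = {
    ("4.9", "performance", 1): 1191,
    ("4.9", "performance", 10): 1122,
    ("4.9", "performance", 20): 960,
    ("4.9", "performance", 40): 698,
    ("4.9", "powersave", 1): 455,
    ("4.9", "powersave", 10): 407,
    ("4.9", "powersave", 20): 351,
    ("4.9", "powersave", 40): 251,
    ("4.19", "performance", 1): 1252,
    ("4.19", "performance", 10): 1063,
    ("4.19", "performance", 20): 923,
    ("4.19", "performance", 40): 654,
    ("4.19", "powersave", 1): 477,
    ("4.19", "powersave", 10): 400,
    ("4.19", "powersave", 20): 356,
    ("4.19", "powersave", 40): 237,
}


def relative_change(governor: str, threads: int) -> float:
    """Percentage change in iterations going from kernel 4.9 to 4.19."""
    old = ITERATIONS[("4.9", governor, threads)]
    new = ITERATIONS[("4.19", governor, threads)]
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for governor in ("performance", "powersave"):
        for threads in (1, 10, 20, 40):
            change = relative_change(governor, threads)
            print(f"{governor:<12} {threads:>2} threads: {change:+.1f}%")
```

Every difference stays within about 7% in either direction, and the sign is not consistent across thread counts, which matches the impression that the two kernels behave much the same in this test.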

This does look pretty nice indeed. That drop there is actually quite telling. I am inclined to say we may have solved 1 problem at least.

Yeah. As I said earlier today, I will test deploying resource requests and limits for the statsd exporter of eventgate-analytics to see if that changes the throttling behavior there. Then I would probably start writing the cookbook for rolling restarts of k8s clusters (T260661) in preparation for the update in eqiad and codfw (all of that next quarter).
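One way to judge whether such a change affects throttling is to look at the CFS counters the kernel exposes per cgroup. The sketch below assumes the cgroup v1 `cpu.stat` format (`nr_periods`, `nr_throttled`, `throttled_time`) and computes the share of enforcement periods in which the cgroup was throttled. The function name and the sample values are invented for illustration; where the file lives for a given container depends on the node setup and is not covered here.

```python
def throttled_fraction(cpu_stat_text: str) -> float:
    """Return nr_throttled / nr_periods from the contents of a cpu.stat file.

    Returns 0.0 when no enforcement periods have been recorded yet.
    """
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        # Each line is expected to be "<key> <integer value>".
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Made-up sample, not taken from any real node.
    sample = "nr_periods 100\nnr_throttled 25\nthrottled_time 1234567\n"
    print(throttled_fraction(sample))
```

Comparing this fraction before and after setting requests and limits gives a per-container view that complements the node-level dashboard.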

JMeybohm added a project: User-jijiki.
JMeybohm updated the task description.
JMeybohm added a subscriber: jijiki.

We will reimage the production workers as part of T244335

All nodes running kernel 4.19 now
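A quick way to confirm this on a node is to check the running kernel release string. The following is a small sketch; the helper name `is_kernel_419` and the example release strings in the comments are illustrative, not taken from the nodes.

```python
import platform
import re


def is_kernel_419(release: str) -> bool:
    """Return True if a kernel release string starts with version 4.19."""
    match = re.match(r"(\d+)\.(\d+)", release)
    # Compare major and minor as integers so that "4.9" is not mistaken for "4.19".
    return bool(match) and (int(match.group(1)), int(match.group(2))) == (4, 19)


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`,
    # for example something of the form "4.19.0-...-amd64".
    release = platform.release()
    print(release, is_kernel_419(release))
```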