
Upgrade Toolforge K8s to 1.17
Open, Medium, Public


Kubernetes supports the latest three minor versions at any given time. We are currently running 1.16, and with 1.19 now released, 1.16 will fall out of active support.

It's time to start the upgrade to 1.17. There are objects to check and validate, and at least one python library to upgrade, before moving forward. PAWS should then be upgraded as well.

Event Timeline

Bstorm triaged this task as Medium priority. Sep 18 2020, 7:43 PM
Bstorm created this task.
Restricted Application added a subscriber: Aklapper. Sep 18 2020, 7:43 PM

Connecting to the previous task for reference. This one probably will not be as complicated because we have fewer deprecated objects.

In reviewing the changelog, I've picked up a couple of callouts so far:

  • metrics-server gets an upgrade, which could change some bits of the prometheus metrics
  • kube-proxy now supports nftables AND autodetects which mode to use
  • ipv6 dualstack support is improved
  • We can upgrade docker
  • is removed (which is fine because we use v1beta2!)
  • the default service IP CIDR must be defined -- I think we already do that, but it's worth checking

That makes it all seem relatively painless. Deploying in toolsbeta will highlight anything we missed. I'll check whether there are useful (or important compatibility) updates to the ingress controllers or calico.
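For reference, the usual kubeadm flow for a minor-version bump looks roughly like this (a sketch only; the target patch version is the one tested later on this task, and the node-handling details are general kubeadm practice, not anything decided here):

```shell
# Sketch of the kubeadm control-plane upgrade flow for a minor-version bump.
TARGET="1.17.13"

if command -v kubeadm >/dev/null 2>&1; then
    # Preview what the upgrade would change before applying anything.
    kubeadm upgrade plan
    # Apply the control-plane upgrade to the chosen version.
    kubeadm upgrade apply "v${TARGET}"
else
    echo "kubeadm not found on this host; the flow would be:"
    echo "  kubeadm upgrade plan"
    echo "  kubeadm upgrade apply v${TARGET}"
fi
# Worker nodes are then drained, upgraded with 'kubeadm upgrade node',
# and uncordoned, one at a time.
```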

If we bring calico up to date we will:

  • Be sad that calico doesn't make past-version release notes available for some reason (intentional or not)
  • Be able to try out encrypting all pod traffic if we want, which would vastly increase secrets security in Toolforge if it is performant (and if we keep putting off ipv6, which isn't supported with this)
  • See some bug fixes that might improve performance. I cannot find anything breaking.
Bstorm added a comment. Edited Wed, Sep 30, 5:45 PM

On the ingress:

  • We are further behind here than we are on calico. The current release is 0.35.0 (we are at 0.25.1)
  • Development is clearly focused on the helm chart, which would be a very good way to deploy and manage the controller (allowing easier local testing and the like). It might be worth proposing that on another task. A *really* old version of the chart is even in the internal chart museum, and the upstream repo is now quietly structured as a helm chart (as of some release since ours). Upstream has good info available, including zero-downtime upgrade guides. Releases are versioned by the helm chart, which is why the version seems odd next to their release tags--though they also tag the controller version.
  • TLSv1.3 is enabled by default in 0.33.0
  • We MUST upgrade to 0.32.0 before Kubernetes 1.18 (bugs)
  • Some of the best fixes are in 0.26.0
  • A full update is quite a jump in nginx versions. It would take some testing, but I think our usage is not too "strange" or customized compared to standard capabilities.
  • 0.31.0 and 0.28.0 fix fairly serious CVEs

I'll make a separate task for upgrading the ingress and investigating deployment via a helm values file instead of what we do now with kubectl.
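A hedged sketch of what that helm-based deploy could look like (the chart repo URL is upstream's published one; the release name, namespace, and values.yaml are placeholders I've invented here, not anything decided on this task):

```shell
# Sketch: deploy/upgrade the nginx ingress controller from a values file.
# Release name "toolforge-ingress", namespace, and values.yaml are placeholders.
CHART_REPO="https://kubernetes.github.io/ingress-nginx"

if command -v helm >/dev/null 2>&1; then
    helm repo add ingress-nginx "$CHART_REPO"
    helm repo update
    # --install makes this idempotent: the first run installs,
    # later runs upgrade in place.
    helm upgrade --install toolforge-ingress ingress-nginx/ingress-nginx \
        --namespace ingress-nginx -f values.yaml
else
    echo "helm not found; the sketch would run:"
    echo "  helm upgrade --install toolforge-ingress ingress-nginx/ingress-nginx -f values.yaml"
fi
```

The upside over raw kubectl manifests is that the whole local delta lives in one values file, which is easy to diff and to test in toolsbeta first.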

Change 631410 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: bootstrap repo for thirdparty/kubeadm-k8s-1-17

Change 631410 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: bootstrap repo for thirdparty/kubeadm-k8s-1-17

Mentioned in SAL (#wikimedia-operations) [2020-10-01T11:14:26Z] <arturo> pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T263284)

Change 631424 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: add missing update reference for thirdparty/kubeadm-k8s-1-17

Change 631424 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: add missing update reference for thirdparty/kubeadm-k8s-1-17

The 1.17 packages are in the repo.

NOTE: the profile::wmcs::kubeadm::component hiera key on the affected nodes should be set to thirdparty/kubeadm-k8s-1-17 before attempting the upgrade.
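For reference, the hiera change is a one-line key (a sketch; exactly where the key is scoped — per prefix or per instance — follows whatever the project already uses):

```yaml
# Hiera for the affected k8s nodes, set before running the upgrade
profile::wmcs::kubeadm::component: thirdparty/kubeadm-k8s-1-17
```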

I've verified that maintain-kubeusers passes all of its tests (which cover all API interactions, I think) against 1.17.13.

That was my biggest concern before deploying straight to toolsbeta!