Upgrade Toolforge Kubernetes to latest 1.18
Closed, ResolvedPublic

Description

Always start with toolsbeta.

The current release of Kubernetes is 1.21, and we are at 1.17. We aim to stay within the supported versions. There is currently a one-year support cycle starting at 1.20, but for releases before that only three minor versions were supported for patches, so we've dropped clean off. We should also try to move to 1.19 as soon as is feasible.
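As a rough illustration of the version gap described above, the sketch below counts how many minor releases a cluster is behind and checks it against a three-minor-version patch window. The function names and the `supported_minors` parameter are illustrative assumptions, not part of any Kubernetes tooling.

```python
from typing import Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Split a version such as '1.17' or 'v1.17.4' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def minors_behind(current: str, latest: str) -> int:
    """Number of minor releases between the running and the latest version."""
    cur_major, cur_minor = parse_minor(current)
    lat_major, lat_minor = parse_minor(latest)
    if cur_major != lat_major:
        raise ValueError("only compares versions within the same major release")
    return lat_minor - cur_minor


def in_patch_window(current: str, latest: str, supported_minors: int = 3) -> bool:
    """True if `current` is one of the `supported_minors` newest minor releases.

    The default of 3 is an assumption taken from the description above.
    """
    return 0 <= minors_behind(current, latest) < supported_minors


if __name__ == "__main__":
    for version in ("1.17", "1.18", "1.19"):
        print(version, minors_behind(version, "1.21"), in_patch_window(version, "1.21"))
```

Under these assumptions, with 1.21 as the latest release, 1.17 and 1.18 both fall outside the window, and 1.19 is the first version inside it.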

Make sure you read https://v1-18.docs.kubernetes.io/docs/setup/release/notes/#urgent-upgrade-notes
Then there's the appropriate upgrade guide: https://v1-18.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

And then there's our notes: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Upgrading_Kubernetes

Event Timeline

Bstorm created this task.

If we get to 1.19, it's the exact same process, but there is no need to refresh the certs again if it's done within the same six-month period.

I've read the release notes for both 1.18 and 1.19, summary below.

Kubernetes

1.18
  • consumers of the 'certificatesigningrequests/approval' API must now have permission to 'approve' CSRs for the specific signer requested by the CSR. More information on the new signerName field and the required authorization can be found at https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests#authorization (#88246, @munnerz) [SIG API Machinery, Apps, Auth, CLI, Node and Testing]
    • This breaks maintain-kubeusers, see patch attached on T280300
  • Deprecated user-facing APIs removed, all have replacements on newer APIs, just need to check if any tools are using the legacy ones
    • apps/v1beta[12]
    • extensions/v1beta1 daemonsets, deployments, replicasets
    • extensions/v1beta1 networkpolicies
    • extensions/v1beta1 podsecuritypolicies
  • Upgrade guide itself looks fairly standard, nothing special in there
1.19
  • I don't see anything special on the release notes or upgrade guide, should be a relatively simple upgrade except that we need to upgrade Calico before upgrading to Kubernetes 1.19 (see below).
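For the 1.18 CSR approval change noted above, here is a minimal sketch of the RBAC rules an approver would need, written as plain Python dicts that could be serialized into a ClusterRole. The helper name is hypothetical, and the signer name in the usage example is only an assumption; the actual fix for maintain-kubeusers is the patch on T280300, which may look different.

```python
def csr_approver_rules(signer_name: str) -> list:
    """Build ClusterRole rules that allow approving CSRs for one signer.

    From 1.18 on, updating the 'approval' subresource is not enough: the
    approver also needs the 'approve' verb on the requested signer.
    """
    return [
        {
            "apiGroups": ["certificates.k8s.io"],
            "resources": ["certificatesigningrequests"],
            "verbs": ["get", "list", "watch"],
        },
        {
            "apiGroups": ["certificates.k8s.io"],
            "resources": ["certificatesigningrequests/approval"],
            "verbs": ["update"],
        },
        {
            # New in 1.18: permission scoped to the specific signer.
            "apiGroups": ["certificates.k8s.io"],
            "resources": ["signers"],
            "resourceNames": [signer_name],
            "verbs": ["approve"],
        },
    ]


if __name__ == "__main__":
    # The signer name below is an assumption chosen for illustration.
    for rule in csr_approver_rules("kubernetes.io/legacy-unknown"):
        print(rule)
```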

Calico

  • Docs for older versions don't seem to be available online, so I ended up digging through the git tags of their source on GitHub
  • We're currently on Calico 3.14, which supports k8s 1.16-1.18; the latest Calico (3.18) supports k8s 1.18-1.20
  • Do we need to upgrade via each version or can we skip some from the middle?
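The compatibility ranges above can be written down as a small lookup, which makes it clear why Calico has to move before Kubernetes 1.19. This is only a sketch: the table holds just the two Calico versions mentioned in this comment, and the function name is an assumption.

```python
# Supported Kubernetes ranges, as quoted in the comment above.
CALICO_SUPPORTED_K8S = {
    "3.14": ((1, 16), (1, 18)),
    "3.18": ((1, 18), (1, 20)),
}


def calico_supports(calico_version: str, k8s_version: str) -> bool:
    """Check whether a Calico release lists the given Kubernetes minor version."""
    lowest, highest = CALICO_SUPPORTED_K8S[calico_version]
    major, minor = (int(part) for part in k8s_version.split(".")[:2])
    return lowest <= (major, minor) <= highest


if __name__ == "__main__":
    for calico in ("3.14", "3.18"):
        for k8s in ("1.17", "1.18", "1.19"):
            print(calico, k8s, calico_supports(calico, k8s))
```

Kubernetes 1.18 is the only version both Calico releases list, so under these numbers the Calico upgrade would fit between the 1.18 and 1.19 Kubernetes upgrades.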

Change 680253 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] aptrepo: bootstrap repo for thirdparty/kubedam-k8s-1-17

https://gerrit.wikimedia.org/r/680253

One more thing from 1.18 notes:

  • New IngressClass resource and a new ingressClassName field on the Ingress spec, which replace the now-deprecated kubernetes.io/ingress.class annotation. The annotation is not in use in webservice, but it is used in the iw tool and in its documentation, and possibly in other hand-crafted deployments
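For hand-crafted deployments that still use the deprecated annotation, the change amounts to moving its value into `spec.ingressClassName`. The sketch below does that on an already-parsed Ingress manifest (a plain dict); the function name is hypothetical, and it assumes a matching IngressClass object exists and that the ingress controller honours the new field.

```python
import copy

LEGACY_ANNOTATION = "kubernetes.io/ingress.class"


def migrate_ingress_class(ingress: dict) -> dict:
    """Return a copy of an Ingress manifest using spec.ingressClassName.

    The input is left untouched. Manifests without the legacy annotation
    are returned as an unchanged copy.
    """
    migrated = copy.deepcopy(ingress)
    annotations = migrated.get("metadata", {}).get("annotations") or {}
    if LEGACY_ANNOTATION not in annotations:
        return migrated

    class_name = annotations.pop(LEGACY_ANNOTATION)
    # Do not overwrite an explicitly set field.
    migrated.setdefault("spec", {}).setdefault("ingressClassName", class_name)
    if not annotations:
        del migrated["metadata"]["annotations"]
    return migrated


if __name__ == "__main__":
    old = {
        "apiVersion": "networking.k8s.io/v1beta1",
        "kind": "Ingress",
        "metadata": {
            "name": "example",
            "annotations": {LEGACY_ANNOTATION: "nginx"},
        },
        "spec": {"rules": []},
    }
    print(migrate_ingress_class(old))
```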

Change 680367 had a related patch set uploaded (by Majavah; author: Majavah):

[labs/tools/registry-admission-webhook@master] Update to apps/v1

https://gerrit.wikimedia.org/r/680367

In T280299#7008150, @Majavah wrote:

I've read the release notes for both 1.18 and 1.19, summary below.

Kubernetes

1.18
  • Deprecated user-facing APIs removed, all have replacements on newer APIs, just need to check if any tools are using the legacy ones
    • apps/v1beta[12]
    • extensions/v1beta1 daemonsets, deployments, replicasets
    • extensions/v1beta1 networkpolicies
    • extensions/v1beta1 podsecuritypolicies
  • Upgrade guide itself looks fairly standard, nothing special in there

This bit is likely ok except for new objects. The APIs stop being served, but the server translates existing objects to the current APIs (usually). The rare cases where that isn't true usually involve removing an entire feature.
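Since the risk is mainly new objects created from old manifests, one way to check a tool's manifests is to compare each `apiVersion`/`kind` pair against the removed list quoted above. This is a sketch that works on already-parsed manifests (plain dicts); the table and function name are assumptions built only from that list.

```python
# apiVersion -> kinds no longer served in 1.18 (None means every kind).
REMOVED_IN_1_18 = {
    "apps/v1beta1": None,
    "apps/v1beta2": None,
    "extensions/v1beta1": {
        "DaemonSet",
        "Deployment",
        "ReplicaSet",
        "NetworkPolicy",
        "PodSecurityPolicy",
    },
}


def uses_removed_api(manifest: dict) -> bool:
    """True if the manifest targets an API from the removed list."""
    api_version = manifest.get("apiVersion")
    if api_version not in REMOVED_IN_1_18:
        return False
    kinds = REMOVED_IN_1_18[api_version]
    return kinds is None or manifest.get("kind") in kinds


if __name__ == "__main__":
    manifests = [
        {"apiVersion": "extensions/v1beta1", "kind": "Deployment"},
        {"apiVersion": "extensions/v1beta1", "kind": "Ingress"},
        {"apiVersion": "apps/v1", "kind": "Deployment"},
    ]
    for manifest in manifests:
        print(manifest["apiVersion"], manifest["kind"], uses_removed_api(manifest))
```

Matching on kind as well as apiVersion matters for extensions/v1beta1, because only some of its kinds are on the removed list.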

Anyone doing this upgrade may see etcd timeout failures due to high iowait on the etcd cluster, btw. T279723 might help, but this has been an on-going problem.

Change 680388 had a related patch set uploaded (by Bstorm; author: Bstorm):

[labs/tools/registry-admission-webhook@master] golang fun: fix the module entries for the k8s api code

https://gerrit.wikimedia.org/r/680388

Change 680388 merged by jenkins-bot:

[labs/tools/registry-admission-webhook@master] golang fun: fix the module entries for the k8s api code

https://gerrit.wikimedia.org/r/680388

Change 680367 merged by jenkins-bot:

[labs/tools/registry-admission-webhook@master] Update to apps/v1

https://gerrit.wikimedia.org/r/680367

Change 680253 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] aptrepo: bootstrap repo for thirdparty/kubedam-k8s-1-18

https://gerrit.wikimedia.org/r/680253

Mentioned in SAL (#wikimedia-cloud) [2021-06-08T14:08:37Z] <majavah> update toolsbeta-test-k8s-control-4 to kubernetes 1.18 T280299

Mentioned in SAL (#wikimedia-cloud) [2021-06-08T14:57:30Z] <majavah> continuing to update rest of k8s control nodes T280299

Mentioned in SAL (#wikimedia-cloud) [2021-06-08T15:02:20Z] <majavah> continuing to update k8s ingress nodes T280299

Mentioned in SAL (#wikimedia-cloud) [2021-06-08T15:11:56Z] <majavah> updating k8s worker nodes to 1.18 T280299

I upgraded toolsbeta today and tested some basic operations. Everything seems to be working fine. I don't see an issue with continuing to tools/paws as long as new tool creation works, which I don't have access to test.

Change 701975 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] tools-clush: remove paws from clush and add the rest of the k8s setup

https://gerrit.wikimedia.org/r/701975

Change 701975 merged by Bstorm:

[operations/puppet@production] tools-clush: remove paws from clush and add the rest of the k8s setup

https://gerrit.wikimedia.org/r/701975

Mentioned in SAL (#wikimedia-cloud) [2021-06-29T17:03:02Z] <majavah> starting toolforge kubernetes 1.18 upgrade - T280299

Mentioned in SAL (#wikimedia-cloud) [2021-06-29T20:11:49Z] <majavah> toolforge kubernetes upgrade complete T280299