
🔷 Upgrade kubernetes from 1.21 to 1.22
Closed, ResolvedPublic8 Estimated Story Points

Description

We are currently using version 1.21.x, which is set for retirement/EOL on the 28th of this month.

We should upgrade this to run on a newer version locally and for our staging/production environments

This would involve resolving the deprecation of some of our v1beta API usage, mostly Ingresses it seems, but there could be a lot of hidden gems/problems that would only appear once we start doing this.

AC

  • Decide on a new target version (1.22)
  • Upgrade all environments to that new target version

Useful links:

Event Timeline

See https://cloud.google.com/kubernetes-engine/docs/release-notes for possible targets for us to aim for. I would suggest we aim for 1.22 as the Regular version

It looks like to move forward to 1.22 on GKE we are only blocked by two APIs we call that will no longer be around:


| API | User agent | Total calls (last 30 days) | Last called |
| --- | --- | --- | --- |
| /apis/networking.k8s.io/v1beta1/ingresses | nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format | 12698 | 14 Aug 2022, 05:04:00 |
| /apis/extensions/v1beta1/ingresses | Go-http-client/2.0 | 18 | 8 Aug 2022, 15:54:00 |

Looks to me like the only real thing required would be to update the version of the nginx-ingress charts. This, however, probably means we need to remove the "pinned" bitnami charts repository.
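
For reference, the core of the deprecation work is moving Ingress objects from extensions/v1beta1 / networking.k8s.io/v1beta1 to networking.k8s.io/v1. A minimal sketch of the v1 shape (resource name, host, service and port are made-up placeholders, not taken from our charts):

```
# Sketch only: the networking.k8s.io/v1 Ingress shape that 1.22 requires.
# All names below are illustrative placeholders.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1      # was extensions/v1beta1 or networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-ui
spec:
  rules:
    - host: example.invalid
      http:
        paths:
          - path: /
            pathType: Prefix          # pathType is mandatory in v1
            backend:
              service:                # v1beta1 used serviceName/servicePort instead
                name: example-ui
                port:
                  number: 80
EOF
```

The nginx-ingress chart bump should cover the controller side; the templates in our own charts need the same apiVersion change.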

Tarrow renamed this task from Upgrade kubernetes from 1.21 to Upgrade kubernetes from 1.21 to 1.22. Aug 25 2022, 1:17 PM
Tarrow set the point value for this task to 8. Aug 25 2022, 1:19 PM
Tarrow moved this task from Tech prioritized backlog to Ready to Pick Up on the Wikibase Cloud board.

I refactored the UI chart in this PR to use the new stable ingress API. More details here: https://github.com/wbstack/charts/pull/104

I created similar PRs for the API, QueryService, and QueryService UI charts as well.
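
A quick way to sanity-check those chart PRs locally is to render them and look at the apiVersion each Ingress ends up with. A rough sketch, assuming the charts are checked out locally (the chart path is illustrative):

```
# Render a chart and confirm its Ingress now uses networking.k8s.io/v1
# (apiVersion is the line immediately before "kind: Ingress" in rendered output).
helm template ./charts/ui | grep -B1 "kind: Ingress"
```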

Tarrow renamed this task from Upgrade kubernetes from 1.21 to 1.22 to 🔵 Upgrade kubernetes from 1.21 to 1.22. Nov 8 2022, 3:26 PM
Tarrow renamed this task from 🔵 Upgrade kubernetes from 1.21 to 1.22 to 🔷 Upgrade kubernetes from 1.21 to 1.22. Nov 8 2022, 3:28 PM
Rosalie_WMDE renamed this task from 🔷 Upgrade kubernetes from 1.21 to 1.22 to 🔷 Upgrade kubernetes from 1.21 to 1.25. Nov 9 2022, 11:24 AM
Rosalie_WMDE renamed this task from 🔷 Upgrade kubernetes from 1.21 to 1.25 to 🔷 Upgrade kubernetes from 1.21 to 1.22. Nov 9 2022, 11:39 AM

@Rosalie_WMDE and I had a chat. Right now we want to, at a minimum, upgrade to 1.22, since this is supported by GKE (https://cloud.google.com/kubernetes-engine/docs/release-notes#current_versions). GKE also supports 1.23 and 1.24, but not 1.25 yet. We probably want to move up the versions incrementally, because some services that we use (ingress-nginx/nginx-ingress for example) don't support a wide enough version range for us to make the jump in one step.
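
To see which versions GKE currently offers before each incremental hop, something like the following should work (region and output format here are assumptions, not taken from this task):

```
# List the GKE versions available per release channel in our region,
# to plan the incremental 1.22 -> 1.23 -> 1.24 path.
gcloud container get-server-config \
  --region europe-west3 \
  --format="yaml(channels)"
```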

Today we upgraded the staging cluster wbaas-2's control plane and node pools to 1.22.15-gke.1000. This was a 3-step process (control plane, then the oldest node pool medium-pool, then the other node pool) and was done via the Google Cloud Console UI (docs).
Note: In the future, if we activate auto-upgrade for the node pools, upgrading just the control plane will be enough.
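
For the record, the same 3-step process can also be driven from the CLI instead of the Console. A sketch assuming regional clusters in europe-west3 (cluster and pool names are the ones mentioned in this task; the region and exact flags should be double-checked against the gcloud docs):

```
# 1. Upgrade the control plane first
gcloud container clusters upgrade wbaas-2 --region europe-west3 \
  --master --cluster-version 1.22.15-gke.1000

# 2. Upgrade the node pools, oldest first (they default to the control plane version)
gcloud container clusters upgrade wbaas-2 --region europe-west3 --node-pool medium-pool
gcloud container clusters upgrade wbaas-2 --region europe-west3 --node-pool standard-pool

# Optional: enable auto-upgrade so only the control plane needs manual upgrades next time
gcloud container node-pools update medium-pool --cluster wbaas-2 \
  --region europe-west3 --enable-autoupgrade
```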

During the upgrade the following alerts fired:

Throughout the upgrade of the second node pool (standard-pool, 4 nodes), a message popped up in the Cloud Console indicating that the upgrades were a bit delayed, possibly because of Pod Disruption Budgets or grace periods:

The current upgrade delay of 7 minutes per node indicates your Pod Disruption Budgets may need attention. Learn more
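
If this comes up again, the PDBs that block node drains can be inspected directly (illustrative commands; name and namespace are placeholders):

```
# List all PodDisruptionBudgets and how many disruptions each currently allows
kubectl get poddisruptionbudgets --all-namespaces

# Inspect a specific PDB; "Allowed disruptions: 0" means the drain has to wait
kubectl describe pdb <pdb-name> -n <namespace>
```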

Everything was completed after roughly 1h20m.

The upgrade on wbaas-3 was completed today as well; it was pretty much the same experience as with staging yesterday.

What was noticeable is that some node upgrades were delayed again; this time I spotted the reason: the Elasticsearch availability constraint. The drain waited because only 1 pod was allowed to be unavailable at any time, and the time until one of them reports as ready is currently around ~40m after pod start.
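
So each node drain can effectively cost up to ~40m while the evicted Elasticsearch pod comes back. A rough way to watch this during future upgrades (namespace and label selector are assumptions):

```
# Check how many disruptions the Elasticsearch PDB currently allows
kubectl get pdb --all-namespaces | grep -i elastic

# Watch the evicted pod until it reports Ready again (can take ~40m)
kubectl get pods -n <namespace> -l app=elasticsearch --watch
```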

The production cluster is now running on 1.22.15-gke.1000