
Upgrade the Toolforge Kubernetes cluster to v1.16
Closed, ResolvedPublic

Description

Per https://github.com/kubernetes/community/blob/master/contributors/design-proposals/release/versioning.md
v1.15 is going to fall off patch support soon. We need to upgrade to v1.16 ASAP.

There will be several subtasks for this because, to do the upgrade, we need to fix up webservice, fix a mistaken use of a deprecated API in maintain-kubeusers, and likely several other things.

This will start support for IPv6, though IPv6 support will be far better in 1.18 (now released).

Event Timeline

Bstorm triaged this task as High priority.Feb 25 2020, 4:37 PM
Bstorm created this task.
bd808 added a parent task: Restricted Task.Feb 25 2020, 5:26 PM
Bstorm updated the task description. (Show Details)

This deprecation will probably catch some hand-built deployments:

  • Deployment in the extensions/v1beta1, apps/v1beta1, and apps/v1beta2 API versions is no longer served
    • Migrate to use the apps/v1 API version, available since v1.9. Existing persisted data can be retrieved/updated via the new version.

Looks like I updated the example at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Example_deployment.yaml in January though, so at least our docs are in reasonable shape. Worth remembering for the eventual announcement. I think I read that these are difficult to find in a live cluster because the API does forward and backward conversions, meaning existing objects are returned under both the old and new API groups for as long as both are served by the cluster.
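As a rough sketch (assuming kubectl with the cluster-admin impersonation used elsewhere in this task), one way to list every Deployment explicitly through the apps/v1 endpoint is below; note this cannot tell you which apiVersion a manifest was originally written against, precisely because of the conversion described above:

kubectl --as-group=system:masters --as=admin get deployments.v1.apps --all-namespaces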

Change 598093 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge-kubeadm: kubeadm 1.16 requires docker 18.09

https://gerrit.wikimedia.org/r/598093

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T09:30:44Z] <arturo> set profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-1-16' at project level for trying T246122

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T09:56:02Z] <arturo> installing kubectl/kubeadm 1.16.9 on k8s control nodes (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T09:57:24Z] <arturo> installing kubectl/kubeadm 1.16.9 on k8s worker nodes (T246122)

NOTE: kubeadm suggests we should upgrade etcd, but 3.2 is what we have for now, in both Debian and the WMF repos.
External components that should be upgraded manually before you upgrade the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT   AVAILABLE
Etcd        3.2.26    3.3.10
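For context, that table comes from kubeadm's pre-upgrade check; a sketch of how to reproduce it on a control node, run as root, would be:

sudo -i kubeadm upgrade plan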

Mentioned in SAL (#wikimedia-operations) [2020-05-26T14:44:45Z] <arturo> upgrade packages in buster-wikimedia/thirdparty/kubeadm-k8s-1-16 (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T14:54:08Z] <arturo> bump installed version of kubeadm and kubectl to 1.16.10 (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T14:54:21Z] <arturo> aborrero@toolsbeta-test-k8s-control-1:~ $ sudo -i kubeadm upgrade apply v1.16.10 (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T15:02:04Z] <arturo> first k8s upgrade failed for yet-to-be-known reasons (T246122)

Change 598093 merged by Bstorm:
[operations/puppet@production] toolforge-kubeadm: kubeadm 1.16 requires docker 18.09

https://gerrit.wikimedia.org/r/598093

I see there were PSP changes around 1.16: https://github.com/kubernetes/kubernetes/pull/77792
That isn't likely to be our issue, but it is something to be aware of.

@aborrero I think I know what is wrong in Toolsbeta. It is the same thing that I saw just now on paws. There is an error in the kubeadm config (which becomes the kubeadm configmap). The name of the extra volume needed for encryption and some other important config for the apiserver is wrong. I must have done this by mistake somewhere during that very long security eval. I made changes in place instead of rebuilding clusters, so I never saw the discrepancy.

The name of the extra volume in the various kubeadm configs is currently invalid.
From the output of kubectl get cm -n kube-system kubeadm-config -o yaml:

apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
    enable-admission-plugins: PodSecurityPolicy,PodPreset,NodeRestriction,EventRateLimit
    admission-control-config-file: /etc/kubernetes/admission/admission.yaml
    encryption-provider-config: /etc/kubernetes/admission/encryption-conf.yaml
    runtime-config: settings.k8s.io/v1alpha1=true
    tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    profiling: "false"
  extraVolumes:
    - name: "/etc/kubernetes/admission"
      hostPath: "/etc/kubernetes/admission"
      mountPath: "/etc/kubernetes/admission"
      readOnly: true
      pathType: Directory

The error is - name: "/etc/kubernetes/admission"

That should read: - name: admission-config-dir

That is the correct value from the actual live manifest at /etc/kubernetes/manifests/api-server.yaml
Because I changed the api-server manifest directly when I did this, the error would never show up until we tried to upgrade the cluster with kubeadm, at which point the configmap is in use again. I'll change the value in puppet immediately, and then I will change it in kubeadm's configmaps.

To be clear, this would prevent the api-server pod from starting after upgrade. I suspect that's exactly what caused the error you saw (partly because it is very similar to my kubeadm init error and because the pod cannot start with that value for a volume name).
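For reference, a minimal sketch of fixing the live configmap by hand (the puppet patch below handles the on-disk kubeadm config the same way):

kubectl -n kube-system edit configmap kubeadm-config
# under apiServer.extraVolumes, change
#   - name: "/etc/kubernetes/admission"
# to
#   - name: admission-config-dir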

Change 598792 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] kubeadm: fix broken definition of extra volume

https://gerrit.wikimedia.org/r/598792

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T16:17:41Z] <bstorm_> fix incorrect volume name in kubeadm-config T246122

Mentioned in SAL (#wikimedia-cloud) [2020-05-26T16:20:24Z] <bstorm_> fix incorrect volume name in kubeadm-config configmap T246122

Change 598792 merged by Bstorm:
[operations/puppet@production] kubeadm: fix broken definition of extra volume

https://gerrit.wikimedia.org/r/598792

I think this should be unblocked, and the upgrade might work on the next try. We should probably depool control plane nodes before upgrading and then repool them, per https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/, since that is the newer procedure (in case it fixes anything my fix didn't; the thing I fixed would have stopped the upgrade no matter what). I don't think we need to worry about fussing with haproxy during the upgrade, because the tooling should all be compatible between the two versions. The big thing we must check before the tools upgrade is that all the objects created with old definitions still work on the upgraded cluster, presuming we get the upgrade rolling.
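A sketch of that depool/repool step from the linked kubeadm upgrade docs, using one of our control nodes as the example:

kubectl drain tools-k8s-control-1 --ignore-daemonsets
# ... run the kubeadm upgrade and kubelet package upgrade on that node ...
kubectl uncordon tools-k8s-control-1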

Mentioned in SAL (#wikimedia-cloud) [2020-05-27T10:58:08Z] <arturo> running aborrero@toolsbeta-test-k8s-control-1:~ $ sudo -i kubeadm upgrade apply v1.16.10 and this time it works! (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-27T10:58:52Z] <arturo> running aborrero@toolsbeta-test-k8s-control-1:~ $ sudo apt-get install kubelet -y in the 1.16 version from the component repo (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-27T11:02:58Z] <arturo> upgraded the rest of the k8s control plane nodes to 1.16.10 (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-27T11:05:13Z] <arturo> trying modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control toolsbeta-test-k8s-control-1 --project toolsbeta --domain eqiad.wmflabs --src-version 1.15 --dst-version 1.16.10 -n toolsbeta-test-k8s-worker-1 -n toolsbeta-test-k8s-worker-2 -n toolsbeta-test-k8s-worker-3 (T246122)

Change 599003 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] kubeadm: fix some inconsistencies in the worker upgrade script

https://gerrit.wikimedia.org/r/599003

Mentioned in SAL (#wikimedia-cloud) [2020-05-27T12:02:37Z] <arturo> the k8s cluster is now running v1.16.10 (T246122)

Change 599003 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] kubeadm: fix some inconsistencies in the worker upgrade script

https://gerrit.wikimedia.org/r/599003

I'd forgotten to check for deprecated objects by the end of the day yesterday, but I checked this morning in Toolsbeta...and there may not be any there. As I recall I already replaced all the PSPs in tools and toolsbeta, and the deployments there have been replaced.

We should be ok, but if anyone's deployment stops working, webservice stop/start will replace it.
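For tool maintainers, that recovery is roughly the following (a sketch; the tool name is a placeholder, and any type or image arguments depend on the tool):

become <toolname>
webservice stop
webservice start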

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T15:09:48Z] <arturo> upgrading tools-k8s-control-1 to 1.16.10 (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T15:17:39Z] <arturo> upgrading tools-k8s-control-2 to 1.16.10 (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T15:41:15Z] <arturo> upgrading tools-k8s-control-3 to 1.16.10 (T246122)

It's looking good after a short problem:

bstorm@tools-sgebastion-08:~$ kubectl --as-group=system:masters --as=admin get nodes
NAME                  STATUS   ROLES    AGE    VERSION
tools-k8s-control-1   Ready    master   204d   v1.16.10
tools-k8s-control-2   Ready    master   203d   v1.16.10
tools-k8s-control-3   Ready    master   203d   v1.16.10
tools-k8s-worker-1    Ready    <none>   203d   v1.15.6
tools-k8s-worker-10   Ready    <none>   142d   v1.15.6

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T15:58:39Z] <arturo> upgrading tools-k8s-worker-[1..10] to 1.16.10 (T246122)

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T16:01:28Z] <bstorm_> kubectl upgraded to 1.16.10 on all bastions T246122

We discovered that there is a bug in kubeadm < 1.17 that sets renew-certs to false on node upgrades. The control plane certs rotated fine, but the kubelet certs of the worker nodes did not: https://github.com/kubernetes/kubeadm/issues/1818. This is also referenced in the docs at https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/

Worker nodes 1-6 will need manual updates to their certs if we don't upgrade again before those certs expire.
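A sketch of how to spot-check a worker's kubelet client cert expiry (assuming the kubelet keeps its rotated client cert at the usual kubeadm path; on nodes that never rotated, the cert may instead be embedded in /etc/kubernetes/kubelet.conf):

sudo openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem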

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T17:54:22Z] <bstorm_> upgraded tools-k8s-worker-[11..15] and starting on -21-29 now T246122

Looking deeper into things, I think kubeadm is confusingly documented (we knew that). In order to renew the client cert for the kubelet, we can simply set the kubelets to do it for us with a feature gate. The settings are here: https://kubernetes.io/docs/tasks/tls/certificate-rotation/
This is distinct from *serving certificate rotation*, which we deliberately avoided. I'll make another task and a patch to add the args to our kubelets.

I did confirm our control plane certs look right.

Never mind! The blasted config is the default on this version: RotateKubeletClientCertificate=true|false (BETA - default=true) from https://v1-16.docs.kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

We should watch the cert behavior when the test cluster in toolsbeta is due to renew its certs. If that fails, then we'll explicitly add the options, which are all marked as deprecated. K8s docs are fun, right? I believe this is why I didn't enable them during the design phase. It's hard to remember all these details.
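If we ever do have to set it explicitly, a sketch of the non-deprecated route would be the rotateCertificates field in the kubelet config file rather than the command-line flag (paths assume kubeadm defaults):

# add to /var/lib/kubelet/config.yaml (KubeletConfiguration):
#   rotateCertificates: true
sudo systemctl restart kubelet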

NOTE: I copied the admin.conf to .kube/config for the root account on each control plane node because I realized our upgrade renewed that cert :)
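Roughly, on each control plane node, that copy looks like (a sketch; paths are the kubeadm defaults):

sudo cp /etc/kubernetes/admin.conf /root/.kube/config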

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T21:06:34Z] <bstorm_> upgrading tools-k8s-worker-[30-60] to kubernetes 1.16.10 T246122

Change 599472 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge-k8s: proposing removing hostkey checking for the upgrades

https://gerrit.wikimedia.org/r/599472

Change 599472 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge-k8s: proposing removing hostkey checking for the upgrades

https://gerrit.wikimedia.org/r/599472

Bstorm claimed this task.
Bstorm updated the task description. (Show Details)

I think we are done with this one!