Upgrade Toolforge Kubernetes
Open, High, Public

Description

This task is for tracking the process of upgrading Toolforge Kubernetes to a modern setup. This is expected to be a bit involved, and should probably have subtasks gathered under it.

Details

Related Gerrit Patches:
[operations/puppet@production] kubectl: upgrade /usr/bin/kubectl to 1.15.5
[operations/software/tools-webservice@master] new k8s: Fix ingress object and enable toolsbeta ingress creation
[operations/puppet@production] toolforge: new k8s: rename node to worker
[operations/puppet@production] toolforge: new k8s: rename hiera keys for consistency

Event Timeline

Bstorm triaged this task as High priority. Jan 23 2019, 7:46 PM
Bstorm created this task.
Bstorm created this object with visibility "Custom Policy".
Bstorm changed the visibility from "Custom Policy" to "Public (No Login Required)". Jan 23 2019, 8:04 PM
Bstorm added a project: Goal.

For context, other tickets that could be pertinent include T153943 and T111885. I will find more.

Ok, some notes:

  • We appear to have been left with a way to add SANs to puppet certificates (T119814), but it is not currently implemented on the cluster.
  • It is interesting to note that the puppet docs state that only the puppetmaster should have alternate names. This appears in the section explaining how to sign a client's cert that has alternate names.
  • It is possible to use any existing client certs in Kubernetes as long as you specify which CA cert to validate them against (and that cert can be found, of course). Puppet is a CA, so this is one way we could handle client certs even if we do other things for other parts, and we've certainly relied on that sort of thing before.
  • Using puppet as our PKI does occasionally cause issues (T169287), and an API server can do its own PKI, so this topic could still use a little thought. Most stuff at the foundation seems to favor using puppet certs because that was done historically, but the puppet certs in production are apparently part of an authenticating proxy setup and are a bit more cared-for than those in a VPS.
  • kubeadm installs self-signed certs by default, but it also supports an external CA mode: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#external-ca-mode
  • We should have separate etcd clusters for k8s and calico, similar to what we have now for k8s and flannel, but not as old and strange.
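
The external CA mode linked in the notes can be checked for before running kubeadm. This is a minimal sketch, assuming the behaviour described in the kubeadm documentation (the CA certificate is provided but its private key is not) and assuming the default certificate directory /etc/kubernetes/pki. The function name `ca_mode` and its return labels are hypothetical, and the sketch only looks at the two CA files, not at the other certificates kubeadm expects to find.

```python
from pathlib import Path


def ca_mode(pki_dir: str = "/etc/kubernetes/pki") -> str:
    """Classify how kubeadm would likely treat the CA files in pki_dir.

    Assumption: per the kubeadm docs, supplying ca.crt without ca.key
    (with the other certs already in place) activates external CA mode.
    The returned labels are illustrative, not kubeadm terminology.
    """
    pki = Path(pki_dir)
    has_cert = (pki / "ca.crt").is_file()
    has_key = (pki / "ca.key").is_file()
    if has_cert and has_key:
        return "provided-ca"   # kubeadm can sign certs with the supplied CA
    if has_cert:
        return "external-ca"   # the CA key is kept somewhere else
    if has_key:
        return "invalid"       # a key without its certificate is unusable
    return "self-signed"       # kubeadm would generate its own CA
```

Calling `ca_mode()` with no argument inspects the default directory; passing a path makes it easy to try against a staging copy of the PKI files.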

Moving this to epics and portioning out the experimentation and work more.

If we use kubeadm, it will pull the following images during installation for k8s 1.13:

k8s.gcr.io/kube-apiserver:v1.13.0
k8s.gcr.io/kube-controller-manager:v1.13.0
k8s.gcr.io/kube-scheduler:v1.13.0
k8s.gcr.io/kube-proxy:v1.13.0
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.2.24
k8s.gcr.io/coredns:1.2.6

For calico, it's:

image: calico/cni:v3.5.0
image: calico/node:v3.5.0
image: calico/kube-controllers:v3.5.0
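
To work with the two image lists above programmatically (for example, to check which versions are pinned), each reference can be split into repository and tag. This is a minimal sketch: the helper name `parse_image` is hypothetical, and the handling of the `image:` prefix assumes the calico lines were copied from a manifest.

```python
from typing import NamedTuple


class Image(NamedTuple):
    repository: str
    tag: str


def parse_image(line: str) -> Image:
    """Split 'repo/name:tag' (optionally prefixed by 'image:') into parts."""
    ref = line.strip()
    if ref.startswith("image:"):
        ref = ref[len("image:"):].strip()
    # Split on the last colon so that a registry port is not taken for a tag.
    repository, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        # No tag given; container tooling conventionally defaults to "latest".
        return Image(ref, "latest")
    return Image(repository, tag)


# Two entries taken from the lists above.
images = [
    parse_image("k8s.gcr.io/kube-apiserver:v1.13.0"),
    parse_image("image: calico/node:v3.5.0"),
]
for image in images:
    print(image.repository, image.tag)
```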

GTirloni removed a subscriber: GTirloni. Mar 21 2019, 9:06 PM
Xinbenlv added a subscriber: Xinbenlv.

Waiting for K8s to be upgraded to a modern version.

Hugely important work. Fingers crossed!

Because the version of Kubernetes in Toolforge was related to some lousy error messages during an outage, and this is now one of the actionables from that incident, adding the Incident tag.

Base added a subscriber: Base. Oct 19 2019, 5:43 PM

Change 547504 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: new k8s: rename hiera keys for consistency

https://gerrit.wikimedia.org/r/547504

Change 547504 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: new k8s: rename hiera keys for consistency

https://gerrit.wikimedia.org/r/547504

Change 547509 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: new k8s: rename node to worker

https://gerrit.wikimedia.org/r/547509

Change 547509 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: new k8s: rename node to worker

https://gerrit.wikimedia.org/r/547509

Change 549613 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] new k8s: Fix ingress object and enable toolsbeta ingress creation

https://gerrit.wikimedia.org/r/549613

Change 549661 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] kubectl: upgrade /usr/bin/kubectl to 1.15.5

https://gerrit.wikimedia.org/r/549661

Change 549613 merged by Bstorm:
[operations/software/tools-webservice@master] new k8s: Fix ingress object and enable toolsbeta ingress creation

https://gerrit.wikimedia.org/r/549613

Change 549661 merged by Bstorm:
[operations/puppet@production] kubectl: upgrade /usr/bin/kubectl to 1.15.5

https://gerrit.wikimedia.org/r/549661

aborrero removed a subscriber: chasemp. Tue, Nov 19, 10:20 AM