
PAWS: Upgrade Kubernetes
Open, Stalled, Normal, Public

Description

PAWS Kubernetes needs to be upgraded.

Event Timeline

GTirloni triaged this task as High priority. Dec 4 2018, 11:35 AM
GTirloni created this task.
GTirloni created this object with visibility "Custom Policy".
GTirloni moved this task from Inbox to Important on the cloud-services-team (Kanban) board.
GTirloni added a comment. Edited Mar 15 2019, 4:44 AM

I've created a new k8s cluster for PAWS. It's only partially Puppetized and still incomplete at this point, but I wanted to share some details:

Inventory:

paws-puppetmaster-01 - Puppetmaster - m1.small - OK
paws-packages-01 - Package repository - m1.small - OK

paws-master-01 - Control Plane - m1.medium - OK
paws-master-02 - Control Plane - m1.medium - OK
paws-master-03 - Control Plane - m1.medium - OK

paws-int-lb-01 - Control Plane load balancer - m1.small - OK
paws-int-lb-02 - Control Plane load balancer - m1.small - OK

paws-ext-lb-01 - External load balancer - m1.small - TODO
paws-ext-lb-02 - External load balancer - m1.small - TODO

paws-worker-01 - Worker node - m1.large - OK
paws-worker-02 - Worker node - m1.large - OK
paws-worker-03 - Worker node - m1.large - OK
paws-worker-04 - Worker node - m1.large - OK

Versions:

* Kubernetes = 1.13.4
* Docker = 18.09.3 (not officially supported but works fine)

The Puppet configuration so far only takes care of installing the necessary packages, starting services, and getting everything ready for kubeadm. The automation is in /var/lib/git/operations/puppet.git on paws-puppetmaster-01 (WIP). It's still missing the nginx configuration and the orchestration that kubeadm does (not sure if we want to automate that).
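For illustration, the Puppet profile boils down to roughly the following on each node (a sketch only; the package names and version pin are assumptions based on the standard upstream kubeadm install steps, and they presume the Docker and Kubernetes apt repositories are already configured):

apt-get update
apt-get install -y docker-ce kubelet=1.13.4-00 kubeadm=1.13.4-00 kubectl=1.13.4-00
systemctl enable --now docker kubelet
swapoff -a   # kubeadm's preflight checks require swap to be disabled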

I used the following kubeadm-config:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "paws-apiserver.wmflabs.org"
controlPlaneEndpoint: "paws-apiserver.wmflabs.org:6443"
networking:
  podSubnet: "192.168.0.0/16"
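With that file in place, bootstrapping the first control-plane node is the standard kubeadm call (a sketch; the exact invocation wasn't captured here):

kubeadm init --config=kubeadm-config.yaml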

I've initialized the cluster in HA mode with the nginx stream module in front of the API servers, as documented in the upstream kubeadm HA guide (etcd running on the masters). You basically just copy the certificates from master-01 to master-02/03 and run kubeadm join with extra flags:

kubeadm join paws-apiserver.wmflabs.org:6443 --token $token --discovery-token-ca-cert-hash sha256:$hash --ignore-preflight-errors=FileAvailable--etc-kubernetes-pki-ca.crt  --experimental-control-plane

Then you join the workers without the extra flags, just like kubeadm init tells you.
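In other words, the worker join is the same command minus the control-plane flags ($token and $hash are the placeholders printed by kubeadm init):

kubeadm join paws-apiserver.wmflabs.org:6443 --token $token --discovery-token-ca-cert-hash sha256:$hash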

Unfortunately, I couldn't find a way to create an A record under paws.eqiad.wmflabs for the API load balancer, so I created one under wmflabs.org, which is not ideal. (EDIT: Andrew tells me we have a script called makedomain that can help. I'll try creating the paws.wmflabs.org subdomain with the proper A records under it later and then drop the paws-apiserver.wmflabs.org record.)

For external access during tests, I created paws2.wmflabs.org pointing to the external LBs. Unconfigured for now.

The network plugin is Calico.

# kubectl get nodes
NAME             STATUS   ROLES    AGE   VERSION
paws-master-01   Ready    master   21m   v1.13.4
paws-master-02   Ready    master   17m   v1.13.4
paws-master-03   Ready    master   16m   v1.13.4
paws-worker-01   Ready    <none>   10m   v1.13.4
paws-worker-02   Ready    <none>   10m   v1.13.4
paws-worker-03   Ready    <none>   10m   v1.13.4
paws-worker-04   Ready    <none>   10m   v1.13.4

The idea is that the architecture works like this:

[ users ] -> [ external LBs ] -> [ k8s workers ] -> [ internal API LB ] -> [ API server ]

Turning the various internal LBs on and off worked perfectly: kubectl kept working and access to the API servers wasn't interrupted. This needs more testing to confirm that an API server that is down actually drops out of the rotation.
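A quick way to repeat that check while toggling LBs (a sketch, not the exact test that was run):

# Keep polling the API through the load-balanced endpoint while stopping
# nginx on one internal LB at a time.
while true; do kubectl get nodes >/dev/null && echo ok || echo FAIL; sleep 1; done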

Sample nginx configuration for the internal LBs (external LBs will be similar but pointing to the workers):

stream {
  upstream paws_apiserver {
    server paws-master-01.paws.eqiad.wmflabs:6443 max_fails=3 fail_timeout=10s;
    server paws-master-02.paws.eqiad.wmflabs:6443 max_fails=3 fail_timeout=10s;
    server paws-master-03.paws.eqiad.wmflabs:6443 max_fails=3 fail_timeout=10s;
  }

  server {
    listen 6443;
    proxy_pass paws_apiserver;
  }
}
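For completeness, the external LB config should end up looking something like this (a sketch only; the listen port and the assumption that the workers expose ingress traffic on port 80 depend on how the ingress controller is ultimately deployed):

stream {
  upstream paws_ingress {
    server paws-worker-01.paws.eqiad.wmflabs:80 max_fails=3 fail_timeout=10s;
    server paws-worker-02.paws.eqiad.wmflabs:80 max_fails=3 fail_timeout=10s;
    server paws-worker-03.paws.eqiad.wmflabs:80 max_fails=3 fail_timeout=10s;
    server paws-worker-04.paws.eqiad.wmflabs:80 max_fails=3 fail_timeout=10s;
  }

  server {
    listen 80;
    proxy_pass paws_ingress;
  }
}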

Some tests:

# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
nginx-7cdbd8cdc9-2jmnn   1/1     Running   0          11m   192.168.3.5   paws-worker-01   <none>           <none>
nginx-7cdbd8cdc9-4f4z9   1/1     Running   0          11m   192.168.6.5   paws-worker-04   <none>           <none>
nginx-7cdbd8cdc9-5t8nf   1/1     Running   0          11m   192.168.5.4   paws-worker-03   <none>           <none>
nginx-7cdbd8cdc9-6lccs   1/1     Running   0          11m   192.168.3.2   paws-worker-01   <none>           <none>
nginx-7cdbd8cdc9-6rxt8   1/1     Running   0          11m   192.168.5.2   paws-worker-03   <none>           <none>
nginx-7cdbd8cdc9-9qznn   1/1     Running   0          11m   192.168.6.3   paws-worker-04   <none>           <none>
nginx-7cdbd8cdc9-bcb5b   1/1     Running   0          11m   192.168.4.2   paws-worker-02   <none>           <none>
nginx-7cdbd8cdc9-c4lz7   1/1     Running   0          11m   192.168.6.4   paws-worker-04   <none>           <none>
nginx-7cdbd8cdc9-g42kl   1/1     Running   0          11m   192.168.5.5   paws-worker-03   <none>           <none>
nginx-7cdbd8cdc9-m4fj5   1/1     Running   0          11m   192.168.4.4   paws-worker-02   <none>           <none>
nginx-7cdbd8cdc9-pmtqs   1/1     Running   0          11m   192.168.3.4   paws-worker-01   <none>           <none>
nginx-7cdbd8cdc9-px4j8   1/1     Running   0          11m   192.168.4.5   paws-worker-02   <none>           <none>
nginx-7cdbd8cdc9-qjm9n   1/1     Running   0          11m   192.168.5.3   paws-worker-03   <none>           <none>
nginx-7cdbd8cdc9-ssvqb   1/1     Running   0          11m   192.168.6.2   paws-worker-04   <none>           <none>
nginx-7cdbd8cdc9-svwfk   1/1     Running   0          11m   192.168.4.3   paws-worker-02   <none>           <none>
nginx-7cdbd8cdc9-vf729   1/1     Running   0          11m   192.168.3.3   paws-worker-01   <none>           <none>

With the nginx image cached, starting 32 replicas on this cluster happens in <5 seconds.
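Roughly how that kind of test can be reproduced (the exact commands weren't recorded here):

kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=32
kubectl get pod -o wide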

What's missing:

  • Puppetize LB configuration and figure out kubeadm orchestration, where to store tokens, etc
  • Deploy nginx ingress controller
  • Mount NFS
  • Try to deploy PAWS and fix all the things this new setup will break
  • Who knows...
> Kubernetes = 1.14.3

I am guessing this is 1.13.4 as 1.14 has not been released yet.

> The network plugin is Calico.

That is interesting, could you share some more information? I am guessing all nodes are in a BGP full-mesh setup to redistribute the Pod IPs to each other. Is that correct? Is there a dedicated upstream node for access to out-of-cluster resources (e.g. the internet)? Or is every node equivalent?

Thanks for this btw, it's great!

>> Kubernetes = 1.14.3

> I am guessing this is 1.13.4 as 1.14 has not been released yet.

Yep, typo :)

>> The network plugin is Calico.

> That is interesting, could you share some more information? I am guessing all nodes are in a BGP full-mesh setup to redistribute the Pod IPs to each other. Is that correct? Is there a dedicated upstream node for access to out-of-cluster resources (e.g. the internet)? Or is every node equivalent?

Yes, I think that's how Calico works internally. This is the IP routing table on one of the workers:

192.168.0.0/24 via 172.16.2.146 dev tunl0 proto bird onlink # paws-master-01
192.168.1.0/24 via 172.16.2.147 dev tunl0 proto bird onlink # paws-master-02 
192.168.2.0/24 via 172.16.2.148 dev tunl0 proto bird onlink  # paws-master-03
192.168.3.0/24 via 172.16.2.155 dev tunl0 proto bird onlink  # paws-worker-01
192.168.4.0/24 via 172.16.2.156 dev tunl0 proto bird onlink  # paws-worker-03 
192.168.6.0/24 via 172.16.2.16 dev tunl0 proto bird onlink   # paws-worker-04
blackhole 192.168.5.0/24 proto bird  # paws-worker-02
192.168.5.14 dev cali464afe9b4c2 scope link # various pods...
192.168.5.15 dev cali50dc2ce3676 scope link 
192.168.5.16 dev cali794f898b386 scope link 
192.168.5.17 dev calic35fb21e320 scope link 
192.168.5.18 dev cali162b6faa968 scope link 
192.168.5.19 dev calidaea851d58c scope link 
192.168.5.20 dev cali96fa02c5fdc scope link 
192.168.5.21 dev califae182b344d scope link

In the standard configuration, there is no dedicated gateway for this cluster so all nodes go out to the Internet using the default Cloud VPS gateway.
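To double-check the mesh, the BGP peerings can be inspected directly on any node (assuming calicoctl is installed there):

sudo calicoctl node status
# The "IPv4 BGP status" table should list every other node as a
# "node-to-node mesh" peer in the Established state.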

GTirloni added a comment. Edited Mar 23 2019, 7:35 PM

Completed the following:

  • Deleted ext-lb-03 and int-lb-03 (seemed overkill to have that many)
  • Deployed the nginx ingress controller (without a Service, using hostNetwork: true in the pod spec; a minimal sketch of the relevant part follows this list) -- currently reachable at http://paws2.wmflabs.org
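The relevant part of such a Deployment looks roughly like this (a minimal sketch, not the exact manifest used; the image is a placeholder):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-ingress-controller
  template:
    metadata:
      labels:
        app: nginx-ingress-controller
    spec:
      hostNetwork: true                    # bind directly to ports 80/443 on the worker
      dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS working with hostNetwork
      containers:
      - name: nginx-ingress-controller
        image: <ingress-controller-image>  # placeholder, not the actual image/tag used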

While trying to deploy cert-manager, I faced this issue.

What remains to be done:

  • Remove paws.wmflabs.org from wmflabs.org zone and create a paws.wmflabs.org zone
  • Enable NFS (a rough PersistentVolume sketch follows this list)
  • Configure cert-manager
  • Try to deploy PAWS
  • Move the basic Puppet config somewhere and complete it (things like building the cluster itself, deploying these YAML manifests (or Helm charts), etc.)
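One way the NFS piece could look on the Kubernetes side (purely a sketch of one possible approach, not the actual PAWS manifests; server, export path and size are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: paws-nfs-home
spec:
  capacity:
    storage: 1Ti                 # placeholder size
  accessModes:
  - ReadWriteMany
  nfs:
    server: <nfs-server>         # placeholder
    path: <export-path>          # placeholder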
GTirloni removed a parent task: Restricted Task. Mar 25 2019, 2:11 PM
GTirloni edited projects, added PAWS; removed Kubernetes, Operations, Security.
GTirloni updated the task description. (Show Details)
GTirloni changed the visibility from "Custom Policy" to "Public (No Login Required)".

Mentioned in SAL (#wikimedia-cloud) [2019-03-25T14:12:39Z] <gtirloni> created paws.wmflabs.org subdomain under paws project (T211096)

Chicocvenancio moved this task from Backlog to MVP (Most Valuable PAWS) on the PAWS board.

Partial puppetization:

GTirloni removed GTirloni as the assignee of this task. Mar 25 2019, 6:54 PM
GTirloni removed a subscriber: GTirloni.
GTirloni added a comment. Edited Mar 25 2019, 7:21 PM

There are some concerns about moving PAWS into its own Cloud VPS project and what that would represent in terms of support/monitoring.

This shouldn't block an in-place upgrade using the existing PAWS servers that are inside the Toolforge project. However, the work on a new architecture is stalled until these discussions happen.

Re-opened T167086 for discussion.

GTirloni changed the task status from Open to Stalled. Mar 25 2019, 7:21 PM
GTirloni lowered the priority of this task from High to Normal.