
Deploy upgraded Kubernetes to toolsbeta
Open, High, Public

Description

This is the epic for the first step of putting this up in beta before it goes live.

Related Objects

Event Timeline

Bstorm added a comment. Jul 4 2019, 2:32 PM

Lemme know what you think! I kind of struggled with the whole idea myself, and that's where I ended up. The logic could possibly be turned in the opposite direction and one could say we need a script for rebuilding a node (that copies the certs for us) instead, figuring *that* won't happen often either?

I think we can use this kubeadm option:

sudo kubeadm init phase upload-certs --upload-certs

which just re-uploads the certs. So the workflow I'm proposing is (sketched as commands below):

  • bootstrap the first control plane node (master-1)
  • bootstrap the other control plane nodes (master-2, master-3)
  • days or weeks pass
  • if we later want to rebuild any of the 3, use the command above to re-generate the certs into the secret store
  • bootstrap the new master node the same way as before, because certs are already in the secret store.
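
A minimal sketch of that workflow as commands, with placeholder token/hash/key values (the config path is the one puppet installs, as used later in this task):

# on master-1: bootstrap the first control plane node
sudo kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs

# on master-2 and master-3: join as control plane nodes
sudo kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> --control-plane --certificate-key <key>

# days or weeks later, before rebuilding a master: re-upload the certs into the secret store
sudo kubeadm init phase upload-certs --upload-certs

# then bootstrap the rebuilt master with the same join command as above
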
aborrero added a comment. Edited Jul 4 2019, 4:33 PM

I think we can use this kubeadm option:

sudo kubeadm init phase upload-certs --upload-certs

I just tested this. It works!

Since the very basic deployment is already working in toolsbeta, I would suggest we split the remaining work into subtasks, like:

I think I will create those as subtasks.

I detected an issue in the etcd pods:

2019-07-04 17:16:06.697591 I | embed: rejected connection from "172.16.0.104:32810" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd-ca\")", ServerName "")
2019-07-04 17:16:06.719517 I | embed: rejected connection from "172.16.0.104:32812" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd-ca\")", ServerName "")

After a while, etcd pods (and the api-server ones) start to crash and eventually the cluster dies. I wonder if we are overwriting something with puppet.

Bstorm added a comment. Jul 4 2019, 9:12 PM

Is that happening in the one deployed with puppet-distributed certs or using CLI only? If using CLI only, it would be a bug in kubeadm, I think. I wasn't seeing that with the puppet-distributed ones (which is basically manual distribution)--at least not for as long as I watched it; it may have snuck up later.

Or...could it possibly be overwriting/generating some certs when it does sudo kubeadm init phase upload-certs --upload-certs? Like using the kubeapi CA cert still, but regenerating the etcd ones, which would cause that?

Bstorm added a comment. Jul 4 2019, 9:30 PM

My thinking is that maybe it all works except when we do the upload-certs later to try to rebuild? This would suggest we still need a manual cert copy for a rebuild, which isn't the end of the world. Just more docs and/or scripts (or even possibly adding the certs to labs/private later on for that, which would work fine as well--and the way I did it in puppet, it isn't used without a hiera trigger). I don't know for sure without some logs or testing, though, obviously. Just a guess.

I think I know what is going on.

In my tests I did something like this:

  1. bootstrap the cluster
  2. add new members using the --upload-certs trick (testing a later upload)
  3. for further investigation, I then deleted a master (kubeadm reset -f + kubectl delete node <whatever>)
  4. try to join a control plane node again using the --upload-certs trick.

I think the problem here is that in step 3, etcd is not properly cleaned up by kubeadm.
You can see log messages like the following:

2019-07-05 09:48:51.686278 W | rafthttp: health check for peer dc517b72d6f67e5c could not connect: dial tcp 172.16.0.175:2380: connect: connection refused (prober "ROUND_TRIPPER_SNAPSHOT")
2019-07-05 09:48:51.719150 W | rafthttp: health check for peer dc517b72d6f67e5c could not connect: dial tcp 172.16.0.175:2380: connect: connection refused (prober "ROUND_TRIPPER_RAFT_MESSAGE")

The etcd cluster is still expecting to contact the deleted node! This means the etcd cluster is unhealthy, and any subsequent join operation for the control plane fails. This seems to me like a bug in kubeadm.

So, I see here 2 possible bugs:

  1. the issue with etcd-ca and ServerName "", which I'm not yet sure what it means.
  2. the etcd member not being properly removed when resetting/deleting a control plane node, preventing any further join operation for masters (see the cleanup sketch after this list).
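
For reference, the stale member can probably be dropped by hand with etcdctl, assuming etcdctl is available somewhere that can reach the etcd peers and read the certs (the cert paths below are the kubeadm defaults for the stacked etcd, so treat them as assumptions); run on a surviving master:

ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list
# then remove the stale member by its id (dc517b72d6f67e5c in the logs above)
ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member remove dc517b72d6f67e5c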

I can't manage to do the join step for master nodes using a config file. I can only do it successfully by using the cmdline generated by the first kubeadm init run.

I've been trying with this config file (and some additional random modifications), which is a variant of the one installed by puppet:

apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443
    token: "m7uakr.ern5lmlpv7gnkacw"
    unsafeSkipCAVerification: true
  timeout: 5m0s
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinControlPlane
localAPIEndpoint:
  bindPort: 6443
CertificateKey: "test"

Summary of the current issues I see/have in the bootstrapping/lifecycle workflow:

  • we can only add a master if using the auto-generated cmdline for kubeadm join
  • we can not delete a master node, since that will leave etcd in an inconsistent state (even when explicitly running kubeadm reset -f phase remove-etcd-member)

I was able to re-join a master node to the cluster after a reset if:

  • you don't delete /var/lib/etcd after the reset
  • you run kubeadm join skipping a bunch of steps: kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:whatever --experimental-control-plane --certificate-key whatever --ignore-preflight-errors=DirAvailable--var-lib-etcd --skip-phases=check-etcd,control-plane-join/etcd
  • kubeadm will be unable to create the etcd pod, so you need to copy /etc/kubernetes/manifests/etcd.yaml from other master node, update the IP addresses and then:
root@toolsbeta-test-k8s-master-3:~# kubectl apply -f /etc/kubernetes/manifests/etcd.yaml 
pod/etcd created
Bstorm added a comment. Edited Jul 5 2019, 4:10 PM

Yeah, it doesn't seem possible to join a master via the config file (on your earlier comment). In at least one bug (https://github.com/kubernetes/kubeadm/issues/1485), the developers stated that this is "by design" that --control-plane is only available for the CLI. The only use in having the join config on a control plane node seems to be if we want to spin up later without the ca verification option.

Overall, it seems like rebuilding a master node will not be fun and requires that we document all of this very well. Thank you for figuring it out with the CLI. It seems almost like it's actually smoother the way I'd done it using puppet? Manually copying the certs either with puppet or CLI, the cluster seems to join up fast...but etcd will still have a problem if the node is replacing an old one, like you said.

Etcd will not forget a node unless you force it to. I don't think kubeadm has a direct interface into that (yet), so we'd have to use etcdctl https://docs.okd.io/latest/admin_guide/assembly_restore-etcd-quorum.html

Maybe we need to make sure we know how to communicate with the containerized etcd. It's probably no different as long as we have etcdctl installed somewhere. If installed via package, it might try to start etcd on the node? I don't know yet. Have to look into that :-/
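
One way to avoid installing the package at all might be to run etcdctl inside one of the etcd static pods, since the kubeadm etcd image bundles etcdctl (a sketch; whether the image also ships env/sh for setting ETCDCTL_API is worth checking, as are the cert paths):

kubectl -n kube-system exec etcd-toolsbeta-test-k8s-master-1 -- \
  env ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list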

Side comment: is there a way for developers (who are not WMF staff) to use them in beta?


We don't have an actual kubernetes cluster yet. We build and dump it several times a day. I would suggest you wait until we have something ready for the Toolforge service.


We don't have etcdctl anywhere. I wonder if we should try using the external etcd cluster approach before anything else.
Copying certificates once generated by kubeadm is not a very elegant solution I think. I would rather use puppet certs instead, but we already know that could be complex as well.

:-/

@Bstorm and I just had a meeting. We decided the following:

  • try using an external etcd server again. This time, using Debian Buster, which contains etcd 3.2.26+dfsg-3 (higher than the required 3.2.18) (https://tracker.debian.org/pkg/etcd)
  • we will try using puppet certs for the etcd server, and keep k8s using its own CA
  • we will eventually try kubeadm in debian buster as well. Mind iptables changes in Debian buster (kube-proxy, docker, etc)

Mentioned in SAL (#wikimedia-cloud) [2019-07-15T12:27:13Z] <arturo> create toolsbeta-test-k8s-etcd-1 VM T215531

The new etcd server is ready (apparently):

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 cluster-health
member 67a7255628c1f89f is healthy: got healthy result from https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379
cluster is healthy

Change 523220 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: k8s: kubeadm: now using external etcd servers

https://gerrit.wikimedia.org/r/523220

Just a comment on the certs issue:
With an external etcd cluster, the external cluster is in control of the server certs. If it requires the certs to be in our config, then this version of etcd does require authentication (or this version of k8s does), which is honestly the right thing to do. I was figuring we could continue to use it without auth and use the puppet certs like we used to.

The only way to ensure the auth works like this without distributing certs to the etcd servers (which I'd rather do via puppet than one more manually copied cert) is if we use puppet to generate the shared client cert. We'd just have to use the puppet CA and openssl to do that (storing the result in the local labs-private). Then we shouldn't need to copy any certs EXCEPT the etcd client cert. That one we'd have to copy to all k8s masters. If the current running k8s master authenticates in toolforge (and I highly doubt it does), it would be using its puppet host cert as a client cert. That won't work with multiple masters.
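
A rough sketch of what that could look like with openssl on the puppetmaster (the CA paths and the CN are assumptions, just to illustrate the shape of it):

# key + CSR for the shared etcd client identity
openssl genrsa -out etcd-client.key 2048
openssl req -new -key etcd-client.key -subj "/CN=toolsbeta-k8s-etcd-client" -out etcd-client.csr
# sign with the puppet CA, then stash the result in the local labs-private
openssl x509 -req -in etcd-client.csr \
  -CA /var/lib/puppet/ssl/ca/ca_crt.pem \
  -CAkey /var/lib/puppet/ssl/ca/ca_key.pem \
  -CAcreateserial -days 365 -out etcd-client.crt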

The only other way I see doing this is if we spin up a separate etcd cluster using kubeadm (which would be containerized) rather than using the packages. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/ <-- mind you, the method is not much better than creating a client cert using the puppet CA. That's a lot of steps and still requires copying certs.

I'm guessing based on your patch that it doesn't work without specifying the etcd client cert.

Wait, does it work without the client cert matching on every node (reading it more)? I was expecting it to add that client cert to a configuration map in kubernetes, and I was worried it wouldn't work if they were different. Maybe it doesn't matter because it just stores the location and all of them are valid individually :) That'd be awesome!

I went ahead to try and answer my own questions and noticed a problem. We have etcd cert authentication enabled, and that cert is the puppet cert of the etcd server we've spun up.
from /etc/default/etcd

ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.pem"
ETCD_PEER_CERT_FILE="/etc/etcd/ssl/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem"
ETCD_PEER_KEY_FILE="/etc/etcd/ssl/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.priv"
ETCD_PEER_CLIENT_CERT_AUTH=true

That won't work since the other peers will have a different cert, if we use puppet certs.
I also see this:
ETCD_CLIENT_CERT_AUTH=false, which means the cert mentioned in the kubeadm config doesn't do a thing.

Wait, no. That'll work for the peer cert file as long as that ca.pem is the puppet one. Never mind :-p

I'm curious about testing how it functions if it doesn't use the client cert from kubernetes, though. Will poke at it a bit.
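
For reference, if we do later decide to turn on client cert verification on the etcd side, the client-facing half in /etc/default/etcd would presumably mirror the peer settings above (a sketch; exact filenames to confirm):

ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.pem"
ETCD_CERT_FILE="/etc/etcd/ssl/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem"
ETCD_KEY_FILE="/etc/etcd/ssl/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.priv"
ETCD_CLIENT_CERT_AUTH=true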

Finally figured out how to query this version of etcd: ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 get / --prefix --keys-only

Anyway, since etcd (in this state) isn't even checking the client cert, what I was curious about really doesn't matter, and I think the whole setup will scale fine. Maybe we should test it, though? It'll break using etcdctl with explicit endpoints outside of localhost. That said, I'm going to maybe put up a patch that allows localhost through the firewall. *if* we do decide to validate client certs (which is best practice), then localhost is the only place where etcdctl will work (unless that needs a cert too...which it might with that in place?)

Changed my mind on that last bit because you can specify certs with etcdctl :) No need to skip ssl whether we enable the verification or not.
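
For example, once verification is on, something like this should still work from another host (a sketch; the client cert/key paths are placeholders):

ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 \
  --cacert /etc/etcd/ssl/ca.pem \
  --cert /path/to/etcd-client.pem \
  --key /path/to/etcd-client.priv \
  get / --prefix --keys-only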

Change 523220 merged by Bstorm:
[operations/puppet@production] toolforge: k8s: kubeadm: now using external etcd servers

https://gerrit.wikimedia.org/r/523220

Going to try to bootstrap the other two cluster nodes using the --upload-certs thing since I haven't tried it myself yet :)

It does *not* work as merged.

root@toolsbeta-test-k8s-master-2:~# kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443  --token m7uakr.ern5lmlpv7gnkacw --control-plane --discovery-token-ca-cert-hash sha256:<hash> --certificate-key <key-from-upload-certs-command>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
error execution phase control-plane-prepare/download-certs: error downloading certs: the Secret does not include the required certificate or key - name: external-etcd.key, path: /etc/kubernetes/pki/puppet_toolsbeta-test-k8s-master-1.toolsbeta.eqiad.wmflabs.priv

The nodes would all need the same puppet cert. Since the etcd cluster isn't checking client certs anyway, we should attempt to remove the cert portions of the config. Lemme try that.

I added a command to quickly get the ca-cert-hash, btw, in the wiki page of notes.

Without it I see this problem with calico:

# kubectl logs calico-node-qk4kf --namespace=kube-system 
2019-07-15 21:49:12.134 [INFO][9] startup.go 256: Early log level set to info
2019-07-15 21:49:12.134 [INFO][9] startup.go 272: Using NODENAME environment for node name
2019-07-15 21:49:12.134 [INFO][9] startup.go 284: Determined node name: toolsbeta-test-k8s-master-1
2019-07-15 21:49:12.136 [INFO][9] k8s.go 228: Using Calico IPAM
2019-07-15 21:49:12.136 [INFO][9] startup.go 316: Checking datastore connection
2019-07-15 21:49:12.146 [WARNING][9] startup.go 328: Connection to the datastore is unauthorized
2019-07-15 21:49:12.146 [WARNING][9] startup.go 1057: Terminating
Calico node failed to start

That may be due to old config maps. I might try re-initializing etcd.

Reset etcd with ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 del "" --from-key=true as normal user.

Victory! Now I'll try to join another control plane node.

# kubectl get nodes
NAME                          STATUS   ROLES    AGE     VERSION
toolsbeta-test-k8s-master-1   Ready    master   7m24s   v1.15.0
toolsbeta-test-k8s-master-2   Ready    master   3m46s   v1.15.0
toolsbeta-test-k8s-master-3   Ready    master   2m55s   v1.15.0

Pushing a patch. This works great.

Change 523328 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: kubeadm master nodes shouldn't use client certs for etcd

https://gerrit.wikimedia.org/r/523328

The alternative is, obviously, to use a client cert that is held in common by all the nodes (with each of their names on it) and turn on client cert checking. That cert can be made using the puppet cert generate command with the --dns_alt_names option including the names of all three master nodes. I tested that process in another project. It's kind of weird (puts the resulting files in /var/lib/puppet/ssl/server/ where it keeps the original master certs made during bootstrap, but it didn't seem to break anything where I tested it). I can't say I like it, but it might be good. I mean, with access to the puppetmaster, one can also just use the openssl CLI to make and sign a cert for this that will be trusted by etcd.
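
For the record, that would look roughly like this on the puppetmaster (hypothetical cert name; the alt names would be the three master FQDNs):

puppet cert generate toolsbeta-k8s-etcd-client \
  --dns_alt_names=toolsbeta-test-k8s-master-1.toolsbeta.eqiad.wmflabs,toolsbeta-test-k8s-master-2.toolsbeta.eqiad.wmflabs,toolsbeta-test-k8s-master-3.toolsbeta.eqiad.wmflabs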

NOTE: puppet is disabled on master-1 where I was livehacking--for when you try things in your morning. Feel free to re-enable and mess with things, of course. I didn't un-hack anything on the puppetmaster itself

Change 523328 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: kubeadm master nodes shouldn't use client certs for etcd

https://gerrit.wikimedia.org/r/523328

update, after merging the last patch, trying to check the lifecycle of the control plane:

root@toolsbeta-test-k8s-master-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
[...] ok
root@toolsbeta-test-k8s-master-1:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config
root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/calico.yaml 
configmap/calico-config created

root@toolsbeta-test-k8s-master-2:~#   kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token <token from kubeadm-init.yaml> --discovery-token-ca-cert-hash sha256:<sha> --experimental-control-plane --certificate-key <sha>
[..] ok                               ^^^ this cmdline was actually generated to stdout by the first kubeadm init call

root@toolsbeta-test-k8s-master-1:~# kubeadm init phase upload-certs --upload-certs
[..] ok

root@toolsbeta-test-k8s-master-2:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
a3e44c3029260ea6163596085d7361b062be732b644fcddf4e4294c96c4ac4fc

root@toolsbeta-test-k8s-master-3:~# kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token <token from kubeadm-init.yaml> --discovery-token-ca-cert-hash sha256:<sha from previous step> --experimental-control-plane --certificate-key <sha from upload-certs>
Flag --experimental-control-plane has been deprecated, use --control-plane instead
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
error execution phase control-plane-prepare/download-certs: error downloading certs: the Secret does not include the required certificate or key - name: external-etcd.crt, path: 

So it seems kubeadm is still somehow confused about etcd requiring client certs.

No, that seems more like etcd needs cleanup to me.

Change 523716 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: put the client certs back in for etcd

https://gerrit.wikimedia.org/r/523716

Change 523723 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge-etcd: enable client cert checking

https://gerrit.wikimedia.org/r/523723

Change 523726 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: Switch up using etcd client certs in k8s a little

https://gerrit.wikimedia.org/r/523726

Change 523726 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: Switch up using etcd client certs in k8s a little

https://gerrit.wikimedia.org/r/523726

Change 523746 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge-etcd: tell etcd to check client certs

https://gerrit.wikimedia.org/r/523746

Change 523723 abandoned by Bstorm:
toolforge-etcd: enable client cert checking

Reason:
Superseded by 523746

https://gerrit.wikimedia.org/r/523723

Change 523716 abandoned by Bstorm:
toolforge: put the client certs back in for etcd

Reason:
We found it works great to just use puppet certs with a couple changes

https://gerrit.wikimedia.org/r/523716

Change 523746 merged by Bstorm:
[operations/puppet@production] toolforge-etcd: tell etcd to check client certs

https://gerrit.wikimedia.org/r/523746

aborrero added a comment. Edited Jul 16 2019, 4:30 PM

We just tested the lifecycle again, and it seems to work:

root@toolsbeta-test-k8s-master-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
[...]
root@toolsbeta-test-k8s-master-1:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config
root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/calico.yaml
[...]

For other control plane nodes:

root@toolsbeta-test-k8s-master-1:~# kubeadm --config /etc/kubernetes/kubeadm-init.yaml init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
0e323a45a4212c78994e30f8f3b9a6f77a1b475e696e12e7bf5f7cbd72ea5871
root@toolsbeta-test-k8s-master-1:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
3637ded9d0ac4e45952214e43b3107055d090ea0c13a176c4607f907662034f1

root@toolsbeta-test-k8s-master-2:~# kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output> --experimental-control-plane --certificate-key <upload_certs_output>
[...]

For worker nodes:

aborrero@toolsbeta-test-k8s-worker-1:~ $ sudo kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output>

Note that:

  • deleting a node requires kubectl delete node <nodename> (in the case of VM deletion); adding a node requires the steps outlined above (see the removal sketch after this list).
  • we use puppet certs for the etcd client connection
  • we enforce client certs on etcd server side
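
A sketch of the removal side with standard commands (drain flags may need adjusting for our daemonsets):

# from any master: move workloads off, then remove the node from the API
kubectl drain toolsbeta-test-k8s-worker-1 --ignore-daemonsets --delete-local-data
kubectl delete node toolsbeta-test-k8s-worker-1
# on the node itself, if it still exists: wipe the kubeadm state
kubeadm reset -f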

I went ahead and tried this:

root@toolsbeta-test-k8s-master-1:~# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.15.0
[upgrade/versions] kubeadm version: v1.15.0
[upgrade/versions] Latest stable version: v1.15.0
[upgrade/versions] Latest version in the v1.15 series: v1.15.0

Awesome, you're up-to-date! Enjoy!

So basically, we are still at the latest. The docs say kubeadm can be used to downgrade, but they provide no guidance and the tooling seems...not so good for that. If we want to test upgrading for whatever reason, which seems like a much more straightforward process than most of what we've done, we'd need to deploy a cluster with v1.14.4, then upgrade to v1.15.0. Kubeadm upgrade behaves differently in the 1.15 series, though (it refreshes all node certs as it upgrades), so that test would not necessarily predict how future upgrades will behave. I suspect we may be better off trying out upgrades in beta when a new release happens (1.15.1).

I say that partly because we have a lot of work to do to get this "toolforge ready" now that we've got a handle on a process for kubeadm itself.

Mentioned in SAL (#wikimedia-cloud) [2019-07-17T09:13:42Z] <arturo> create VM toolsbeta-test-k8s-master-4 (Debian Buster) T215531

Mentioned in SAL (#wikimedia-cloud) [2019-07-17T09:51:30Z] <arturo> re-create VM toolsbeta-test-k8s-worker-1 as Debian Buster T215531

Change 524281 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: include the kubeadm_docker_service

https://gerrit.wikimedia.org/r/524281

Change 524281 merged by Bstorm:
[operations/puppet@production] toolforge: include the kubeadm_docker_service

https://gerrit.wikimedia.org/r/524281

Ok, the cluster is now using PSP on init, and it works fine. I have no idea what caused our problem before, but a clean rebuild works great.

Since this works perfectly now (for whatever reason--I have theories that don't ultimately matter much), the final form of the build process looks like this:

We just tested the lifecycle again, and it seems to work:

root@toolsbeta-test-k8s-master-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
[...]
root@toolsbeta-test-k8s-master-1:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config

Right here, before calico, you need to run:

kubectl apply -f /etc/kubernetes/kubeadm-system-psp.yaml

That will bring the admin pods online and allow calico to spin up as well. No other pods will be permitted unless they are in kube-system until we add another manifest to handle the toolforge pods. That's the topic of T227290, though.

root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/calico.yaml
[...]

For other control plane nodes:
root@toolsbeta-test-k8s-master-1:~# kubeadm --config /etc/kubernetes/kubeadm-init.yaml init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
0e323a45a4212c78994e30f8f3b9a6f77a1b475e696e12e7bf5f7cbd72ea5871
root@toolsbeta-test-k8s-master-1:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
3637ded9d0ac4e45952214e43b3107055d090ea0c13a176c4607f907662034f1
root@toolsbeta-test-k8s-master-2:~# kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output> --experimental-control-plane --certificate-key <upload_certs_output>
[...]

For worker nodes:

aborrero@toolsbeta-test-k8s-worker-1:~ $ sudo kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output>

Note that:

  • deleting a node requires kubectl delete node <nodename> (in the case of VM deletion); adding a node requires the steps outlined above.
  • we use puppet certs for the etcd client connection
  • we enforce client certs on etcd server side

Huge progress :)

Change 524310 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: remove class redeclaration

https://gerrit.wikimedia.org/r/524310

Bstorm added a comment. Edited Thu, Jul 18, 7:35 PM

To explain this patch and the one where I changed the docker service class:
The docker service class being left out of master since it was easy to forget. I made it an include at the module level (to make the module functional and internally consistent) instead of declaring it in class context in the profile. Separating it out like that is how we manage roles to keep them flexible (which I get), but doing it at the module level makes modules require unusual quirks and insider knowledge just to make them work. Modules are developed elsewhere with a primary init.pp gateway that accepts all options, with most else configured by that interface. I'm fine not using the init pattern in modules, but I'd rather not make it more confusing as well by splitting it out too much.

I'm open to discussion, but I am changing the node profile so that it will work (what I did broke the node profile...but not the master one because it was forgotten there). That's just so it isn't left in a broken state because of how I changed it. I caught the missing material because of warnings during the init preflight phase about the docker config being missing. --So you don't think I'm just being picky or weird about it @aborrero :)

Change 524310 merged by Bstorm:
[operations/puppet@production] toolforge: remove class redeclaration

https://gerrit.wikimedia.org/r/524310

ok, works for me :-)

Change 525112 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: k8s: kubadm: calico requires ipset

https://gerrit.wikimedia.org/r/525112

Change 525112 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: k8s: kubadm: calico requires ipset

https://gerrit.wikimedia.org/r/525112

Change 525339 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: set kubeadm to use internal registry for pause container

https://gerrit.wikimedia.org/r/525339

Mentioned in SAL (#wikimedia-cloud) [2019-07-24T20:48:19Z] <bstorm_> rebuilt toolsbeta-test cluster with the internal version of the pause container T228887 T215531

Change 525339 merged by Bstorm:
[operations/puppet@production] toolforge: set kubeadm to use internal registry for pause container

https://gerrit.wikimedia.org/r/525339

Change 525434 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: add internal pause container to all the other kubelets

https://gerrit.wikimedia.org/r/525434

Change 525434 merged by Bstorm:
[operations/puppet@production] toolforge: add internal pause container to all the other kubelets

https://gerrit.wikimedia.org/r/525434

Change 525436 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: fix typo kubelet file content

https://gerrit.wikimedia.org/r/525436

In the end this works; however, only the init config (and presumably a join config file) accepts the new pause container gracefully. The other control plane nodes (which cannot use a config) require the setting to be appended to the end of the kubelet arguments. Luckily, later options override earlier ones, so as soon as the node reboots (or docker & kubelet restart), it works regardless of having two conflicting CLI args on the kubelet command. This works, though, and it is consistent. The only design difference we could make in the future might be to use a join config for non-control-plane nodes.
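
For context, the kubelet flag in question is presumably --pod-infra-container-image, so the appended bit would look something like this (the registry name, tag, and file path are placeholders, not the exact values we deploy):

# e.g. appended via KUBELET_EXTRA_ARGS in /etc/default/kubelet
KUBELET_EXTRA_ARGS="--pod-infra-container-image=docker-registry.tools.wmflabs.org/pause:3.1"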

Change 525436 merged by Bstorm:
[operations/puppet@production] toolforge: fix typo kubelet file content

https://gerrit.wikimedia.org/r/525436

Ok, great news, we can try a kubeadm upgrade now.

# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.15.0
[upgrade/versions] kubeadm version: v1.15.0
[upgrade/versions] Latest stable version: v1.15.1
[upgrade/versions] Latest version in the v1.15 series: v1.15.1

External components that should be upgraded manually before you upgrade the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT   AVAILABLE
Etcd        3.2.26    3.3.10

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     5 x v1.15.0   v1.15.1

Upgrade to the latest version in the v1.15 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.15.0   v1.15.1
Controller Manager   v1.15.0   v1.15.1
Scheduler            v1.15.0   v1.15.1
Kube Proxy           v1.15.0   v1.15.1
CoreDNS              1.3.1     1.3.1

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.15.1

Note: Before you can perform this upgrade, you have to update kubeadm to v1.15.1.

_____________________________________________________________________

We should not be required to upgrade etcd, but it will probably tell us about it any time we do this. Since this is a great testing opportunity, I'm running it.

Interestingly (but not surprisingly), it asks that first we upgrade kubeadm.

# kubeadm upgrade apply v1.15.1
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/version] You have chosen to change the cluster version to "v1.15.1"
[upgrade/versions] Cluster version: v1.15.0
[upgrade/versions] kubeadm version: v1.15.0
[upgrade/version] FATAL: the --version argument is invalid due to these errors:

        - Specified version to upgrade to "v1.15.1" is higher than the kubeadm version "v1.15.0". Upgrade kubeadm first using the tool you used to install kubeadm

Can be bypassed if you pass the --force flag

To test the upgrade, first we'll have to update that from upstream (though it might work with --force). As is, this will still install kubernetes 1.15.0 on kubeadm init because of our config, even if we update kubeadm.

@aborrero if you are bored with fighting with the ingress for a bit and want to test this, we just have to update our repo from upstream...however that is done :) I presume that isn't terribly hard? It's not a requirement for this whole thing, but it would be very good to know how "bad" upgrades will be.

Mentioned in SAL (#wikimedia-operations) [2019-07-25T11:03:19Z] <arturo> update stretch-wikimedia/thirdparty/kubeadm-k8s on install1002 for T215531 (kubeadm 1.15.1)

@Bstorm here you go:

aborrero@toolsbeta-test-k8s-master-1:~$ apt-cache policy kubeadm
kubeadm:
  Installed: 1.15.0-00
  Candidate: 1.15.1-00
  Version table:
     1.15.1-00 1001
       1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/thirdparty/kubeadm-k8s amd64 Packages
 *** 1.15.0-00 100
        100 /var/lib/dpkg/status

Just recording the process as I go here:

# apt install kubeadm
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libopts25 libpcsclite1 python3-debconf
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  cri-tools
The following packages will be upgraded:
  cri-tools kubeadm
2 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
Need to get 17.0 MB of archives.
After this operation, 2,250 kB disk space will be freed.
Do you want to continue? [Y/n] 
Get:1 http://apt.wikimedia.org/wikimedia stretch-wikimedia/thirdparty/kubeadm-k8s amd64 cri-tools amd64 1.13.0-00 [8,776 kB]
Get:2 http://apt.wikimedia.org/wikimedia stretch-wikimedia/thirdparty/kubeadm-k8s amd64 kubeadm amd64 1.15.1-00 [8,247 kB]
Fetched 17.0 MB in 1s (32.4 MB/s)
(Reading database ... 57148 files and directories currently installed.)
Preparing to unpack .../cri-tools_1.13.0-00_amd64.deb ...
Unpacking cri-tools (1.13.0-00) over (1.12.0-00) ...
Preparing to unpack .../kubeadm_1.15.1-00_amd64.deb ...
Unpacking kubeadm (1.15.1-00) over (1.15.0-00) ...
Setting up cri-tools (1.13.0-00) ...
Setting up kubeadm (1.15.1-00) ...
# kubeadm upgrade apply v1.15.1
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/version] You have chosen to change the cluster version to "v1.15.1"
[upgrade/versions] Cluster version: v1.15.0
[upgrade/versions] kubeadm version: v1.15.1
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]:

And I confirmed:

[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler]
[upgrade/prepull] Prepulling image for component kube-scheduler.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.15.1"...
Static pod: kube-apiserver-toolsbeta-test-k8s-master-1 hash: e7a689bf231e30af59efcb56690b440d
Static pod: kube-controller-manager-toolsbeta-test-k8s-master-1 hash: 389fff2e2e6c803f828653a4f18c838f
Static pod: kube-scheduler-toolsbeta-test-k8s-master-1 hash: 31d9ee8b7fb12e797dc981a8686f6b2b
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests422342376"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-07-25-14-47-08/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-toolsbeta-test-k8s-master-1 hash: e7a689bf231e30af59efcb56690b440d
Static pod: kube-apiserver-toolsbeta-test-k8s-master-1 hash: 81e3015017da0b319ec4e8fce4116aae
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-07-25-14-47-08/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-toolsbeta-test-k8s-master-1 hash: 389fff2e2e6c803f828653a4f18c838f
Static pod: kube-controller-manager-toolsbeta-test-k8s-master-1 hash: 645e7a8519364c082c136bba3c26849b
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-07-25-14-47-08/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-toolsbeta-test-k8s-master-1 hash: 31d9ee8b7fb12e797dc981a8686f6b2b
Static pod: kube-scheduler-toolsbeta-test-k8s-master-1 hash: ecae9d12d3610192347be3d1aa5aa552
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.15" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.15.1". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
root@toolsbeta-test-k8s-master-1:~#

After that, the kubelets are, of course, not yet upgraded:

# kubectl get nodes -o wide
NAME                          STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION   CONTAINER-RUNTIME
toolsbeta-test-k8s-master-1   Ready    master   18h   v1.15.0   172.16.2.223   <none>        Debian GNU/Linux 10 (buster)   4.19.0-5-amd64   docker://18.9.7
toolsbeta-test-k8s-master-2   Ready    master   18h   v1.15.0   172.16.2.225   <none>        Debian GNU/Linux 10 (buster)   4.19.0-5-amd64   docker://18.9.7
toolsbeta-test-k8s-master-3   Ready    master   17h   v1.15.0   172.16.2.233   <none>        Debian GNU/Linux 10 (buster)   4.19.0-5-amd64   docker://18.9.7
toolsbeta-test-k8s-worker-1   Ready    <none>   18h   v1.15.0   172.16.2.227   <none>        Debian GNU/Linux 10 (buster)   4.19.0-5-amd64   docker://18.9.7
toolsbeta-test-k8s-worker-2   Ready    <none>   18h   v1.15.0   172.16.2.231   <none>        Debian GNU/Linux 10 (buster)   4.19.0-5-amd64   docker://18.9.7

And the effect of it:

root@toolsbeta-test-k8s-master-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
root@toolsbeta-test-k8s-master-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
root@toolsbeta-test-k8s-master-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

So we can see that it only updated the control plane node that it ran on.

It is necessary and documented for HA clusters that you must go to the other nodes directly to run the following:

root@toolsbeta-test-k8s-master-2:~# kubeadm upgrade node 
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Upgrading your Static Pod-hosted control plane instance to version "v1.15.1"...
Static pod: kube-apiserver-toolsbeta-test-k8s-master-2 hash: 7c5b672d7da21ab872a88c8feec039ea
Static pod: kube-controller-manager-toolsbeta-test-k8s-master-2 hash: 389fff2e2e6c803f828653a4f18c838f
Static pod: kube-scheduler-toolsbeta-test-k8s-master-2 hash: 31d9ee8b7fb12e797dc981a8686f6b2b
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests191401975"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-07-25-15-07-48/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-toolsbeta-test-k8s-master-2 hash: 7c5b672d7da21ab872a88c8feec039ea
Static pod: kube-apiserver-toolsbeta-test-k8s-master-2 hash: 17c3be5ae16d141c9a5708dfc1a87b8e
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-07-25-15-07-48/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-toolsbeta-test-k8s-master-2 hash: 389fff2e2e6c803f828653a4f18c838f
Static pod: kube-controller-manager-toolsbeta-test-k8s-master-2 hash: 645e7a8519364c082c136bba3c26849b
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-07-25-15-07-48/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-toolsbeta-test-k8s-master-2 hash: 31d9ee8b7fb12e797dc981a8686f6b2b
Static pod: kube-scheduler-toolsbeta-test-k8s-master-2 hash: ecae9d12d3610192347be3d1aa5aa552
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upgrade] The control plane instance for this node was successfully updated!
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

Note that you no longer need to specify "control-plane" or "experimental-control-plane" because that is a phase of the command by default in version 1.15+. If there are control plane pods, it upgrades them.

Now upgrading the package side of things in general on the control plane nodes one at a time. This brings up an interesting point. We should pin or hold the packages at a particular version until we are ready to upgrade in the future, possibly keying off the value from our kubeadm config to set those things.

If the specific packages that are in our repo are manually controlled, perhaps there's no need to mess with it in puppet/apt, though 😁
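
If we do want a belt-and-braces hold on the nodes themselves, the simplest form is probably just (a sketch):

# freeze the k8s packages until we deliberately upgrade
apt-mark hold kubelet kubeadm kubectl
# and when we are ready:
apt-mark unhold kubelet kubeadm kubectl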

Change 525569 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: Update the version string to match our software

https://gerrit.wikimedia.org/r/525569

root@toolsbeta-test-k8s-master-1:~# kubectl get nodes
NAME                          STATUS   ROLES    AGE   VERSION
toolsbeta-test-k8s-master-1   Ready    master   19h   v1.15.1
toolsbeta-test-k8s-master-2   Ready    master   19h   v1.15.1
toolsbeta-test-k8s-master-3   Ready    master   18h   v1.15.1
toolsbeta-test-k8s-worker-1   Ready    <none>   18h   v1.15.0
toolsbeta-test-k8s-worker-2   Ready    <none>   18h   v1.15.0
root@toolsbeta-test-k8s-master-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
root@toolsbeta-test-k8s-master-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
root@toolsbeta-test-k8s-master-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

After that, it's just kubelet upgrades for the worker nodes. That should be done with drains to minimize disruption. Overall, that makes for a procedure we can document. Naturally, the process for upgrading between major versions is more involved, but the documented upgrades in the official docs are remarkably similar to this procedure, which is good to see.
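
A sketch of the per-worker sequence with standard commands (flags to taste):

# from a master: move workloads off the node
kubectl drain toolsbeta-test-k8s-worker-1 --ignore-daemonsets --delete-local-data
# on the worker: upgrade and restart the kubelet
apt-get install kubelet
systemctl daemon-reload
systemctl restart kubelet
# from a master: allow scheduling on it again
kubectl uncordon toolsbeta-test-k8s-worker-1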

Change 525569 merged by Bstorm:
[operations/puppet@production] toolforge: Update the version string to match our software

https://gerrit.wikimedia.org/r/525569

Funny thing: a lot of what is fixed in 1.15.1 is exactly the stuff that annoyed us about etcd and kubeadm for an HA stacked control plane: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#changelog-since-v1150

One notable thing about the upgrade process as well: it rotates the certificates so they don't expire. Renewing all the certs is an often cited issue. If we keep up with upgrades, we honestly will never have to worry about it.
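
Related: if I remember right, 1.15 also added a helper to see when the current certs actually expire, which would be a handy way to confirm the rotation; something like:

kubeadm alpha certs check-expiration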


This approach may not work well once we have N clusters (toolsbeta, tools, anything in codfw that we might add for additional testing) and want to practice an upgrade on clusterA without needing to freeze apt upgrades or capacity expansion in clustersB...N. As long as we are using an apt repo with support for multiple versions of the same package (I think aptly has this restriction?), then pinning or explicit versioning in the Puppet manifests should let us run version n+1 in a test cluster without breaking the use of version n in other clusters.

Bstorm added a comment. Edited Thu, Jul 25, 7:58 PM

This is true. We are using reprepro, not aptly for packages. I have no idea if we can support multiple package versions in that. The kubernetes API version will not upgrade until told to via kubeadm, but the kubelet must be upgraded by hand (which is what the pinning affects--and the updates are not done by puppet though a new node build would be affected by package changes). As is, we have the version as a configurable field that can be hiera'd for kubeadm init. After init, it makes no difference unless we then also use it to manage the package versions (and kubelet version isn't managed by kubeadm).

Overall, it boils down to the question: is it possible to have multiple versions in reprepro or not?


Yes! it is possible :-)

We have several ways of doing it, but the easiest, I would say, is to just create versioned repo components.

Currently we have:

  • stretch-wikimedia/thirdparty/kubeadm-k8s

We could move to:

  • stretch-wikimedia/thirdparty/kubeadm-k8s-1.15

Anyway I suggest we create another task to discuss the details.

Change 519375 abandoned by Arturo Borrero Gonzalez:
k8s: kubelet: stop requiring ::k8s::infrastructure_config

Reason:
Not following this approach anymore.

https://gerrit.wikimedia.org/r/519375