
Upgrade the nginx ingress controller in Toolforge (and likely PAWS)
Open, High, Public

Description

We are currently using controller version 0.25.0, which is quite outdated and has been improved on significantly since then. A careful and tested upgrade would be a good idea to stay up to date with security and bug-fix patches.

In addition to that, the helm chart is now maintained in the upstream repo. Using the helm chart would simplify some aspects of upgrade management (upgrades and rollbacks become explicit), remove a lot of boilerplate from our yaml, and possibly add a set of settings for local testing in the future. The only changes this would require are:

  • installing helm 3 from our repos to the control plane servers
  • swapping out the yaml files in puppet with yaml files that are strictly our override values (basically, a values.yaml with a more expressive name)
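As a sketch of what that override file could look like (key names follow the upstream ingress-nginx chart; the specific values here are placeholders, not our actual settings):

```yaml
# nginx-ingress-helm-values.yaml -- hypothetical sketch of our overrides;
# key names follow the upstream ingress-nginx chart, values are illustrative.
controller:
  replicaCount: 3
  image:
    repository: docker-registry.tools.wmflabs.org/nginx-ingress-controller
    tag: v0.46.0
  service:
    type: NodePort
    nodePorts:
      http: 30002
  config:
    # entries here end up in the nginx-configuration ConfigMap
    proxy-body-size: 128M
```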

For PAWS, switching to the helm chart is even easier if the ingress controller is made an optional dependency and its values are added to the existing values and secrets yaml files. I will definitely propose a patch for this latter case, because making it optional means it's easy to leave it out of a minikube deploy while still deploying it with our setup to PAWS prod.

Event Timeline

Change 631421 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: thirdparty/kubeadm-k8s-1-17: introduce helm3 package

https://gerrit.wikimedia.org/r/631421

Quick summary of my reading of the very long changelog between our version and the current one: T263284#6506145

Change 631421 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: thirdparty/kubeadm-k8s-1-17: introduce helm3 package

https://gerrit.wikimedia.org/r/631421

Change 631711 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: add gpg key for baltocdn external repository

https://gerrit.wikimedia.org/r/631711

Change 631711 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: add gpg key for baltocdn external repository

https://gerrit.wikimedia.org/r/631711

Change 631712 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: updates: fix baltocdn key id

https://gerrit.wikimedia.org/r/631712

Mentioned in SAL (#wikimedia-operations) [2020-10-02T09:11:49Z] <arturo> added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T264221)

Change 631712 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: updates: fix baltocdn key id

https://gerrit.wikimedia.org/r/631712

Andrew triaged this task as High priority. Jan 12 2021, 5:08 PM

Marked this as a blocker of the 1.18 upgrade per T263284#6506145. Another 10 versions have been released since then; changelog summary:

  • Bunch of nginx/alpine updates
  • Tons of bug fixes
  • 0.40 changes some default settings, use-gzip among others
  • 0.40: Only ValidatingWebhookConfiguration AdmissionReviewVersions v1 is supported.
  • What happened to versions between 0.35 and 0.40? Changelog is missing those

Version-support-wise we should be fine going straight to the latest version; 0.40 bumped the minimum supported kubernetes to 1.16.

The normal use case (ingresses created by webservice) should be fairly easy to test on toolsbeta, but I don't think it would be a bad idea to somehow get the list of "custom" ingresses and quickly check that they still appear to work.
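One possible way to gather that list (hypothetical: this assumes webservice-created ingresses carry a label such as `toolforge=tool`, which should be checked against what webservice actually sets):

```shell
# Hypothetical helper: list ingress objects that webservice did NOT create,
# assuming webservice-managed ingresses carry a "toolforge=tool" label.
# Adjust the selector to whatever label webservice really applies.
kubectl get ingress --all-namespaces -l 'toolforge!=tool' -o wide
```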

Looking at our customizations (assuming all had comments in the yaml) and the Helm chart:

  • nginx-configuration ConfigMap values: easily doable
  • custom repo/image: easily doable
  • pod security policy: easily doable
  • disable https on pod, custom ports for nodeport: easily doable
  • prometheus ports: easily doable
  • custom default backend: not sure
  • replicas: easily doable
  • change replicas to a template: not sure if relevant
  • affinity and tolerations for ingress nodes: easily doable

Few notes/questions:

  • what is docker.io/jettech/kube-webhook-certgen? looks like some webhook uses that image, not sure what for
  • how/where will the helm values be stored? will it require always using one server for deploys/rollbacks? or a git repo?
  • How can we convert to this from our current setup with minimal/zero downtime?
  • How can we ensure the security of third-party charts? Changes to those aren't reviewed in Gerrit like changes to the currently used yaml file
  • what is docker.io/jettech/kube-webhook-certgen? looks like some webhook uses that image, not sure what for

This would likely not be used. We don't terminate TLS at the ingress, and we aren't using cert-manager so far; we use acme-chief.

It appears to be a testing-oriented, self-signed alternative to cert-manager: https://github.com/jet/kube-webhook-certgen

  • how/where will the helm values be stored? will it require always using one server for deploys/rollbacks? or a git repo?

A values file distributed by puppet is all that is needed for that. Helm 3 uses your k8s credentials, so helm works anywhere kubectl works. PAWS is already deployed with helm, so we are mostly used to it. The real work there is the documentation.
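In other words, a deploy or rollback reduces to a couple of helm commands run from any host with cluster credentials (release name, namespace and file path below are illustrative, not final):

```shell
# Deploy or upgrade using the puppet-distributed values file
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --values /etc/kubernetes/nginx-ingress-helm.yaml

# Inspect past release revisions and roll back if something goes wrong
helm history ingress-nginx --namespace ingress-nginx
helm rollback ingress-nginx 1 --namespace ingress-nginx
```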

  • How can we convert to this from our current setup with minimal/zero downtime?

Once we are on helm: that's what it does best. Before then, the controller is already a Deployment, so it is capable of rolling changes, and everything is load balanced. We can be 100% certain there are no clashes by creating three new ingress nodes with a slightly different taint/label, deploying via helm, testing, and then redirecting the load balancers via puppet. That procedure can be tested in toolsbeta.
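A rough sketch of that clash-free procedure (node names, label and taint keys here are made up for illustration; the real ones would live in the values file and puppet):

```shell
# 1. Label and taint the three new ingress nodes so only the new controller
#    (which tolerates the taint and selects the label) lands on them.
for node in ingress-new-1 ingress-new-2 ingress-new-3; do
  kubectl label node "$node" example.wmcs/ingress-generation=gen2
  kubectl taint node "$node" ingressgen2=true:NoSchedule
done

# 2. Deploy the chart with a matching nodeSelector/toleration in the values file.
helm install ingress-nginx-gen2 ingress-nginx/ingress-nginx \
  --namespace ingress-nginx-gen2 --create-namespace \
  --values /etc/kubernetes/nginx-ingress-helm.yaml

# 3. Test against the new nodes' NodePort, then repoint haproxy via puppet.
```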

  • How can we ensure the security of third-party charts? Changes to those aren't reviewed in Gerrit like changes to the currently used yaml file

They aren't really reviewed in the k8s source code either. Upgrading the helm chart requires deliberate effort and won't happen automatically. The chart is maintained by the k8s project, so it's definitely within our sphere of trust in Toolforge; we just need to keep an eye on it.

PAWS already uses a third-party chart, deployed as a dependency of a chart we control in a git repo. We trust the JupyterHub project because we know many of the people on it. It can definitely gather problems as well, so we have to watch it. If we wanted to, we could fork the chart into the chartmuseum for production, but that would add an additional step without much additional review.

Ultimately, we already use the upstream images (cached for rate-limiting reasons when they are on Docker Hub; not for most of k8s, because that registry is not rate-limited), so this mostly requires deliberation and care before taking an upgrade action, similar to this task.

Change 685715 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Add ingress-nginx Helm files

https://gerrit.wikimedia.org/r/685715

We can be 100% certain there are no clashes by creating three new ingress nodes with a slightly different taint/label, deploying via helm, testing, and then redirecting the load balancers via puppet

That sounds like a good idea, we can take care of the ingress nodes rebuild in T267082 as well at the same time

Mentioned in SAL (#wikimedia-cloud) [2021-05-08T10:42:23Z] <Majavah> create new ingress node toolsbeta-k8s-ingress-3 T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-08T10:57:44Z] <Majavah> import docker image k8s.gcr.io/ingress-nginx/controller:v0.46.0 to local registry as docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0 T264221

I have the ingress controller v0.46.0 as deployed by Helm running mostly fine locally. There are still a few things to deal with (notably 404 handling and verifying that the admission controller works); I'll likely deploy it to the new toolsbeta node either tomorrow or early next week.

I'm still unable to get the admission webhook working with the v1 API, but the v1beta1 API still appears to work with the ingress controller v0.46.0. I also found a spot to put the default backend, so I think it's ready for toolsbeta.
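For reference, both of those are exposed as chart values; a hypothetical fragment (the backend image name is a placeholder, not necessarily what we deploy):

```yaml
controller:
  admissionWebhooks:
    enabled: true   # still on the v1beta1 API in this setup
defaultBackend:
  enabled: true
  image:
    repository: docker-registry.tools.wmflabs.org/fourohfour
    tag: latest
```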

I just deployed the new ingress controller to toolsbeta:

root@toolsbeta-test-k8s-control-4:~# kubectl create ns ingress-nginx-gen2
namespace/ingress-nginx-gen2 created
root@toolsbeta-test-k8s-control-4:~# helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
"ingress-nginx" has been added to your repositories
root@toolsbeta-test-k8s-control-4:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ingress-nginx" chart repository
Update Complete. ⎈Happy Helming!⎈
root@toolsbeta-test-k8s-control-4:~# helm install -n ingress-nginx-gen2 ingress-nginx-gen2 ingress-nginx/ingress-nginx --values /etc/kubernetes/nginx-ingress-helm.yaml
coalesce.go:200: warning: cannot overwrite table with non table for extraArgs (map[])
NAME: ingress-nginx-gen2
LAST DEPLOYED: Mon May 10 12:02:17 2021
NAMESPACE: ingress-nginx-gen2
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
[... instructions how to create an ingress object cut ...]

[... fix things and update the chart values file ...]

root@toolsbeta-test-k8s-control-4:~# helm upgrade --install -n ingress-nginx-gen2 ingress-nginx-gen2 ingress-nginx/ingress-nginx --values /etc/kubernetes/nginx-ingress-helm.yaml
Release "ingress-nginx-gen2" has been upgraded. Happy Helming!
[... instructions how to create an ingress object cut ...]
root@toolsbeta-test-k8s-control-4:~# kubectl get all -n ingress-nginx-gen2
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/ingress-nginx-gen2-controller-7b7767dbc4-wbntx   1/1     Running   0          34s
pod/ingress-nginx-gen2-controller-7b7767dbc4-xj5rg   1/1     Running   0          46s

NAME                                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/ingress-nginx-gen2-controller           NodePort    10.104.11.244    <none>        80:30002/TCP   2m33s
service/ingress-nginx-gen2-controller-metrics   ClusterIP   10.111.173.105   <none>        10254/TCP      2m33s

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ingress-nginx-gen2-controller   2/2     2            2           2m33s

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/ingress-nginx-gen2-controller-7b7767dbc4   2         2         2       46s
replicaset.apps/ingress-nginx-gen2-controller-7f5bb9bd67   0         0         0       2m33s

[taavi@toolsbeta-bastion-05 ~] $ curl -H "Host: test.toolforge.org" http://toolsbeta-test-k8s-ingress-3:30002
Hello World, from Toolsbeta!
[taavi@toolsbeta-bastion-05 ~] $ curl -H "Host: nonexistent.toolforge.org" http://toolsbeta-test-k8s-ingress-3:30002
[...]
      <a id="source" href="https://phabricator.wikimedia.org/source/tool-fourohfour/">view source</a>
[...]

It appears to work! The one problem I found while deploying is that the new controller needs to be deployed on a different port than the current one, given that NodePort values are unique across the cluster. The current puppet code assumes that all ingress nodes expose it on the same port and that haproxy listens on that one port.
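Since NodePorts are cluster-wide, a quick way to see which ports are already allocated before picking a new one (a generic kubectl query, nothing Toolforge-specific):

```shell
# Print namespace, service name and allocated NodePorts
# for every NodePort-type service in the cluster.
kubectl get svc --all-namespaces \
  -o jsonpath='{range .items[?(@.spec.type=="NodePort")]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.ports[*].nodePort}{"\n"}{end}'
```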

Updated the docs: https://wikitech.wikimedia.org/w/index.php?title=Portal:Toolforge/Admin/Kubernetes/Networking_and_ingress&curid=444438&diff=1911417&oldid=1886320

Related to the port issue, I see a couple of different possible solutions:

  • Undeploy the old ingress controller before deploying the new one. This is the simplest one but requires downtime.
  • Modify puppet to allow specifying the port for an individual backend server. This has the effect that the haproxy frontend port and the port on the kubernetes nodes differ.
  • Add separate haproxy frontend for the helm-based ingress on the new port and change dynamicproxy to use that
  • others?

@Bstorm / others, do you have opinions on this? I'd prefer to do this without downtime since it's possible but don't have opinions between the others. Reviews on the patch itself would also be appreciated.

It seems to me that adding something in the template to specify the port for the ingress is the simplest and safest solution. That should just be a profile-based template and adding a parameter with a sane default.

Change 688361 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Allow passing host port for k8s ingress

https://gerrit.wikimedia.org/r/688361

It seems to me that adding something in the template to specify the port for the ingress is the simplest and safest solution. That should just be a profile-based template and adding a parameter with a sane default.

This needs to be in the proxy nodes' hiera; compared to the old plain list of ingress nodes, I now have the following backwards-compatible data structure, which works:

profile::toolforge::k8s::ingress_nodes:
- toolsbeta-test-k8s-ingress-1.toolsbeta.eqiad1.wikimedia.cloud
- toolsbeta-test-k8s-ingress-2.toolsbeta.eqiad1.wikimedia.cloud
- host: toolsbeta-test-k8s-ingress-3.toolsbeta.eqiad1.wikimedia.cloud
  port: 30002
  # jobs_port: 30003 if needed

This is now live on toolsbeta and ready for testing. I'm not particularly happy leaving the ports where k8s and haproxy are listening different (30000 vs 30002), but I can live with it.

This is now live on toolsbeta and ready for testing. I'm not particularly happy leaving the ports where k8s and haproxy are listening different (30000 vs 30002), but I can live with it.

This is the sort of thing haproxy is great at, so that's all good to me. I was originally thinking of using a port overridden by the instance's host hiera, but you set up a more general solution that might actually be useful for future upgrades and changes. Seems good.

Verified that the ingress admission controller works (at least with the dual-stack setup) and that https://wikitech.wikimedia.org/wiki/Tool:Iw works on the new ingress controller too, as an example of a tool with non-default ingress objects.

I believe that as long as the pod security policies and RBAC rules in use are sensible, this is ready for the real Toolforge. I don't think this needs a formally announced deployment window with early warnings and completion confirmations (like we do for outages), but I'd send a heads-up email to cloud-announce@ (which I will need rights for on mailman), along the lines of "hey, we updated the ingress controller, we think nothing broke in the process, if you see anything weird please let us know".

Change 685715 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Add ingress-nginx Helm files

https://gerrit.wikimedia.org/r/685715

Change 688361 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Allow passing host port for k8s ingress

https://gerrit.wikimedia.org/r/688361

Mentioned in SAL (#wikimedia-cloud) [2021-05-19T10:44:13Z] <Majavah> create tools-k8s-ingress-[4-6] T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-19T11:09:12Z] <Majavah> deploy helm-based nginx ingress controller v0.46.0 to ingress-nginx-gen2 namespace T264221

The install on Toolforge is currently failing with:

root@tools-k8s-control-1:~# helm install -n ingress-nginx-gen2 ingress-nginx-gen2 ingress-nginx/ingress-nginx --values /etc/kubernetes/nginx-ingress-helm-values.yaml
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe): unknown field "port" in io.k8s.api.core.v1.Probe, ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe): unknown field "port" in io.k8s.api.core.v1.Probe]
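The error suggests the values file still used the chart's old flat probe keys: in the Kubernetes Probe type the port belongs under `httpGet`, not directly on the probe. The updated values would need roughly this shape (a sketch; path and port are illustrative):

```yaml
controller:
  livenessProbe:
    httpGet:
      path: /healthz
      port: 10254
  readinessProbe:
    httpGet:
      path: /healthz
      port: 10254
```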

Change 692875 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Update helm values after chart changes

https://gerrit.wikimedia.org/r/692875

Change 692875 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Update helm values after chart changes

https://gerrit.wikimedia.org/r/692875

Things are constantly failing with errors like:

2021/05/19 12:14:56 [error] 113#113: *2506 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:56 [error] 113#113: *2506 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for tatsumo.pythonanywhere.com:
failed to receive reply from UDP server 10.96.0.10:53: timeout
failed to receive reply from UDP server 10.96.0.10:53: timeout, context: ngx.timer
2021/05/19 12:14:56 [error] 113#113: *2530 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:56 [error] 113#113: *2555 lua udp socket read timed out, context: ngx.timer


2021/05/19 12:14:57 [error] 112#112: *2940 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:57 [error] 112#112: *2964 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:57 [alert] 112#112: lua failed to run timer with function defined at @/etc/nginx/lua/balancer.lua:141: 256 lua_max_running_timers are not enough
2021/05/19 12:14:57 [alert] 112#112: lua failed to run timer with function defined at @/etc/nginx/lua/monitor.lua:51: 256 lua_max_running_timers are not enough

I'm trying to find some graphs about kube-dns usage during that time but have not yet found any.

Do the VMs have the required security group? In tools there is an extra security group that you don't need to think about in toolsbeta; without it, UDP is blocked, specifically.

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T15:15:53Z] <Majavah> move tools-k8s-ingress-[5-6] from "tools-k8s-full-connectivity" to "tools-new-k8s-full-connectivity" security group T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T15:17:46Z] <Majavah> trying to install ingress-nginx via helm again after adjusting security groups T264221

Do the VMs have the required security group? In tools there is an extra security group that you don't need to think about in toolsbeta; without it, UDP is blocked, specifically.

I thought they did, apparently not... thanks.

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T16:31:45Z] <Majavah> pool tools-k8s-worker-4 as an ingress node T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T17:05:45Z] <Majavah> pool tools-k8s-ingress-5 as an ingress node, depool ingress-1 T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-21T06:52:53Z] <Majavah> pool tools-k8s-ingress-6 and depool ingress-[2,3] T264221

Mentioned in SAL (#wikimedia-cloud) [2021-06-04T21:30:13Z] <bstorm> deleting "tools-k8s-ingress-3", "tools-k8s-ingress-2", "tools-k8s-ingress-1" T264221

Mentioned in SAL (#wikimedia-cloud) [2021-06-07T17:37:39Z] <majavah> remove tools-k8s-ingress-[1-3] from kubernetes, follow-up to https://sal.toolforge.org/log/nd7v2HkB1jz_IcWuCX5M T264221

Mentioned in SAL (#wikimedia-cloud) [2021-06-07T17:42:38Z] <majavah> delete ingress-nginx namespace and related objects T264221

Change 698588 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Remove non-helm ingress-nginx files

https://gerrit.wikimedia.org/r/698588

Change 698588 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Remove non-helm ingress-nginx files

https://gerrit.wikimedia.org/r/698588