
Upgrade the nginx ingress controller in Toolforge (and likely PAWS)
Closed, Resolved · Public

Description

We are currently using controller version 0.25.0, which is quite outdated and has been improved on significantly since then. A careful and tested upgrade would be a good idea to stay up to date with security and bug-fix patches.

In addition to that, the helm chart is now maintained in the upstream repo. Using the helm chart would simplify some aspects of upgrade management (upgrades and rollbacks are made quite clear), remove a lot of boilerplate from our yaml, and possibly add a set of settings for local testing in the future. The only changes this would require are:

  • installing helm 3 from our repos to the control plane servers
  • swapping out the yaml files in puppet with yaml files that are strictly our override values (basically, a values.yaml with a more expressive name)
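For illustration, the chart-based workflow on a control plane node could look roughly like this (a sketch only; the release name, namespace, and values file path are placeholders, not decisions):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
# install or upgrade using only our override values
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
    --namespace ingress-nginx \
    --values /etc/kubernetes/nginx-ingress-values.yaml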

For PAWS, switching to the helm chart is even easier if the ingress controller is made an optional dependency of the PAWS chart and its values are added to the existing values and secrets yaml files. I will definitely propose a patch for this latter case, because it makes it very easy to leave the controller out of a minikube deploy while still including it when deploying our setup to PAWS prod, as sketched below.
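As a sketch of what "optional dependency" could mean in practice for the PAWS chart (names, version, and the enable flag are illustrative, not the actual patch):

# Chart.yaml of the PAWS deployment chart
dependencies:
  - name: ingress-nginx
    version: "~3.30.0"                   # pin to a reviewed chart version
    repository: https://kubernetes.github.io/ingress-nginx
    condition: ingress-nginx.enabled     # left disabled for minikube

# prod values / secrets file
ingress-nginx:
  enabled: true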

Related Objects

Subtasks: 6 open (unassigned); 15 resolved (assigned to aborrero, taavi, rook, and Bstorm).

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 631421 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: thirdparty/kubeadm-k8s-1-17: introduce helm3 package

https://gerrit.wikimedia.org/r/631421

Quick info on at least my reading of the very long changelog between current and our version: T263284#6506145

Change 631421 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: thirdparty/kubeadm-k8s-1-17: introduce helm3 package

https://gerrit.wikimedia.org/r/631421

Change 631711 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: add gpg key for baltocdn external repository

https://gerrit.wikimedia.org/r/631711

Change 631711 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: add gpg key for baltocdn external repository

https://gerrit.wikimedia.org/r/631711

Change 631712 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] aptrepo: updates: fix baltocdn key id

https://gerrit.wikimedia.org/r/631712

Mentioned in SAL (#wikimedia-operations) [2020-10-02T09:11:49Z] <arturo> added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T264221)

Change 631712 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] aptrepo: updates: fix baltocdn key id

https://gerrit.wikimedia.org/r/631712

Andrew triaged this task as High priority. (Jan 12 2021, 5:08 PM)

Marked this as a blocker of the 1.18 upgrade per T263284#6506145. Another 10 versions have been released since then; changelog summary:

  • Bunch of nginx/alpine updates
  • Tons of bug fixes
  • 0.40 changes some default settings, including use-gzip, and others
  • 0.40: Only ValidatingWebhookConfiguration AdmissionReviewVersions v1 is supported.
  • What happened to versions between 0.35 and 0.40? Changelog is missing those

Version support wise, we should be good to go straight to the latest version; 0.40 bumped the minimum supported Kubernetes to 1.16.

The normal use case (created by webservice) should be fairly easily testable on toolsbeta, but I don't think it would be a bad idea to somehow get the list of "custom" ingresses and quickly check that they appear to be working.
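A quick way to pull that list would be something along these lines (the managed-by label is an assumption; whatever label or naming convention webservice actually uses should be checked first):

# all ingress objects across the cluster
kubectl get ingress --all-namespaces
# only the ones not created by webservice, assuming a managed-by label exists
kubectl get ingress --all-namespaces -l 'app.kubernetes.io/managed-by!=webservice'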

Looking at our customizations (assuming all had comments in the yaml) and the Helm chart:

  • nginx-configuration ConfigMap values: easily doable
  • custom repo/image: easily doable
  • pod security policy: easily doable
  • disable https on pod, custom ports for nodeport: easily doable
  • prometheus ports: easily doable
  • custom default backend: not sure
  • replicas: easily doable
  • change replicas to a template: not sure if relevant
  • affinity and tolerations for ingress nodes: easily doable
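To make the mapping concrete, the override values for those items might look roughly like this (key names taken from the upstream ingress-nginx chart as I read it; exact names and the image/port values need checking against the chart version we actually deploy):

controller:
  replicaCount: 2
  image:
    repository: docker-registry.tools.wmflabs.org/nginx-ingress-controller
    tag: "v0.xx.x"
  config: {}            # entries currently in the nginx-configuration ConfigMap
  podSecurityPolicy:
    enabled: true
  metrics:
    enabled: true       # prometheus port
  service:
    type: NodePort
    enableHttps: false
    nodePorts:
      http: 30000
  nodeSelector: {}      # plus tolerations/affinity for the dedicated ingress nodes
  tolerations: []
defaultBackend:
  enabled: false        # the custom 404 handler is the "not sure" item above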

Few notes/questions:

  • what is docker.io/jettech/kube-webhook-certgen? looks like some webhook uses that image, not sure what for
  • how/where will the helm values be stored? will it require always using one server for deploys/rollbacks? or a git repo?
  • How can we convert to this from our current setup with minimal/zero downtime?
  • How can we ensure the security of third-party charts? Changes to those aren't reviewed in Gerrit like changes to the currently used yaml file

In T264221#7011343, @Majavah wrote:
  • what is docker.io/jettech/kube-webhook-certgen? looks like some webhook uses that image, not sure what for

This would likely not be used. We don't terminate TLS at the ingress because we aren't using certmanager so far; we are using acmechief.

It appears to be a testing, self-signed version of certmanager: https://github.com/jet/kube-webhook-certgen

  • how/where will the helm values be stored? will it require always using one server for deploys/rollbacks? or a git repo?

A values file distributed by puppet is all that is needed for that. Helm 3 uses your k8s credentials, so anywhere kubectl works, helm works. PAWS is already deployed with helm, so we are mostly used to it. The real work there is the documentation.

  • How can we convert to this from our current setup with minimal/zero downtime?

Once we are on helm: that's what it does best. Before then, it's all a deployment already, so it is capable of rolling changes. Everything is also load balanced. We can be 100% certain there are no clashes by creating three new ingress nodes with a slightly different taint/label, deploying via helm, testing, and then redirecting the load balancers via puppet. That procedure can be tested in toolsbeta.

  • How can we ensure the security of third-party charts? Changes to those aren't reviewed in Gerrit like changes to the currently used yaml file

They aren't really reviewed by us any more than the k8s source code is, either. Upgrading the helm chart requires effort and won't happen automatically. The chart is maintained by the k8s project, so it's definitely in our sphere of trust in Toolforge; we just need to keep an eye on it. PAWS is a third-party chart deployed as a dependency of a chart we control in a git repo. We trust the Jupyterhub project because we know lots of the people on it. It can definitely accumulate problems as well, so we have to watch it. If we wanted to, we could fork it into the chartmuseum for production, but that would add an additional step without much additional review. Ultimately, we already use the upstream images (cached for rate-limiting reasons if they are on docker hub...not for most of k8s because that's not rate-limited), so it mostly just requires deliberation and care before taking an upgrade action, similar to this task.
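One mitigation that is cheap to do is pinning and pre-reading the chart instead of tracking "latest" (sketch; the version number and release/namespace names are only examples):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
# download and unpack a specific chart version for local review before deploying it
helm pull ingress-nginx/ingress-nginx --version 3.30.0 --untar
# deploy exactly that reviewed version
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
    --version 3.30.0 --namespace ingress-nginx \
    --values /etc/kubernetes/nginx-ingress-values.yaml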

Change 685715 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Add ingress-nginx Helm files

https://gerrit.wikimedia.org/r/685715

We can be 100% certain there are no clashes by creating three new ingress nodes with a slightly different taint/label, deploying via helm, testing, and then redirecting the load balancers via puppet

That sounds like a good idea; we can take care of the ingress node rebuild from T267082 at the same time.

Mentioned in SAL (#wikimedia-cloud) [2021-05-08T10:42:23Z] <Majavah> create new ingress node toolsbeta-k8s-ingress-3 T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-08T10:57:44Z] <Majavah> import docker image k8s.gcr.io/ingress-nginx/controller:v0.46.0 to local registry as docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0 T264221
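For reference, the mirroring step is roughly the standard pull/retag/push sequence (a sketch; assumes docker access to both registries from wherever this is run):

docker pull k8s.gcr.io/ingress-nginx/controller:v0.46.0
docker tag k8s.gcr.io/ingress-nginx/controller:v0.46.0 docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0
docker push docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0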

I have the ingress controller v0.46.0 as deployed by Helm running mostly fine locally. There are still a few things to deal with (notably 404 handling and verifying that the admission controller works); I'll likely deploy it to the new toolsbeta node either tomorrow or early next week.

I'm still unable to get the admission webhook working with the v1 API, but the v1beta1 API still appears to work with ingress controller v0.46.0. I managed to find a spot for the default backend too, so I think it's ready for toolsbeta.
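A quick way to see which admissionregistration API versions the cluster actually serves, and what the webhook ended up registered with (plain kubectl, nothing Toolforge-specific):

kubectl api-versions | grep admissionregistration
kubectl get validatingwebhookconfigurations -o yaml | grep admissionReviewVersions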

I just deployed the new ingress controller to toolsbeta:

root@toolsbeta-test-k8s-control-4:~# kubectl create ns ingress-nginx-gen2
namespace/ingress-nginx-gen2 created
root@toolsbeta-test-k8s-control-4:~# helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
"ingress-nginx" has been added to your repositories
root@toolsbeta-test-k8s-control-4:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ingress-nginx" chart repository
Update Complete. ⎈Happy Helming!⎈
root@toolsbeta-test-k8s-control-4:~# helm install -n ingress-nginx-gen2 ingress-nginx-gen2 ingress-nginx/ingress-nginx --values /etc/kubernetes/nginx-ingress-helm.yaml
coalesce.go:200: warning: cannot overwrite table with non table for extraArgs (map[])
NAME: ingress-nginx-gen2
LAST DEPLOYED: Mon May 10 12:02:17 2021
NAMESPACE: ingress-nginx-gen2
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
[... instructions how to create an ingress object cut ...]

root@toolsbeta-test-k8s-control-4:~# fix things and update the chart values file

root@toolsbeta-test-k8s-control-4:~# helm upgrade --install -n ingress-nginx-gen2 ingress-nginx-gen2 ingress-nginx/ingress-nginx --values /etc/kubernetes/nginx-ingress-helm.yaml
Release "ingress-nginx-gen2" has been upgraded. Happy Helming!
[... instructions how to create an ingress object cut ...]
root@toolsbeta-test-k8s-control-4:~# kubectl get all -n ingress-nginx-gen2
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/ingress-nginx-gen2-controller-7b7767dbc4-wbntx   1/1     Running   0          34s
pod/ingress-nginx-gen2-controller-7b7767dbc4-xj5rg   1/1     Running   0          46s

NAME                                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/ingress-nginx-gen2-controller           NodePort    10.104.11.244    <none>        80:30002/TCP   2m33s
service/ingress-nginx-gen2-controller-metrics   ClusterIP   10.111.173.105   <none>        10254/TCP      2m33s

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ingress-nginx-gen2-controller   2/2     2            2           2m33s

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/ingress-nginx-gen2-controller-7b7767dbc4   2         2         2       46s
replicaset.apps/ingress-nginx-gen2-controller-7f5bb9bd67   0         0         0       2m33s

[taavi@toolsbeta-bastion-05 ~] $ curl -H "Host: test.toolforge.org" http://toolsbeta-test-k8s-ingress-3:30002
Hello World, from Toolsbeta!
[taavi@toolsbeta-bastion-05 ~] $ curl -H "Host: nonexistent.toolforge.org" http://toolsbeta-test-k8s-ingress-3:30002
[...]
      <a id="source" href="https://phabricator.wikimedia.org/source/tool-fourohfour/">view source</a>
[...]

It appears to work! The one problem I found while deploying was that it needs to be deployed on a different NodePort than the current controller, since NodePorts are unique across the cluster. The current puppet code assumes that all ingress nodes have it on the same port and that haproxy listens on that same port.

Updated the docs: https://wikitech.wikimedia.org/w/index.php?title=Portal:Toolforge/Admin/Kubernetes/Networking_and_ingress&curid=444438&diff=1911417&oldid=1886320

Related to the port issue, I see a couple of different possible solutions:

  • Undeploy the old ingress controller before deploying the new one. This is the simplest one but requires downtime.
  • Modify puppet to allow specifying the port for an individual backend server. This means the haproxy frontend port and the port on the kubernetes nodes will differ
  • Add separate haproxy frontend for the helm-based ingress on the new port and change dynamicproxy to use that
  • others?

@Bstorm / others, do you have opinions on this? I'd prefer to do this without downtime since it's possible but don't have opinions between the others. Reviews on the patch itself would also be appreciated.

It seems to me that adding something in the template to specify the port for the ingress is the simplest and safest solution. That should just be a profile-based template and adding a parameter with a sane default.

Change 688361 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Allow passing host port for k8s ingress

https://gerrit.wikimedia.org/r/688361

It seems to me that adding something in the template to specify the port for the ingress is the simplest and safest solution. That should just be a profile-based template and adding a parameter with a sane default.

This needs to be in the proxy nodes' hiera, but compared to the old plain list of ingress nodes I now have the following backwards-compatible data structure, which works:

profile::toolforge::k8s::ingress_nodes:
- toolsbeta-test-k8s-ingress-1.toolsbeta.eqiad1.wikimedia.cloud
- toolsbeta-test-k8s-ingress-2.toolsbeta.eqiad1.wikimedia.cloud
- host: toolsbeta-test-k8s-ingress-3.toolsbeta.eqiad1.wikimedia.cloud
  port: 30002
  # jobs_port: 30003 if needed
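For illustration, the haproxy backend rendered from that hiera could end up looking something like this (hypothetical output; plain string entries fall back to the existing default port 30000):

backend k8s-ingress
    server toolsbeta-test-k8s-ingress-1 toolsbeta-test-k8s-ingress-1.toolsbeta.eqiad1.wikimedia.cloud:30000 check
    server toolsbeta-test-k8s-ingress-2 toolsbeta-test-k8s-ingress-2.toolsbeta.eqiad1.wikimedia.cloud:30000 check
    server toolsbeta-test-k8s-ingress-3 toolsbeta-test-k8s-ingress-3.toolsbeta.eqiad1.wikimedia.cloud:30002 check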

This is now live on toolsbeta and ready for testing. I'm not particularly happy leaving the ports where k8s and haproxy are listening different (30000 vs 30002), but I can live with it.

In T264221#7075691, @Majavah wrote:

This is now live on toolsbeta and ready for testing. I'm not particularly happy leaving the ports where k8s and haproxy are listening different (30000 vs 30002), but I can live with it.

This is the sort of thing haproxy is great at, so that's all good to me. I was originally thinking of using a port overridden by the instance's host hiera, but you set up a more general solution that might actually be useful for future upgrades and changes. Seems good.

Verified that the ingress admission controller works (at least with the dual-stack setup) and that https://wikitech.wikimedia.org/wiki/Tool:Iw works on the new ingress controller too, as an example of a tool with non-default ingress objects.

I believe that as long as the pod security policies and RBAC rules used are sensible, this is ready for the real Toolforge. I don't think this needs a properly announced deployment window with early warnings and completion confirmations (like we do for outages), but I'd send a heads-up email to cloud-announce@ (which I will need mailman rights for) along the lines of "hey, we updated the ingress controller, we think nothing was broken in the process, if you see anything weird please let us know".

Change 685715 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Add ingress-nginx Helm files

https://gerrit.wikimedia.org/r/685715

Change 688361 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Allow passing host port for k8s ingress

https://gerrit.wikimedia.org/r/688361

Mentioned in SAL (#wikimedia-cloud) [2021-05-19T10:44:13Z] <Majavah> create tools-k8s-ingress-[4-6] T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-19T11:09:12Z] <Majavah> deploy helm-based nginx ingress controller v0.46.0 to ingress-nginx-gen2 namespace T264221

The install on Toolforge is currently failing with:

root@tools-k8s-control-1:~# helm install -n ingress-nginx-gen2 ingress-nginx-gen2 ingress-nginx/ingress-nginx --values /etc/kubernetes/nginx-ingress-helm-values.yaml
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe): unknown field "port" in io.k8s.api.core.v1.Probe, ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe): unknown field "port" in io.k8s.api.core.v1.Probe]
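For context on the error: in core/v1 a probe's port lives under the handler (httpGet or tcpSocket), not directly on the Probe object, so values that make the chart render a bare port: field are rejected by validation. A valid probe looks roughly like this (generic Kubernetes, not the chart's exact template):

livenessProbe:
  httpGet:
    path: /healthz
    port: 10254
  initialDelaySeconds: 10
  timeoutSeconds: 1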

Change 692875 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Update helm values after chart changes

https://gerrit.wikimedia.org/r/692875

Change 692875 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Update helm values after chart changes

https://gerrit.wikimedia.org/r/692875

Things are constantly failing with errors like:

2021/05/19 12:14:56 [error] 113#113: *2506 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:56 [error] 113#113: *2506 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for tatsumo.pythonanywhere.com:
failed to receive reply from UDP server 10.96.0.10:53: timeout
failed to receive reply from UDP server 10.96.0.10:53: timeout, context: ngx.timer
2021/05/19 12:14:56 [error] 113#113: *2530 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:56 [error] 113#113: *2555 lua udp socket read timed out, context: ngx.timer


2021/05/19 12:14:57 [error] 112#112: *2940 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:57 [error] 112#112: *2964 lua udp socket read timed out, context: ngx.timer
2021/05/19 12:14:57 [alert] 112#112: lua failed to run timer with function defined at @/etc/nginx/lua/balancer.lua:141: 256 lua_max_running_timers are not enough
2021/05/19 12:14:57 [alert] 112#112: lua failed to run timer with function defined at @/etc/nginx/lua/monitor.lua:51: 256 lua_max_running_timers are not enough

I'm trying to find some graphs about kube-dns usage during that time but have not yet found any.
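One way to tell a kube-dns problem apart from a network-path problem is to query the DNS service directly from a pod, ideally one scheduled onto the affected ingress nodes (generic debugging sketch; the image and pod name are arbitrary):

kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup tatsumo.pythonanywhere.com 10.96.0.10
# if this resolves from a regular worker but times out from a pod on the new
# ingress nodes, the problem is the network path (firewall / security groups), not kube-dns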

Do the VMs have the required security group? In tools there's an extra security group that you don't need to think about in toolsbeta; without it, UDP traffic is blocked, specifically.

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T15:15:53Z] <Majavah> move tools-k8s-ingress-[5-6] from "tools-k8s-full-connectivity" to "tools-new-k8s-full-connectivity" security group T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T15:17:46Z] <Majavah> trying to install ingress-nginx via helm again after adjusting security groups T264221

Do the VMs have the required security group? In tools there's an extra security group that you don't need to think about in toolsbeta; without it, UDP traffic is blocked, specifically.

I thought they did, apparently not... thanks.

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T16:31:45Z] <Majavah> pool tools-k8s-worker-4 as an ingress node T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-20T17:05:45Z] <Majavah> pool tools-k8s-ingress-5 as an ingress node, depool ingress-1 T264221

Mentioned in SAL (#wikimedia-cloud) [2021-05-21T06:52:53Z] <Majavah> pool tools-k8s-ingress-6 and depool ingress-[2,3] T264221

Mentioned in SAL (#wikimedia-cloud) [2021-06-04T21:30:13Z] <bstorm> deleting "tools-k8s-ingress-3", "tools-k8s-ingress-2", "tools-k8s-ingress-1" T264221

Mentioned in SAL (#wikimedia-cloud) [2021-06-07T17:37:39Z] <majavah> remove tools-k8s-ingress-[1-3] from kubernetes, follow-up to https://sal.toolforge.org/log/nd7v2HkB1jz_IcWuCX5M T264221

Mentioned in SAL (#wikimedia-cloud) [2021-06-07T17:42:38Z] <majavah> delete ingress-nginx namespace and related objects T264221

Change 698588 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: Remove non-helm ingress-nginx files

https://gerrit.wikimedia.org/r/698588

Change 698588 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: Remove non-helm ingress-nginx files

https://gerrit.wikimedia.org/r/698588

Change 702426 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] paws::haproxy: Allow specifying the port for ingress nodes

https://gerrit.wikimedia.org/r/702426

Change 702426 merged by Bstorm:

[operations/puppet@production] paws::haproxy: Allow specifying the port for ingress nodes

https://gerrit.wikimedia.org/r/702426

Mentioned in SAL (#wikimedia-cloud) [2021-06-30T17:16:26Z] <bstorm> temporarily increased quota to 60 cores to enable T264221

Mentioned in SAL (#wikimedia-cloud) [2021-06-30T17:26:10Z] <majavah> creating paws-k8s-ingress-[3-4] and joining them to the k8s cluster T264221

Mentioned in SAL (#wikimedia-cloud) [2021-07-01T12:04:41Z] <majavah> deploy ingress-nginx 0.46 via the helm chart to paws T264221

Mentioned in SAL (#wikimedia-cloud) [2021-07-12T13:05:35Z] <majavah> moving user traffic to updated ingress-nginx T264221

PAWS was updated too and I'm not seeing any breakage. I'll leave the old nodes running for a few days in case we need to roll back.
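If a rollback does turn out to be necessary, with helm it should be roughly a matter of rolling the release back to its previous revision and repointing haproxy at the old nodes (sketch; using the Toolforge release name from earlier in this task as the example, PAWS would use its own release):

helm history ingress-nginx-gen2 -n ingress-nginx-gen2
helm rollback ingress-nginx-gen2 <revision> -n ingress-nginx-gen2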

Change 704277 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge::prometheus: Update PAWS ingress target

https://gerrit.wikimedia.org/r/704277

Mentioned in SAL (#wikimedia-cloud) [2021-07-14T10:38:22Z] <majavah> correction: undeploy old ingress T264221

Change 704277 merged by Bstorm:

[operations/puppet@production] toolforge::prometheus: Update PAWS ingress target

https://gerrit.wikimedia.org/r/704277