Page MenuHomePhabricator

🔵 Update nginx-ingress to ingress-nginx chart v4.2.5
Closed, ResolvedPublic

Description

It appears that the version of nginx-ingress we are using (1.34.3) is a) super old and b) is now being pulled from a deprecated chart repository c) is using the beta ingress API that we want to move away from so we can upgrade to kubernetes 1.22

image.png (241×810 px, 27 KB)

The new official location of this appears to be: https://github.com/kubernetes/ingress-nginx/tree/main/charts/ingress-nginx / https://artifacthub.io/packages/helm/ingress-nginx/ingress-nginx (as per https://github.com/helm/charts/tree/master/stable/nginx-ingress)

As per the documentation at https://github.com/kubernetes/ingress-nginx/blob/main/charts/ingress-nginx/CHANGELOG.md it seems that 4.2.5 is the newest chart (with controller version v1.3.1) that supports Kubernetes 1.21. So we'll need to update to this point first.

Event Timeline

Tarrow renamed this task from Update nginx-ingress to Update nginx-ingress to ingress-nginx chart v4.2.5.Nov 8 2022, 8:54 AM
Tarrow updated the task description. (Show Details)
Tarrow renamed this task from Update nginx-ingress to ingress-nginx chart v4.2.5 to 🔵 Update nginx-ingress to ingress-nginx chart v4.2.5.Nov 8 2022, 3:26 PM

Looks like naively swapping out one for another without making any changes at all doesn't work out of the box. General direction can be seen at: https://github.com/wmde/wbaas-deploy/pull/590

Rosalie and I added a few more commits to this branch. We also tried to deploy it to staging (while dibs-ing so no merged commit to the deploy repo). This appears to have a) not worked (i.e. it broke a number of uptime checks) but b) even after reverting these checks stayed failing. Tomorrow we'll need to go ahead and investigate that

This feel very much like the problem may be on our alerts. Staging seems healthy with all pods running and coffeebase.wikibase.dev is running just fine.
I restarted some deployments in hopes this may resolve alert but it didn't. For some reason the checks keep failing.

It appears that the issue we see with this on staging causing the uptime checks to fail is still there.

The main difference in behaviour between the old nginx-ingress release and this new ingress-ngins release seems to be that the "force ssl" 308 now removes the trailing / from the originally requested path. That is:

> GET /tools/cradle/ HTTP/1.1
> Host: coffeebase.wikibase.dev
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 308 Permanent Redirect
< Date: Fri, 18 Nov 2022 17:35:51 GMT
< Content-Type: text/html
< Content-Length: 164
< Connection: keep-alive
< Location: https://coffeebase.wikibase.dev/tools/cradle
<

Resolution to this appears to be to set:
controller.config.preserve-trailing-slash to true in the chart values.

This was rather hard for me to discover. It doesn't appear to be documented in https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/

I suspected setting this might do something that I wanted by kubectl exec-ing into the controller pod and then running nginx -T and searching through the config for "rewrite". I saw this:

location / {                                                                                                                                                                                                               
                                                                                                                                                                                                                                           
                        set $namespace      "default";                                                                                                                                                                                     
                        set $ingress_name   "platform-apps-ingress-wikibase-dev";                                                                                                                                                          
                        set $service_name   "platform-nginx";                                                                                                                                                                              
                        set $service_port   "8080";                                                                                                                                                                                        
                        set $location_path  "/";                                                                                                                                                                                           
                        set $global_rate_limit_exceeding n;                                                                                                                                                                                
                                                                                                                                                                                                                                           
                        rewrite_by_lua_block {                                                                                                                                                                                             
                                lua_ingress.rewrite({                                                                                                                                                                                      
                                        force_ssl_redirect = true,                                                                                                                                                                         
                                        ssl_redirect = false,                                                                                                                                                                              
                                        force_no_ssl_redirect = false,                                                                                                                                                                     
                                        preserve_trailing_slash = false,                                                                                                                                                                   
                                        use_port_in_redirects = false,                                                                                                                                                                     
                                        global_throttle = { namespace = "", limit = 0, window_size = 0, key = { }, ignored_cidrs = { } },                                                                                                  
                                })                                                                                                                                                                                                         
                                balancer.rewrite()                                                                                                                                                                                         
                                plugins.run()                                                                                                                                                                                              
                        }

and the name sounded intriguingly to do what I wanted

As stated in the PR I'm not super sure how to verify this but it seems to work on my local cluster so that is good :D

Deployed it to staging in order to be able to inspect the cloud logs and adjust our monitoring code

nginx-ingress is now uninstalled on staging and production. This PR cleans up the helmfile, but engineers should apply the changes to their local cluster first (or just keep in mind to apply helmfile with the target marked as installed: false): https://github.com/wmde/wbaas-deploy/pull/599

Evelien_WMDE claimed this task.