
IW ingress config does not work with nginx-ingress-controller:v1.13.3
Closed, Resolved | Public | BUG REPORT

Description

[14:25]  <    bd808> random: has anything changed with the Toolforge Kubernetes ingress in the last 24 hours or so? The iw tool (redirector for "toolforge:" interwiki links) is not working all of a sudden. It's an ingress-only tool that configures the nginx ingress to redirect to the proper *.toolforge.org hostname.
[14:25]  <    bd808> https://iw.toolforge.org/bd808-test is my quick test that is failing. It should redirect to https://bd808-test.toolforge.org/
[14:26]  <    taavi> bd808: hmm, I upgraded ingress-nginx earlier today
[14:27]  <    bd808> taavi: that sounds related
[14:27]  <    taavi> agreed. is there a version of tool-iw in toolsbeta or is it deployed in tools only?
[14:28]  <    bd808> I don't think I ever put one in toolsbeta
[14:28]  <    taavi> hmh. the ingress-nginx logs in tools are too noisy to be of any use for these kinds of things
[14:28]  <    bd808> https://wikitech.wikimedia.org/wiki/Tool:Iw has the ingress config on it.
[14:30]  <    taavi> I managed to get a not-very-helpful validation error out of it: https://phabricator.wikimedia.org/P84313
[14:31]  <    bd808> ok, it's not liking `kubernetes.io/ingress.class: nginx` in the annotations
[14:31]  <    taavi> that too, but that doesn't seem to be the main issue?
[14:31]  <    taavi> > W1028 14:30:15.324067       7 validators.go:243] validation error on ingress tool-iw/iw-domain: annotation temporal-redirect contains invalid value https://$1.toolforge.org/$3$is_args$args

Upstream :

Event Timeline

bd808 triaged this task as High priority.

Taavi is rolling back the update so I can have a bit more time to figure out what needs changing.

Playing around in toolsbeta to try and figure out what is going on with the newer ingress version.

bd808@toolsbeta-bastion-7:~$ kubectl sudo logs -n ingress-nginx-gen2 ingress-nginx-gen2-controller-78d4687789-65dzs | grep iw
I1028 16:55:35.680894       7 store.go:443] "Found valid IngressClass" ingress="tool-iw/iw-domain" ingressclass="nginx"
W1028 16:55:35.681985       7 validators.go:243] validation error on ingress tool-iw/iw-domain: annotation temporal-redirect contains invalid value https://$1.beta.toolforge.org/$3$is_args$args
I1028 16:55:35.682739       7 event.go:377] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"tool-iw", Name:"iw-domain", UID:"72607a10-ea44-4772-addc-d452cf0eab95", APIVersion:"networking.k8s.io/v1", ResourceVersion:"918683913", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I1028 16:55:35.710421       7 store.go:443] "Found valid IngressClass" ingress="tool-iw/iw-root" ingressclass="nginx"
W1028 16:55:35.710980       7 validators.go:243] validation error on ingress tool-iw/iw-root: annotation temporal-redirect contains invalid value https://beta.toolforge.org/$is_args$args
I1028 16:55:35.711124       7 event.go:377] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"tool-iw", Name:"iw-root", UID:"2b2fd0e8-4cc0-4339-9919-5bed4a806a8b", APIVersion:"networking.k8s.io/v1", ResourceVersion:"918683917", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I1028 17:03:45.808397       7 store.go:492] "removing ingress because of unknown ingressclass" ingress="tool-iw/iw-domain"
I1028 17:03:45.875905       7 store.go:492] "removing ingress because of unknown ingressclass" ingress="tool-iw/iw-root"

It doesn't like either ingress's redirect replacement rules.

Is this the https://github.com/kubernetes/ingress-nginx/issues/10698 upstream bug? It sure looks like it. At some point the upstream annotation validation changed to only allow URL safe characters which in turn breaks usage of any extended nginx syntax... how could this not break things for like everyone using the ingress?

bd808 renamed this task from Config does not work with nginx-ingress-controller:v1.13.3 to IW ingress config does not work with nginx-ingress-controller:v1.13.3. Oct 28 2025, 6:22 PM

The upstream annotation validation code was first introduced in c5f348e. That commit is included in the working v1.11.5 deployment as well as the broken v1.13.3 deployment. There is an enable-annotation-validation feature flag that can be set to false to disable the annotation validation behavior. That flag defaulted to false until 7b4e4e2, which was first included in the v1.12.0 deployment.

So the current state of things is that the annotation validation, which is now enabled by default, disallows a number of valid nginx configuration values, and the upstream has not yet been convinced to treat this as a regression. We could work around the problem by disabling all of the (new to us) annotation validation code, but I have a hunch this would be a poor idea. We could work upstream to get this recognized as a bug and fixed in a newer version. I have added some information to the upstream bug report, but I expect it will take a while for a fix to be implemented and released even if folks decide to jump on the problem.

My next thought is to move the functionality somewhere else. That could look like a relatively small tool that parses the path and emits the expected redirect. It should also be possible to push things up a layer in the Toolforge stack and do the redirect handling in the haproxy layer in front of the ingress. @taavi and @dcaro: does one of these options (custom webapp vs haproxy config) sound better or worse to you?
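For illustration, the path-to-redirect mapping that either option would need could look roughly like the sketch below. It is not the code of the iw tool or of the eventual haproxy change. The function name `interwiki_target`, the tool-name pattern, and the fallback to the bare domain are assumptions inferred from the `https://$1.toolforge.org/$3$is_args$args` and `https://beta.toolforge.org/$is_args$args` redirect values in the logs above.

```python
import re
from urllib.parse import urlsplit

# Assumption: tool names are lowercase letters, digits, and hyphens.
# Anything after the first path segment is passed through unchanged.
TOOL_PATH = re.compile(r"^/(?P<tool>[a-z0-9][a-z0-9-]*)(?:/(?P<rest>.*))?$")


def interwiki_target(request_uri: str, base_domain: str = "toolforge.org") -> str:
    """Map a request URI such as /bd808-test/foo?x=1 to its redirect target."""
    parts = urlsplit(request_uri)
    # Mirror nginx's $is_args$args: emit "?" only when a query string exists.
    query = f"?{parts.query}" if parts.query else ""
    match = TOOL_PATH.match(parts.path)
    if match is None:
        # No recognizable tool name (for example "/"): send the client to
        # the bare domain, as the iw-root ingress rule appears to do.
        return f"https://{base_domain}/{query}"
    rest = match.group("rest") or ""
    return f"https://{match.group('tool')}.{base_domain}/{rest}{query}"


if __name__ == "__main__":
    print(interwiki_target("/bd808-test"))
    print(interwiki_target("/bd808-test/foo/bar?x=1&y=2"))
    print(interwiki_target("/", base_domain="beta.toolforge.org"))
```

A webapp would wrap this in a handler that answers with a temporary redirect and a `Location` header. The haproxy variant would express the same capture-and-rewrite logic in its own configuration language.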

Agreed that we should not disable the upstream validation entirely.

Out of those two options, I have a slight preference for moving it to HAProxy, just for the sake of having one separate thing less that we need to upgrade and monitor. But the other option is totally fine with me as well.

Both options look good to me. The difference I see is that a webapp can stay an independent tool managed by volunteers, whereas the HAProxy option would be managed only by Toolforge roots.
I think it's a useful feature, so I'm OK with it being managed by Toolforge roots. Up to you @bd808, happy either way :)

Mentioned in SAL (#wikimedia-cloud) [2025-10-31T19:53:41Z] <bd808> Disabled Puppet on toolsbeta-test-k8s-haproxy-7 for manual testing of haproxy changes (T408570)

Mentioned in SAL (#wikimedia-cloud) [2025-11-01T00:06:10Z] <bd808> Re-enabled Puppet on toolsbeta-test-k8s-haproxy-7 (T408570)

Change #1201847 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/puppet@production] toolforge: Handle interwiki redirects in front proxy

https://gerrit.wikimedia.org/r/1201847

Change #1201847 merged by Majavah:

[operations/puppet@production] toolforge: Handle interwiki redirects in front proxy

https://gerrit.wikimedia.org/r/1201847

@taavi should we make another task about all the other things that nginx-ingress-controller:v1.13.3 is going to break?