Toolforge interwiki link handling no longer strips URL-encoding before redirecting when it previously did, breaking existing on-wiki links
Open, Low · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Go to any Wikimedia wiki
  • Preview or publish the wikitext [[:toolforge:pageviews/?start=2025-10-07&end=2025-11-05&project=en.wikipedia.org&pages=Example]]

What happens?:

A link is created to https://pageviews.wmcloud.org/%3Fstart%3D2025-10-07%26end%3D2025-11-05%26project%3Den.wikipedia.org%26pages%3DExample, which returns a 404 error. This currently affects at least 71,000 pages on enwiki through the one template I encountered it with. Not all of those pages use that feature of the template, but all Redirects for Discussion subpages do, many times over. That likely brings the total number of broken links into the hundreds of thousands, even before considering other pages that use [[toolforge:]] links with URL parameters directly or through other templates.

What should have happened instead?:

The link should have gone to https://pageviews.wmcloud.org/?start=2025-10-07&end=2025-11-05&project=en.wikipedia.org&pages=Example, i.e. without escaping the question mark, ampersands, and equals signs. Until recently this was how such links behaved, as evidenced by en:Module:PageLinks having relied on this behavior for many years without issue.
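To illustrate why the encoded link is a 404, here is a minimal Python sketch. It assumes the link target is percent-encoded with everything except unreserved characters escaped, which happens to reproduce the URL reported above; it is not a claim about how MediaWiki or the Toolforge proxy implement this internally. Once `?`, `&`, and `=` are escaped, a URL parser sees the whole string as a path with no query string at all.

```python
from urllib.parse import quote, urlsplit

# Link target as written in the wikitext, after the "toolforge:pageviews/" prefix.
target = "?start=2025-10-07&end=2025-11-05&project=en.wikipedia.org&pages=Example"

# Assumption for illustration: escape everything except unreserved characters.
encoded = quote(target, safe="")

broken = urlsplit("https://pageviews.wmcloud.org/" + encoded)
expected = urlsplit("https://pageviews.wmcloud.org/" + target)

print("broken path:   ", broken.path)     # the escaped text ends up in the path
print("broken query:  ", repr(broken.query))
print("expected path: ", expected.path)
print("expected query:", expected.query)
```

In the broken form the web service receives a request for a path that does not exist, rather than a request for `/` with four query parameters.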

Other information (browser name/version, screenshots, etc.):

First raised on enwiki by Myceteae here.

Event Timeline

Restricted Application added a subscriber: Aklapper.
bd808 subscribed.

Interwiki links including query string data were never supposed to work: T345783: iw.toolforge.org does not support URL-encoded query parameters ([[toolforge:foo?bar]])

The accidental feature was broken by T408570: IW ingress config does not work with nginx-ingress-controller:v1.13.3 where the prior nginx Ingress redirect processing was replaced with a new haproxy implementation.

bd808 renamed this task from Wikilinks to Toolforge now escape punctuation when they previously did not, breaking hundreds of thousands of links to Toolforge interwikli link handling no longer strips URL-encoding before redirecting when it previously did, breaking existing on-wiki links. Thu, Nov 6, 9:23 PM

The lack of query string support in MediaWiki's internal link syntax is documented twice on https://www.mediawiki.org/wiki/Help:Links.

External links to internal pages
To add a link to a page on the same wiki using URL query parameters, you may need to use external link syntax.

Unlike external links, internal links do not support the use of URL query parameters.

bd808 added subscribers: LucasWerkmeister, Ahecht.

Moving some discussion from T247432: Preserve the ability to make interwiki links to Toolforge tools under the host based routing scheme

With my randomincategory tool (https://randomincategory.toolforge.org/), interwiki links such as [[toolforge:randomincategory/Pending_AfC_submissions&server=en.wikipedia.org&namespace=2!118&type=page|Random submission]] had been working for several years, but seem to have stopped working because the &s and =s are now getting URL-encoded. Is that due to the recent patch by @bd808?

See https://en.wikipedia.org/wiki/Template_talk:AfC_status#UH_OH!!!

Yes, it would quite likely be caused by the switch from resolving in the nginx Ingress to the haproxy front proxy. The behavior you are reporting as a regression was actually accidental and likely a side effect of some earlier change in the front proxy or ingress stack. T345783: iw.toolforge.org does not support URL-encoded query parameters ([[toolforge:foo?bar]]) is a feature request that @LucasWerkmeister filed a couple years ago when he found that URL-encoding issue in testing interwiki links with query string data. MediaWiki's internal/interwiki link format is not intended to work with query strings as the URL encoding shows. My past assertion was that having special support for undoing the URL-encoding in the [[toolforge:]] interwiki handler would be an undiscoverable feature for wiki editors.

The interwiki links reported in that on-wiki discussion are even more confusing actually as they are missing a ? to indicate that a query string is being used at all. I believe y'all that the syntax worked previously to provide a query string to your webservice, but I am confused at the moment how that actually happened.

@bd808 That was potentially a bad example, as the PHP script does some custom parsing of the URL so that it works with or without a ?. It is also a case where having the script URL-decode the URL before parsing the parameters wouldn't work, because that would break a case such as https://randomincategory.toolforge.org/?category=Texas_A%26M_University

[[toolforge:afdstats/afdstats.py?name=Ahecht]] might be a more typical example.

The problem of URL-decoding at the wrong time or place is fundamentally why I am reluctant to try to recreate whatever accidental magic was making internal links with query string payloads work with the prior ingress. I can't think of a safe way to decide when decoding should be applied. MediaWiki never intended interwiki links with a query string to work. Hacking support for it into Toolforge specifically would be fragile and difficult to discover. I feel bad that it accidentally worked in the past and that people unknowingly took advantage of that accidental feature. I do not, however, see how it is reasonably fixable on the Toolforge side.
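The decoding-order hazard can be shown with the Texas_A%26M_University example from above. This is a small Python sketch using only the standard library; it does not reflect how the randomincategory tool or the proxy actually parse requests. Splitting the query string first and decoding each value afterwards keeps the category name intact, while decoding the whole string first turns the escaped `&` into a parameter separator.

```python
from urllib.parse import parse_qs, unquote

query = "category=Texas_A%26M_University"

# Correct order: split on "&" and "=" first, then decode each value.
correct = parse_qs(query)

# Wrong order: decode the whole string first, then split.
# keep_blank_values=True keeps the stray "M_University" key visible.
wrong = parse_qs(unquote(query), keep_blank_values=True)

print(correct)  # one parameter, value contains a literal "&"
print(wrong)    # category is truncated and a bogus second key appears
```

A proxy that blindly decodes before redirecting cannot tell an intentionally escaped `&` inside a value from one that was escaped only because of the link syntax.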

If this has to be a breaking change, I think that's okay, since it's not like a huge number of people were relying on this behavior to write out links by hand; it mostly occurred in templates. For now we've fixed PageLinks on enwiki by just using an external link, and we can do the same with others. I'm wondering if this is a case where the old links could be cleaned up by a maintenance script, though, given the long-term reliance on this behavior? It would be easy enough to get a bot approved to fix these on enwiki, but this affects all wikis, so I wonder if it'd be easier to handle server-side.

Tamzin renamed this task from Toolforge interwikli link handling no longer strips URL-encoding before redirecting when it previously did, breaking existing on-wiki links to Toolforge interwiki link handling no longer strips URL-encoding before redirecting when it previously did, breaking existing on-wiki links. Fri, Nov 7, 7:28 AM

I'm wondering if this is a case where the old links could be cleaned up by a maintenance script, though, given the long-term reliance on this behavior? It would be easy enough to get a bot approved to fix these on enwiki, but this affects all wikis, so I wonder if it'd be easier to handle server-side.

This should be within the limits of technical possibility. It would need to be a newly created maintenance script just for this cleanup, I believe. In theory a pipeline like mwscript --wiki wikidb getText "page_title" | awk ... | mwscript --wiki wikidb edit "page_title" could do the inner loop changes that are needed, but in practice I don't think we can automate such things readily with mwscript-k8s on the wikikube Kubernetes cluster. It was once possible to run a completely ad hoc maintenance script across a subset of the Wikimedia wiki farm, but as far as I know, today the script needs to be added to mediawiki/core or a WMF-deployed extension and deployed before it can be run.

I am not sure if there would be more or less friction in writing, deploying, and running a custom maintenance script vs. writing, getting community approval to run, and running, say, a pywikibot script to do the same cleanup. The potential multi-wiki use case might make the maintenance script lower effort across the whole wiki farm. On the other hand, I'm not sure how many communities have the level of sophistication that enwiki has in attempting to keep bot edits contained to proven scripts by trusted operators. Do we need more information on the extent of the problem before we can work out what a reasonable next step is?
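Whichever route is taken, the inner-loop text change might look something like the following Python sketch. It is only a rough illustration under stated assumptions, not a vetted bot or maintenance script. The function name `rewrite_toolforge_links` and the regular expression are hypothetical. It assumes that `toolforge:<tool>/<rest>` corresponds to `https://<tool>.toolforge.org/<rest>`, which would not be right for tools served from another host (the pageviews link in the task description ends up at pageviews.wmcloud.org). It also ignores templates, nowiki sections, and labels containing `]`.

```python
import re

# Matches [[toolforge:tool/rest|label]] (optionally with a leading colon)
# only when the target contains "?" or "&", i.e. looks like a query string.
LINK_RE = re.compile(
    r"\[\[:?toolforge:(?P<tool>[^/|\]\s]+)/"
    r"(?P<rest>[^|\]]*[?&][^|\]]*)"
    r"(?:\|(?P<label>[^\]]*))?\]\]",
    re.IGNORECASE,
)


def rewrite_toolforge_links(wikitext: str) -> str:
    """Rewrite query-string interwiki links to external link syntax."""

    def to_external(match: re.Match) -> str:
        tool = match.group("tool")
        rest = match.group("rest")
        # Assumption: the tool is served at https://<tool>.toolforge.org/.
        # Spaces would end an external link URL, so escape them.
        url = f"https://{tool}.toolforge.org/{rest.replace(' ', '%20')}"
        # Without a label, fall back to the original link target as text.
        label = match.group("label") or f"toolforge:{tool}/{rest}"
        return f"[{url} {label}]"

    return LINK_RE.sub(to_external, wikitext)


sample = (
    "See [[toolforge:afdstats/afdstats.py?name=Ahecht]] and "
    "[[toolforge:randomincategory/Pending_AfC_submissions"
    "&server=en.wikipedia.org&namespace=2!118&type=page|Random submission]]; "
    "[[toolforge:foo/bar]] is left alone."
)
print(rewrite_toolforge_links(sample))
```

Links without `?` or `&` in the target are deliberately left untouched, since plain interwiki links to Toolforge are not affected by the encoding change.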

taavi triaged this task as High priority. Wed, Nov 12, 3:05 PM
bd808 lowered the priority of this task from High to Low. Thu, Nov 20, 6:48 PM

I am changing priority here from high -> low. The problem exists, but today I believe the fix is for people to fix links on-wiki. That does not feel like a readily actionable task for Toolforge admins or WMCS staff, but maybe I am missing a good angle on that?

It sounds like a number of cases have been fixed by changing template implementations, but that there will be a long tail of static links that could be cleaned up.