Investigate query parameter normalization for MW/services
Closed, Resolved (Public)

Description

In parsing a few short log samples of all received URLs, I've noticed that we definitely do have inconsistencies in query parameter sorting. This in turn leads to cache duplication, which wastes space, wastes cache hit opportunities, and makes any related purging harder (in the case that they're even purgeable or purged).

Most of the examples I notice have to do with the relative ordering of the parameters action, ctype, feed, format, printable. Some are mobileview queries, too. We could endeavor to "fix" this on the client side (in client app code, and in the parameterized URLs we emit in our own content output). We could also normalize it on reception at the cache layer by sorting all parameters, e.g. with https://github.com/vimeo/libvmod-boltsort . The biggest question mark is whether all of our code is insensitive to query parameter order. It probably should be, but that's not necessarily true.

Relatedly, we could also normalize and/or fix up (again, in our output or clients) the unnecessary use of defaulted parameters. The most obvious example is debug=false for load.php, which is commonly present and probably also the default, making it redundant with the same query arriving without the debug flag.
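A minimal sketch of what dropping defaulted parameters could look like, in Python for illustration only. The DEFAULTS table, the /w/load.php path and the function name are hypothetical, and whether debug=false really is the default would need to be confirmed before doing anything like this.

```python
# Hypothetical table of (path, parameter) -> default value. These entries are
# assumptions for illustration, not a verified list.
DEFAULTS = {
    ("/w/load.php", "debug"): "false",
}


def strip_default_params(url: str) -> str:
    """Drop key=value pairs whose value equals the assumed default."""
    path, sep, query = url.partition("?")
    if not sep:
        return url
    kept = []
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if DEFAULTS.get((path, key)) == value:
            continue  # redundant: same as omitting the parameter
        kept.append(pair)
    return path + ("?" + "&".join(kept) if kept else "")


print(strip_default_params("/w/load.php?debug=false&lang=en&modules=startup"))
# prints: /w/load.php?lang=en&modules=startup
```

Non-default values such as debug=true are left untouched, so only the redundant variant collapses onto the shorter URL.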

Details

Subject	Repo	Branch	Lines +/-
Add "querysort" option	performance/WikimediaDebug	master	+10 -0
varnish::tests: add tests for query-sorting	operations/puppet	production	+53 -0
Support CDN query parameter re-ordering	mediawiki/core	wmf/1.39.0-wmf.23	+126 -7
C:varnish: fix varnish confd test data	operations/puppet	production	+13 -14
Support CDN query parameter re-ordering	mediawiki/core	master	+126 -7
varnish: enable query-sorting in production via X-Wikimedia-Debug	operations/puppet	production	+18 -10
Add operations/software/varnish/libvmod-querysort debian-glue CI job	integration/config	master	+3 -0
varnish: use libvmod-querysort on Beta Cluster	operations/puppet	production	+8 -3
Initial Debian packaging	operations/software/varnish/libvmod-querysort	main	+74 -0
UDF for testing uri_query for duplicate query parameters	analytics/refinery/source	master	+111 -0
varnish: sort query parameters on the Beta Cluster	operations/puppet	production	+5 -0

Related Objects

Status	Assigned	Task
Resolved	ori	T138093 Investigate query parameter normalization for MW/services
Open	None	T310087 Advance declaration of query parameters
Resolved	ori	T314868 Roll out query parameter normalization

Event Timeline

There are a very large number of changes, so older changes are hidden.

• GWicke mentioned this in T66214: Define an official thumb API.Nov 8 2016, 10:04 PM

• Gilles subscribed.Nov 9 2016, 2:45 PM

• GWicke mentioned this in T150673: Thumb API: Varnish / CDN questions.Nov 14 2016, 6:28 PM

• GWicke moved this task from Backlog to watching on the Services board.Jul 12 2017, 5:19 PM

• GWicke edited projects, added Services (watching); removed Services.

• mobrovac added a project: Platform Team Legacy (Watching / External).Dec 20 2018, 12:04 PM

• Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board.Jan 26 2019, 8:43 PM

Tgr mentioned this in T224365: REST API URL canonization.May 26 2019, 1:19 PM

• Jhernandez unsubscribed.Apr 2 2020, 6:46 PM

Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:02 PM

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox . Thank you!

BBlack moved this task from Backlog to Roadmap on the Traffic-Icebox board.Apr 7 2022, 9:13 PM

ori mentioned this in T310087: Advance declaration of query parameters.Jun 7 2022, 4:07 PM

Zabe subscribed.Jun 7 2022, 4:09 PM

Re-ordering duplicate query parameters could be problematic. If a parameter appears multiple times, its value in $_GET will be set based on the last occurrence. (I don't believe this is specified anywhere, but that is the behavior in practice.)

This means that ?action=edit&action=history will go to the history page and ?action=history&action=edit will go to the edit page.

Changing the order could break old URLs, and any PHP or JavaScript code that expects this behavior. I don't expect we'd see this in code that has undergone code review, but it could be an issue for user-authored JavaScript.
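To make the last-occurrence behavior concrete, here is a rough Python approximation of how PHP fills $_GET for scalar parameters. It is only a model: it ignores PHP's array syntax (foo[]=...) and the function name is made up for this example.

```python
from urllib.parse import parse_qsl


def last_wins(query: str) -> dict:
    """Approximate PHP's $_GET for scalar parameters: later pairs overwrite earlier ones."""
    result = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        result[key] = value
    return result


print(last_wins("action=edit&action=history"))  # {'action': 'history'}
print(last_wins("action=history&action=edit"))  # {'action': 'edit'}
```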

Change 806488 had a related patch set uploaded (by Ori; author: Ori):

[operations/puppet@production] varnish: sort query parameters on the Beta Cluster

https://gerrit.wikimedia.org/r/806488

gerritbot added a project: Patch-For-Review.Jun 21 2022, 1:25 PM

Change 806488 merged by Ori:

[operations/puppet@production] varnish: sort query parameters on the Beta Cluster

https://gerrit.wikimedia.org/r/806488

Maintenance_bot removed a project: Patch-For-Review.Jun 21 2022, 2:31 PM

Change 807340 had a related patch set uploaded (by Ori; author: Ori):

[analytics/refinery/source@master] UDF for testing uri_query for duplicate params

https://gerrit.wikimedia.org/r/807340

gerritbot added a project: Patch-For-Review.Jun 22 2022, 11:12 PM

ori added a subtask: T310087: Advance declaration of query parameters.Jun 22 2022, 11:22 PM

The UDF in I3ff40d5b2 can be used to identify web requests with query strings that contain duplicate conflicting parameters (same key, different values).
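For readers who don't want to open the patch, the check it describes can be sketched in a few lines of Python. This is not the UDF itself, just an illustration of "same key, different values" with a hypothetical function name, comparing raw (undecoded) strings.

```python
from collections import defaultdict


def has_conflicting_duplicates(query: str) -> bool:
    """True if some key appears more than once with different raw values."""
    values_by_key = defaultdict(set)
    for pair in query.split("&"):
        if not pair:
            continue  # tolerate empty segments such as a trailing '&'
        key, _, value = pair.partition("=")
        values_by_key[key].add(value)
    return any(len(values) > 1 for values in values_by_key.values())


print(has_conflicting_duplicates("a=1&a=2"))  # True
print(has_conflicting_duplicates("a=1&a=1"))  # False
```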

I used it to identify some examples in the webrequest logs.

One common case is requests to the /beacon/statsv endpoint which contain multiple measurements for some metric, e.g.:

/beacon/statsv?PagePreviewsApiResponse=32ms&PagePreviewsApiResponse=31ms

These are two distinct measurements, so the URI is relatively sane, and the statsv backend handles it correctly. Sorting doesn't alter the meaning here because all measurements on a single statsv request are treated as having the same time. However, care must be taken not to treat either key-value pair as a duplicate and remove it, since doing so would alter the semantics of the request. OTOH, if we decide we don't like these URIs, we could make statsv combine multiple measurements for the same metric key into a comma-separated list.

More problematically, there are cases like the ones I hypothesized above, where a request containing duplicate keys with conflicting values is made to a PHP endpoint. For example, the wwwportal search form seems to generate URLs like:

https://www.wikipedia.org/search-redirect.php?family=Wikipedia&language=en&search=ipad&language=de&go=Go

This currently redirects to the German article because language=de comes after language=en. Sorting parameters in ascending lexicographic order would put language=en last so it would redirect to the English article instead. I see requests like this to other PHP endpoints as well, so it's not an isolated problem.

In order to avoid breaking code that constructs such URLs, if we're going to sort query parameters, it is necessary to use a stable sort, and sort by parameter name only.
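A small Python sketch of the sorting rule proposed here (stable, by parameter name only), applied to the search-redirect.php example. The function name is made up, and this is an illustration of the rule rather than the code that runs in Varnish.

```python
def sort_query(url: str) -> str:
    """Stable-sort query parameters by name only; values are never compared."""
    path, sep, query = url.partition("?")
    if not sep:
        return url
    pairs = query.split("&")
    # sorted() is guaranteed stable, so equal names keep their original order.
    pairs = sorted(pairs, key=lambda pair: pair.partition("=")[0])
    return path + "?" + "&".join(pairs)


url = "/search-redirect.php?family=Wikipedia&language=en&search=ipad&language=de&go=Go"
print(sort_query(url))
# prints: /search-redirect.php?family=Wikipedia&go=Go&language=en&language=de&search=ipad
```

language=de still comes after language=en, so a last-occurrence-wins backend keeps redirecting to the German article.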

jbond subscribed.Jun 23 2022, 3:44 PM

I took a closer look at varnish's std.querysort() and I think it's a relatively straightforward change to get it to sort the way we want (stable sort by key name). Since the query parameters are pointers into the URL buffer, we can compare by address when the keys are equal to get a stable sort.

Check my logic — here's the diff: https://github.com/atdt/varnish-cache/commit/694f5a64c6988be1d1468d36adeaf3446d9365fa
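The address-comparison idea can be mimicked in Python by using each parameter's offset in the query string as a tie-breaker. This is only an analogy to the C comparator in the linked diff, written under the assumption that "address" is equivalent to "position in the original buffer".

```python
def stable_sort_by_offset(query: str) -> str:
    """Sort parameters by name, breaking ties by position in the original string."""
    params = []  # (name, offset, raw pair)
    offset = 0
    for pair in query.split("&"):
        name = pair.partition("=")[0]
        params.append((name, offset, pair))
        offset += len(pair) + 1  # +1 for the '&' separator
    # The offset plays the role of the pointer address in the C comparator:
    # it makes every sort key unique, so even an unstable sort gives a stable result.
    params.sort(key=lambda p: (p[0], p[1]))
    return "&".join(p[2] for p in params)


print(stable_sort_by_offset("b=3&a=2&a=1"))  # a=2&a=1&b=3
```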

It'd be straightforward to carve this out into a separate vmod.

In T138093#8024884, @ori wrote:

It'd be straightforward to carve this out into a separate vmod.

Done: https://github.com/atdt/libvmod-querysort.

I wanted to add a note regarding the PHP tricks of passing arrays in query parameters that were mentioned in T138093#2586982. When they are used, the order of non-duplicate parameters matters too, because you can mix literal and sequential array keys – ?foo[0]=a&foo[]=b is not the same as ?foo[]=b&foo[0]=a. Hopefully we're not doing such nonsense in our code (using either style is reasonable IMO, but not both in one request), but it is possible for on-wiki code to make use of that. A real parameter that could be used like this is preloadparams.

The array parameter notation is straightforward to deal with as long as it uses a literal '[', since we just treat it as a stop character like '='. I added support for it, and a couple of test cases: https://github.com/atdt/libvmod-querysort/commit/8a68f5f061

The case where the opening bracket is percent-encoded is more annoying since it then spans multiple characters ("%5b") and thus requires looking ahead. Let's just skip normalization for those cases like bblack suggested above.
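As an illustration of the two rules above (a literal '[' ends the parameter name, and percent-encoded brackets disable sorting), here is a Python sketch. It is not the vmod's C code; in particular, checking the whole query string for "%5b" is a deliberately conservative assumption made for this example.

```python
import re

# Stop characters that end a parameter name: '=' or a literal '['.
_NAME_END = re.compile(r"[=\[]")


def sort_key(pair: str) -> str:
    """Parameter name up to the first '=' or literal '['."""
    return _NAME_END.split(pair, maxsplit=1)[0]


def sort_query_with_arrays(query: str) -> str:
    # Skip normalization entirely when a bracket is percent-encoded anywhere
    # in the query string (conservative: this also matches inside values).
    if "%5b" in query.lower():
        return query
    return "&".join(sorted(query.split("&"), key=sort_key))


print(sort_query_with_arrays("foo[0]=a&b=3&foo[]=b&a=1"))  # a=1&b=3&foo[0]=a&foo[]=b
print(sort_query_with_arrays("foo%5B0%5D=a&b=3"))          # foo%5B0%5D=a&b=3 (unchanged)
```

Because foo[0] and foo[] share the sort key "foo", the stable sort keeps them in their original relative order.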

Change 807340 merged by jenkins-bot:

[analytics/refinery/source@master] UDF for testing uri_query for duplicate query parameters

https://gerrit.wikimedia.org/r/807340

Maintenance_bot removed a project: Patch-For-Review.Jun 27 2022, 7:31 AM

vmod migrated to Gerrit: https://gerrit.wikimedia.org/g/operations/software/varnish/libvmod-querysort
Next step, packaging.

Change 810551 had a related patch set uploaded (by Ori; author: Ori):

[operations/software/varnish/libvmod-querysort@main] Initial Debian packaging.

https://gerrit.wikimedia.org/r/810551

gerritbot added a project: Patch-For-Review.Jul 3 2022, 11:42 PM

Change 810551 merged by Legoktm:

[operations/software/varnish/libvmod-querysort@main] Initial Debian packaging

https://gerrit.wikimedia.org/r/810551

ori mentioned this in rOSQRb9bbc6338483: Initial Debian packaging.Jul 8 2022, 6:00 PM

Maintenance_bot removed a project: Patch-For-Review.Jul 8 2022, 6:31 PM

Change 812450 had a related patch set uploaded (by Ori; author: Ori):

[operations/puppet@production] varnish: use libvmod-querysort on Beta Cluster

https://gerrit.wikimedia.org/r/812450

gerritbot added a project: Patch-For-Review.Jul 9 2022, 3:12 PM

Change 812568 had a related patch set uploaded (by Ori; author: Ori):

[integration/config@master] Add operations/software/varnish/libvmod-querysort debian-glue CI job

https://gerrit.wikimedia.org/r/812568

Change 812568 merged by jenkins-bot:

[integration/config@master] Add operations/software/varnish/libvmod-querysort debian-glue CI job

https://gerrit.wikimedia.org/r/812568

Change 812450 merged by CDanis:

[operations/puppet@production] varnish: use libvmod-querysort on Beta Cluster

https://gerrit.wikimedia.org/r/812450

Just wanted to chime in with something that may be of interest - we have been doing some URL normalization (namely, rewriting /wiki URLs to /index.php ones) on the CDN for the better part of a decade now, but we "denormalize" the URL to be sent to the backend by stashing the pre-normalization URL into a temporary header in vcl_recv, then resetting bereq.url in vcl_backend_fetch to that header.

OK, current status:

libvmod-querysort is in Gerrit;
and packaged for Debian, in buster-wikimedia;
and deployed/enabled on the Beta Cluster text varnish

I also set up a separate test instance on Labs with a minimal Varnish / Nginx / PHP setup, with the following characteristics:

Requests to /unsorted?... are passed through unmodified.
Requests to /sorted?... get query-sorted.
All requests get proxied to a PHP script that prints out $_GET, so you can see how PHP is decoding the query string.

Example:

$ curl -g "https://querysort.wmcloud.org/unsorted?b=3&a=2&a=1&foo[0]=a&foo[]=b"
Request URL: "/unsorted?b=3&a=2&a=1&foo[0]=a&foo[]=b"

Original URL, as seen by Varnish: "/unsorted?b=3&a=2&a=1&foo[0]=a&foo[]=b"

$_GET = array (
  'b' => '3',
  'a' => '1',
  'foo' =>
  array (
    0 => 'a',
    1 => 'b',
  ),
)

$ curl -g "https://querysort.wmcloud.org/sorted?b=3&a=2&a=1&foo[0]=a&foo[]=b"
Request URL: "/sorted?a=2&a=1&b=3&foo[0]=a&foo[]=b"

Original URL, as seen by Varnish: "/sorted?b=3&a=2&a=1&foo[0]=a&foo[]=b"

$_GET = array (
  'a' => '1',
  'b' => '3',
  'foo' =>
  array (
    0 => 'a',
    1 => 'b',
  ),
)

Feel free to experiment with it.

@BBlack , do you have any thoughts on what a gradual roll-out to production might look like? We could enable this in VCL based on the presence of an X-Wikimedia-Debug flag, to allow for some testing in production.

Notes based on IRC discussion on #wikimedia-traffic:

- We only want to apply query sorting to text requests for now, because we can't assert with confidence that misc. services are insensitive to parameter order, or (if they're implemented in a language other than PHP) that they handle duplicate parameters in the same way.
  - On that subject: we need to validate that query-sorting is safe for CXServer (or else exclude it, too).
- The current Beta Cluster VCL patch applies query sorting in normalize_request, which happens before cluster_fe_vcl_switch, and thus is not restricted to text. So we might just need an extra sub right after the switch, say normalize_request_nonmisc or something, to park this in.
- Initial testing in production via X-W-D sounds like a good idea.
- To analyze the potential impact, we need a good way to count variants over traffic logs.
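One possible way to count variants, sketched in Python: group raw request URLs by their query-sorted form and look at groups with more than one raw spelling. The sample URLs and function names are invented for the example; a real analysis would run over the webrequest logs instead.

```python
from collections import defaultdict


def normalize(url: str) -> str:
    """Stable-sort query parameters by name."""
    path, sep, query = url.partition("?")
    if not sep:
        return url
    pairs = sorted(query.split("&"), key=lambda p: p.partition("=")[0])
    return path + "?" + "&".join(pairs)


def count_variants(urls):
    """Map each normalized URL to the set of raw forms seen for it."""
    variants = defaultdict(set)
    for url in urls:
        variants[normalize(url)].add(url)
    return variants


sample = [
    "/w/index.php?title=X&action=history",
    "/w/index.php?action=history&title=X",
    "/w/index.php?title=Y&action=raw",
]
variants = count_variants(sample)
duplicated = {k: v for k, v in variants.items() if len(v) > 1}
print(len(variants), len(duplicated))  # 2 1
```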

Change 816206 had a related patch set uploaded (by Ori; author: Ori):

[operations/puppet@production] varnish: enable query-sorting in production via X-Wikimedia-Debug

https://gerrit.wikimedia.org/r/816206

Change 816218 had a related patch set uploaded (by Ori; author: Ori):

[performance/WikimediaDebug@master] Add query-sorting flag

https://gerrit.wikimedia.org/r/816218

In T138093#8092400, @ori wrote:

On that subject: we need to validate that query-sorting is safe for CXServer (or else exclude it, too).

It appears Node.js's handling of duplicate query parameters is different from PHP's, and also internally inconsistent.

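const url = require('node:url');
const querystring = require('node:querystring');

const url_with_dupes = 'https://foo/?a=1&a=2';

// Legacy url.parse(): duplicate keys are collected into an array of strings.
console.log(url.parse(url_with_dupes, true).query.a);  // output: [ '1', '2' ]

// querystring.parse(): same behavior as url.parse().
console.log(querystring.parse(url_with_dupes.split('?')[1]).a);  // output: [ '1', '2' ]

// WHATWG URL: searchParams.get() returns only the first value.
console.log(new URL(url_with_dupes).searchParams.get('a'));  // output: 1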

However, since the querysort vmod does not re-order duplicate parameters with respect to one another, it is safe.

Tgr mentioned this in T269492: Selecting user language in the REST API.Jul 28 2022, 5:18 AM

Change 816206 merged by Vgutierrez:

[operations/puppet@production] varnish: enable query-sorting in production via X-Wikimedia-Debug

https://gerrit.wikimedia.org/r/816206

@ori I've just deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/816206 and naively tested it against cp4027:

vgutierrez@cp4027:~$ sudo -i varnishlog -n frontend -i ReqURL -q "ReqURL ~ vgutierrez"
# curl "https://en.wikipedia.org/favicon.ico?vgutierrez=1&c=1&b=0&a=0" -o /dev/null
*   << Request  >> 862934902 
-   ReqURL         /favicon.ico?vgutierrez=1&c=1&b=0&a=0
# curl -H 'X-Wikimedia-Debug: querysort' "https://en.wikipedia.org/favicon.ico?vgutierrez=1&c=1&b=0&a=0" -o /dev/null
*   << Request  >> 902729953 
-   ReqURL         /favicon.ico?vgutierrez=1&c=1&b=0&a=0
-   ReqURL         /favicon.ico?a=0&b=0&c=1&vgutierrez=1

Change 816218 merged by jenkins-bot:

[performance/WikimediaDebug@master] Add "querysort" option

https://gerrit.wikimedia.org/r/816218

Rolling this out to the high-traffic wikis will be a little bit tricky. When we turn it on, we can expect the cache hit rate to go down initially, as every index.php?title=x&action=y ReqURL gets rewritten to index.php?action=y&title=x prior to cache lookup.

Based on a random sample of request URLs, about 30%-40% contain query parameters in unsorted order, and would be re-written.

One way to approach this is to stagger the roll-out using a hash. Basically:

QUERYSORT_ROLLOUT_PERCENT = 10  /* 0 to 100 */

querysorted_url = querysort.querysort(req.url);
if (crc32(querysorted_url) % 100 < QUERYSORT_ROLLOUT_PERCENT) {
  set req.url = querysorted_url;
}

(The hash is computed over the normalized ReqURL to prevent the proliferation of duplicate objects in the cache. If we were to instead include requests based on a random value, the cache key for a given ReqURL would bounce between the normal and non-normal forms.)

The issue with the hashing approach is that VCL does not supply a hash function that returns a value to VCL, AFAICT. Also math operators like '%' are not supported in Varnish 6. So this would require an additional vmod or some inline C.

An alternative approach for staggering the roll-out that could be achieved in plain VCL is to match the last character of the request URL against an increasingly broad character range. The issue with this approach is that the distribution of letters is not uniform, so the steps would vary in size.

In T138093#8117992, @ori wrote:

[...]

Thanks again for working on this! Sounds like a good plan overall to me!

The issue with the hashing approach is that VCL does not supply a hash function that returns a value to VCL, AFAICT. Also math operators like '%' are not supported in Varnish 6. So this would require an additional vmod or some inline C.

Actually, the deployed Varnish we have now does have such a function, it's just not in an obvious place! The directors vmod that deals with selection in pools of backend services is already loaded, and it has an independent, stateless helper function INT xshard.key(STRING) for calculating director hash keys. The function takes any string as input, runs SHA256 on the string, and then returns the bottom 32 bits of the hash output for VCL use: https://varnish-cache.org/docs/trunk/reference/vmod_directors.html#xshard-key . In C type conversion terms: the internal hash function returns those bottom 32 bits as uint32_t, and then it's cast to VCL_INT for VCL use, which is int64_t, so the whole unsigned 32-bit range should be usable and you can do basic math on the result (add, sub, mul, div, comparison tests, etc).

So, something like this?

QUERYSORT_ROLLOUT_PERCENT = 10  /* 0 to 100 */

querysorted_url = querysort.querysort(req.url);
if (xshard.key(querysorted_url) < 4294967296 * QUERYSORT_ROLLOUT_PERCENT / 100) {
  set req.url = querysorted_url;
}
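The threshold arithmetic can be checked outside Varnish with a short Python sketch. It assumes a 32-bit unsigned key derived from SHA-256; which four bytes of the digest the directors vmod actually uses, and in what byte order, is an assumption here, so the keys will not necessarily match Varnish's.

```python
import hashlib


def key32(s: str) -> int:
    """32 bits of SHA-256 as an unsigned integer (which 32 bits is an assumption)."""
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return int.from_bytes(digest[-4:], "big")


def in_rollout(sorted_url: str, percent: int) -> bool:
    """True if the normalized URL falls inside the roll-out percentage."""
    # 4294967296 == 2**32; integer math only, as in the VCL sketch above.
    return key32(sorted_url) < 4294967296 * percent // 100


urls = [f"/w/index.php?action=history&title=Page_{i}" for i in range(10000)]
selected = sum(in_rollout(u, 10) for u in urls)
print(selected / len(urls))  # expected to be close to 0.10
```

No modulo is needed: comparing the key against a fraction of the 32-bit range gives the same kind of bucketing with only multiplication, division and comparison.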

Change 818134 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] C:varnish: fix varnish confd test data

https://gerrit.wikimedia.org/r/818134

In T138093#8120299, @BBlack wrote:

So, something like this?
[...]

Yep, this works!

I think it would be useful to have query normalization enabled for all requests for a small wiki before we proceed with the incremental rollout. I'd like to start by enabling query normalization for all testwiki requests, wait a day, then enable it on mediawiki.org for all requests. Then wait a week, and if all looks good, proceed with the incremental roll-out.

For the incremental roll-out, we can start with 1% and ramp up slowly from there, keeping an eye on the overall cache hit rate.

SG?

Change 819677 had a related patch set uploaded (by Ori; author: Ori):

[operations/puppet@production] Enable query sorting for all testwiki requests

https://gerrit.wikimedia.org/r/819677

In T138093#8117992, @ori wrote:

Rolling this out to the high-traffic wikis will be a little bit tricky. When we turn it on, we can expect the cache hit rate to go down initially, as every index.php?title=x&action=y ReqURL gets rewritten to index.php?action=y&title=x prior to cache lookup. […]

Quick drive-by note here, feel free to ignore if a false alarm. MediaWiki has a concept of canonical parameter order in various places, in particular when it comes to action=history and action=raw. The order is consistently produced (e.g. by interface links from Title, mw.util, and importScript) a certain way, and similarly expected that way when MW decides whether to permit the full CDN maxage. There are recent changes to this area in T309063.

I believe if appservers are able to see the changed parameter order, then such non-canonical query parameter order would negatively affect cacheability both short- and long-term.

In T138093#8012474, @ori wrote:

Re-ordering duplicate query parameters could be problematic. […] This means that ?action=edit&action=history will go to the history page and ?action=history&action=edit will go to the edit page.

It's not clear to me whether this will or will not be the effect of the currently proposed Varnish mod. Could we keep a list somewhere, maybe in this task's description, as a start of likely concerns and edge cases and what the mod does or doesn't do? For the above, I imagine the neutral approach would be to order keys only but leave value order unchanged. If it's an upstream mod that we'd prefer not to change much and it behaves differently, then it's perhaps not worth changing. Otherwise, I'd say this is likely pretty far down the list, where it's no longer about easy problems with big returns and it might not be worth the hassle to try to order.

In T138093#8127054, @Krinkle wrote:

Quick drive-by note here, feel free to ignore if a false alarm. MediaWiki has a concept of canonical parameter order in various places, in particular when it comes to action=history and action=raw. The order is consistently produced (e.g. by interface links from Title, mw.util, and importScript) a certain way, and similarly expected that way when MW decides whether to permit the full CDN maxage. There are recent changes to this area in T309063.

The query parameter normalization only happens on the way in, not the way out -- in other words, it does not affect the URLs that MediaWiki constructs and sends to the client (as links, redirects, etc.). However, the literal matching of request URL in MediaWiki::performAction is an issue.

In T138093#8127064, @Krinkle wrote:

It's not clear to me whether this will or will not be the effect of the currently proposed Varnish mod. [...]

It is not. I considered it unacceptable breakage, and this was the motivation to fork Varnish's std.querysort. The implementation in the vmod retains duplicate parameters and maintains their relative order.

Ok, a bit more context about the maxage issue @Krinkle pointed out, plus possible solutions:

The CDN expiry code in MediaWiki is sensitive to parameter ordering. Specifically, MediaWiki allows the CDN maxage only if the request URL is an exact match against one of the URL forms that we PURGE on article update. The intent is to ensure that we don't cache something with no way to purge it. The canonical forms for a standard article are:

The latter two forms are a problem, because they place title= before action=, and the order of these parameters is flipped when query parameters are sorted, which in turn affects expiry. Observe:

$ curl -s -D - -o/dev/null --connect-to ::mw1452.eqiad.wmnet 'https://en.wikipedia.org/w/index.php?title=Science&action=history' | grep -i cache-control
Cache-Control: s-maxage=1209600, must-revalidate, max-age=0

$ curl -s -D - -o/dev/null --connect-to ::mw1452.eqiad.wmnet 'https://en.wikipedia.org/w/index.php?action=history&title=Science' | grep -i cache-control
Cache-Control: private, must-revalidate, max-age=0

A solution to this has to ensure content is both purgeable and cacheable, not just on an ongoing basis, but also during the roll-out period when some incoming requests are normalized and some are not.

Option 1: Change the form of the canonical purge URLs so that query parameters appear in sorted order

In other words, treat ?action=history&title=X as canonical, and ?title=X&action=history non-canonical. While this may be a desirable end-state, it doesn't work well for an incremental roll-out, during which some incoming URLs are normalized and some are not, and existing entries in the cache are a mix of the normal and non-normal forms.

Option 2: Include both the sorted and unsorted forms in the canonical purge URL list

This would make MediaWiki recognize both '?title=X&action=history' and '?action=history&title=X' as canonical, and it would issue purges for both forms. The disadvantage of this approach is that it results in duplicate purges in Varnish, because (thanks to query normalization) both forms refer to the same object in the cache. For a typical article update, this would increase the number of purges sent to Varnish from 4 to 6. This is my preferred option. (Edit: no it isn't; I now prefer option #4 below.)

MediaWiki has a HtmlCacheUpdaterAppendUrls hook that we can hook into from a Wikimedia-specific extension to avoid making this change for all users of MediaWiki.
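To illustrate what option 2 would add to the purge list, here is a Python sketch that appends the query-sorted form of each purge URL and drops exact duplicates. It is not MediaWiki code and does not use the hook; the function name and the example URLs are hypothetical.

```python
def with_sorted_variants(purge_urls):
    """Return the purge list plus the query-sorted form of each URL, without duplicates."""
    result = []
    for url in purge_urls:
        path, sep, query = url.partition("?")
        candidates = [url]
        if sep:
            pairs = sorted(query.split("&"), key=lambda p: p.partition("=")[0])
            candidates.append(path + "?" + "&".join(pairs))
        for candidate in candidates:
            if candidate not in result:
                result.append(candidate)
    return result


print(with_sorted_variants(["/wiki/X", "/w/index.php?title=X&action=history"]))
# ['/wiki/X', '/w/index.php?title=X&action=history', '/w/index.php?action=history&title=X']
```

URLs without a query string, or whose parameters are already sorted, contribute no extra purge.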

Option 3: Make the CDN maxage check insensitive to parameter order

With this approach, there would be a MediaWiki configuration option that would make MediaWiki's cache-control maxage URL check insensitive to parameter order. In other words, instead of testing whether the request URL is byte-equal to one of the purge URLs, MediaWiki would ignore the parameter order. With this approach, no duplicate purges are sent to Varnish. (The order of parameters in the PURGE request is irrelevant, since Varnish would normalize the URL anyway.)

I think the best approach might be option #2 (allow both forms to be cached and purged) for the duration of the incremental roll-out, followed by option #3 as the steady-state once the roll-out is complete. But very open to feedback / alternatives.

I suppose there's also a fourth option: special-case the parameter 'title' so that it is always sorted into the first position. This would make the normal form match the canonical form in MediaWiki.
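A sketch of the fourth option in Python, using a sort key that ranks 'title' ahead of every other name. The function names are invented for the example, and this says nothing about how it would be done in the vmod.

```python
def title_first_key(pair: str):
    """Sort key that puts 'title' first and orders all other names alphabetically."""
    name = pair.partition("=")[0]
    # (0, "") sorts before every (1, name), so 'title' always comes first.
    return (0, "") if name == "title" else (1, name)


def sort_query_title_first(query: str) -> str:
    return "&".join(sorted(query.split("&"), key=title_first_key))


print(sort_query_title_first("action=history&title=Science"))  # title=Science&action=history
print(sort_query_title_first("oldid=1&title=X&action=raw"))    # title=X&action=raw&oldid=1
```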

I think your "2 followed by 3" approach makes the most pragmatic sense.

We could/should probably eventually revisit the idea of alternate cache keys (Varnish XKey, and ATS has some similar-ish mechanisms) and get rid of the multi-purge stuff, but that's a whole other complex thing to consider.

In T138093#8129915, @ori wrote:

[…]

Option 2: Include both the sorted and unsorted forms in the canonical purge URL list

This would make MediaWiki recognize both '?title=X&action=history' and '?action=history&title=X' as canonical, […] MediaWiki has a HtmlCacheUpdaterAppendUrls hook that we can hook into from a Wikimedia-specific extension to avoid making this change for all users of MediaWiki.

Option 3: Make the CDN maxage check insensitive to parameter order

With this approach, there would be a MediaWiki configuration option that would make MediaWiki's cache-control maxage URL check insensitive to parameter order. […] With this approach, no duplicate purges are sent to Varnish. (The order of parameters in the PURGE request is irrelevant, since Varnish would normalize the URL anyway.)

I think the best approach might be option #2 (allow both forms to be cached and purged) for the duration of the incremental roll-out, followed by option #3 as the steady-state once the roll-out is complete. But very open to feedback / alternatives.

LGTM, but I have one question about option 3. I have high confidence in a Varnish mod being able to re-order query parameters without any decoding or normalization, e.g. keeping whatever opinionated or redundant percent-encoding as-is in the query string, and similarly for the URL path. We want to avoid a situation where MediaWiki, with this new option enabled, starts to treat as CDN-purgeable a URL that is in fact not purgeable: one that appears identical (once both are decoded /and/ compared regardless of order) when actually its encoding is different, whether through different ways to redundantly encode the same logical value, or in other parts of the URL such as the path (/?x, /w/?x, /wiki/?).

The risk of such a false positive is that it can regress T38142: Suppressed edit summary remains cached in revision view while not logged in.

A few other related issues:

T67402: URLs for the same title without extra query parameters should have the same canonical link
T100782: http://en.wikipedia.org//wiki/Main_Page raise "Redirect loop detected!" because of the double slash
T106793: Pages with single quote in title are inaccessible by some clients (redirect loop)
And my (reverted) rMW155d555b83ec: MediaWiki.php: Redirect non-standard title urls to canonical, which we reverted because we found there was not one universal way that we could enforce an encoding that no browser would silently encode/rewrite differently and thus induce a redirect loop.
GlobalFunctions wfUrlencode for the "title" parameter.
The VCL version of this to improve cache use, https://gerrit.wikimedia.org/g/operations/puppet/+/0b7449aa9c9a246e878420b47ca351468111f0dd/modules/varnish/templates/text-frontend.inc.vcl.erb#22

Changing the ordering (perhaps coupled with varnish redirecting all '?title=X&action=history' to the new '?action=history&title=X' to avoid old forms) would cause a stampede, in that it would be as if no history page were cached. It would need planning, but (partially rolled out per wiki) I think that could be acceptable. History page requests are far fewer than article requests, and they are cheap to produce at the MediaWiki layer.

However, if varnish is able to normalize the URLs as in option 3, I don't think we need that intermediate step: it would just normalize the cache key and act on the right one for both GET and PURGE. If the key becomes different, it should be a staged rollout, so a portion of requests would be stored under the new key and a portion under the old one, slowly increasing the former. As the purges would apply to both old and new, it would work properly.

I can see how some user scripts or css may be relying on the order of the parameters, but if any backend fails to work properly when using a different order of (non-duplicated) query parameters, I consider that to be a bug.

Nonetheless, ori's proposal of making the normalised form equal to the one that MediaWiki has used since time immemorial seems the best solution of all.

In T138093#8130018, @Platonides wrote:

Changing the ordering (perhaps coupled with varnish redirecting all '?title=X&action=history' to the new '?action=history&title=X' to avoid old forms) would cause a stampede, in that it would be as if no history page were cached. It would need planning, but (partially rolled out per wiki) I think that could be acceptable.

Yes. This is anticipated and planned for in T138093#8117992 above.

However, if varnish is able to normalize the URLs as in option 3, I don't think we need that intermediate step: it would just normalize the cache key and act on the right one for both GET and PURGE. If the key becomes different, it should be a staged rollout, so a portion of requests would be stored under the new key and a portion under the old one, slowly increasing the former. As the purges would apply to both old and new, it would work properly.

I think we need the intermediate step in case we need to roll back. We don't want to end up with content that is cached under the normal form but is not purged.

Nonetheless, ori's proposal of making the normalised form equal to the one that MediaWiki has used since time immemorial seems the best solution of all.

I like it too. It is easier to get out the door, because it doesn't necessitate additional purges, and won't cause a temporary increase in cache misses.

But more importantly, I think the canonical form (which places the title first) is canonical for a reason — it's the most readable. It will be nice to have URLs be in that form in as many parts of the stack as possible, for debugging, and consistency.

I prefer option 3, implemented in core, configurable and off by default, with a helper method or service that optionally does order-independent matching. MediaWiki::performAction() currently has:

				if (
					in_array(
						// Use PROTO_INTERNAL because that's what HtmlCacheUpdater::getUrls() uses
						wfExpandUrl( $request->getRequestURL(), PROTO_INTERNAL ),
						$htmlCacheUpdater->getUrls( $requestTitle )
					)
				) {

I would change that to

				if ( $request->matchUrlForCdn( $htmlCacheUpdater->getUrls( $requestTitle ) ) ) {

where WebRequest::matchUrlForCdn() checks whether the request URL matches any of the members of the provided array, optionally after sorting the query parameters.
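The order-independent match could work roughly like the Python sketch below. It is an illustration of the idea, not the PHP that would go into core: the function names are hypothetical, and it deliberately does no percent-decoding, so differently encoded URLs never match.

```python
def _split(url: str):
    """Split a URL into its path and a name-sorted list of raw key=value pairs."""
    path, _, query = url.partition("?")
    pairs = sorted(query.split("&"), key=lambda p: p.partition("=")[0]) if query else []
    return path, pairs


def match_url_for_cdn(request_url: str, purge_urls) -> bool:
    """True if request_url equals a purge URL up to query parameter order.

    No percent-decoding is done: paths and key=value pairs are compared as raw strings.
    """
    target = _split(request_url)
    return any(_split(url) == target for url in purge_urls)


purge = ["/wiki/Science", "/w/index.php?title=Science&action=history"]
print(match_url_for_cdn("/w/index.php?action=history&title=Science", purge))    # True
print(match_url_for_cdn("/w/index.php?action=history&title=%53cience", purge))  # False
```

The second call returns False even though %53 decodes to "S", which is the conservative behavior wanted for anything that decides purgeability.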

In T138093#8129918, @ori wrote:

I suppose there's also a fourth option: special-case the parameter 'title' so that it is always sorted into the first position. This would make the normal form match the canonical form in MediaWiki.

I think that would make it difficult to implement CDN-aware code in other entry points, such as the action API.

I would lean towards not bothering with option 2 as an intermediate step, because non-canonical URLs are rare and caching them is relatively harmless. Just implement order-independent matching, enable it in production, and then roll out the varnish change everywhere the same week.

I did a cursory review of vmod_querysort.c.

Change 820905 had a related patch set uploaded (by Ori; author: Ori):

[mediawiki/core@master] Support CDN query parameter re-ordering

https://gerrit.wikimedia.org/r/820905

Change 820905 merged by jenkins-bot:

[mediawiki/core@master] Support CDN query parameter re-ordering

https://gerrit.wikimedia.org/r/820905

ReleaseTaggerBot added a project: MW-1.39-notes (1.39.0-wmf.25; 2022-08-15).Aug 9 2022, 4:00 AM

Change 818134 merged by Jbond:

[operations/puppet@production] C:varnish: fix varnish confd test data

https://gerrit.wikimedia.org/r/818134

ori mentioned this in T314868: Roll out query parameter normalization.Aug 9 2022, 2:58 PM

ori changed the status of subtask T314868: Roll out query parameter normalization from Open to In Progress.

Change 821731 had a related patch set uploaded (by Ori; author: Ori):

[mediawiki/core@wmf/1.39.0-wmf.23] Support CDN query parameter re-ordering

https://gerrit.wikimedia.org/r/821731

Change 821731 merged by jenkins-bot:

[mediawiki/core@wmf/1.39.0-wmf.23] Support CDN query parameter re-ordering

https://gerrit.wikimedia.org/r/821731

I implemented option 3, and created T314868 for tracking the roll-out.

phuedx unsubscribed.Aug 11 2022, 9:24 AM

Change 822715 had a related patch set uploaded (by Ori; author: Ori):

[operations/puppet@production] varnish::tests: add tests for query-sorting

https://gerrit.wikimedia.org/r/822715

Change 822715 merged by Ori:

[operations/puppet@production] varnish::tests: add tests for query-sorting

https://gerrit.wikimedia.org/r/822715

ori closed subtask T314868: Roll out query parameter normalization as Resolved.Aug 30 2022, 2:33 PM

This is now rolled out for text frontends.

It seems like page history caches are not being invalidated properly, which I suspect is related to this change. I filed T317064: History pages' caches not being invalidated after edits for that.

• MZMcBride subscribed.Sep 9 2022, 6:30 PM

	BBlack
	Jun 17 2016, 4:44 PM
