
Encoding discrepancy in https://wikipedia.org redirects
Open, Low | Public | BUG REPORT

Description

Hello! I found an encoding issue in the Redirect API for image names containing a question mark.
The behaviour seems to change when a subdomain is added to the request.

Steps to Reproduce:

I'm not sure why the encoding behaviour changes when a subdomain is added. Encoding on this endpoint behaves normally, otherwise.
Thanks

Event Timeline

Reedy subscribed.
$ curl -I -L "https://wikipedia.org/wiki/Special:Redirect/file/(61-365)_Can_you_imagine%3F_(5320329773).jpg" | grep location
location: https://www.wikipedia.org/wiki/Special:Redirect/file/(61-365)_Can_you_imagine?_(5320329773).jpg
location: https://en.wikipedia.org/wiki/Special:Redirect/file/(61-365)_Can_you_imagine?_(5320329773).jpg

You also didn't mention how it shows up in the URL: everything after the ? is taken as query params.
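To make the effect on the URL concrete, here is a minimal sketch using Python's standard urllib.parse. It only shows how a generic URL parser splits the two forms of the address seen in the curl output; the helper name split_path_query is made up for illustration and nothing here is MediaWiki or Apache code.

```python
from urllib.parse import urlsplit

# URL as originally requested: the question mark is percent-encoded (%3F).
REQUESTED = (
    "https://en.wikipedia.org/wiki/Special:Redirect/file/"
    "(61-365)_Can_you_imagine%3F_(5320329773).jpg"
)

# URL as it appears in the "location" header: the question mark is literal.
REDIRECTED = (
    "https://en.wikipedia.org/wiki/Special:Redirect/file/"
    "(61-365)_Can_you_imagine?_(5320329773).jpg"
)


def split_path_query(url):
    """Return (path, query) as a standard URL parser sees them (hypothetical helper)."""
    parts = urlsplit(url)
    return parts.path, parts.query


if __name__ == "__main__":
    for label, url in (("requested", REQUESTED), ("redirected", REDIRECTED)):
        path, query = split_path_query(url)
        print(f"{label}: path={path!r} query={query!r}")
```

With the encoded form the whole file name stays in the path; with the literal ? the file name is cut at "Can_you_imagine" and the remainder ends up in the query string.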

Screenshot 2020-07-09 at 19.45.08.png (664×2 px, 129 KB)

I'm guessing it's an issue in the rewrites and how they deal with the query string

Reedy renamed this task from Encoding discrepancy in Special:Redirect API to Encoding discrepancy in https://wikipedia.org redirects. Jul 9 2020, 6:48 PM

I'm not sure the problem is what you think it is. wikipedia.org isn't even supposed to respond to such requests IIRC. I'll dig deeper.

The problem is present even if we start with www.wikipedia.org:

$ curl -I -L "https://www.wikipedia.org/wiki/Special:Redirect/file/(61-365)_Can_you_imagine%3F_(5320329773).jpg" | grep location
location: https://en.wikipedia.org/wiki/Special:Redirect/file/(61-365)_Can_you_imagine?_(5320329773).jpg

This is due to how the RewriteRule that causes this redirect is written:

RewriteRule ^/(upload|wiki|stats|w)/(.*)$ %{ENV:RW_PROTO}://en.wikipedia.org/$1/$2 [R=301,L]

The rewrite engine by default works on *already decoded* URLs, so by the time the rule matches, the %3F in the request has already become a literal ?. The only way to support redirecting URLs with encoded question marks cleanly seems to be to do something like

RewriteCond %{THE_REQUEST} /(upload|wiki|stats|w)/(.*)\ HTTP/1.1
RewriteRule . https://en.wikipedia.org/%1/%2 [NE,R=301,L]

but using THE_REQUEST bypasses a lot of useful escaping and could lead to unintended consequences. We could just go with a non-RewriteRule approach, for instance:

Redirect /wiki https://en.wikipedia.org/wiki

has the desired effect.
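As a rough illustration of why the original RewriteRule loses the encoded question mark, the sketch below models the two matching strategies in Python. It is a simplified model, not how mod_rewrite is implemented: the function names are hypothetical, the scheme is hard-coded to https in place of %{ENV:RW_PROTO}, and only the decode-then-substitute step is imitated.

```python
import re
from urllib.parse import unquote

# Same pattern as the RewriteRule quoted above.
RULE = re.compile(r"^/(upload|wiki|stats|w)/(.*)$")

# Path as sent by the client, with the question mark percent-encoded.
RAW_PATH = "/wiki/Special:Redirect/file/(61-365)_Can_you_imagine%3F_(5320329773).jpg"


def redirect_from_decoded(raw_path):
    """Match against the percent-decoded path (models the default behaviour)."""
    match = RULE.match(unquote(raw_path))
    if match is None:
        return None
    # The captured text now contains a literal "?", which is copied verbatim.
    return f"https://en.wikipedia.org/{match.group(1)}/{match.group(2)}"


def redirect_from_raw(raw_path):
    """Match against the undecoded path (models the THE_REQUEST-style rule)."""
    match = RULE.match(raw_path)
    if match is None:
        return None
    # The captured text still contains "%3F", so the encoding survives.
    return f"https://en.wikipedia.org/{match.group(1)}/{match.group(2)}"


if __name__ == "__main__":
    print("decoded match:", redirect_from_decoded(RAW_PATH))
    print("raw match:    ", redirect_from_raw(RAW_PATH))
```

The decoded variant reproduces the location header seen in the curl output, with a bare ? in the target, while the raw variant keeps %3F intact. The model says nothing about the escaping that THE_REQUEST bypasses, which is the drawback noted above.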

Joe triaged this task as Low priority. Jul 21 2020, 2:27 PM
Joe added a project: serviceops.