Support following MediaWiki redirects when retrieving HTML revisions
Closed, ResolvedPublic0 Estimated Story Points
Actions

Assigned To

Authored By

	• GWicke
	Nov 13 2015, 6:14 AM

Description

For clients like mobile apps and API-driven frontends, it would be useful if RESTBase supported following MediaWiki's #redirects. However, VisualEditor in particular also needs the continued ability to retrieve the redirect page itself, so that it can be edited to change the redirect target, or converted into a regular content page by removing the redirect.

Issues

Poor redirect support in browsers

Browsers have traditionally not exposed redirect information to JavaScript, which lead to some sites relying on this for XSS protection. For this reason, even the newer fetch spec hides the response body from JS, even for same-origin requests with redirect: 'manual' option set.

Newer browsers do expose a [responseURL attribute](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/responseURL), which can be used to detect that a redirect happened. However, older but still supported browsers and IE lack support for this, so relying on it is problematic.

An alternative is to indicate the response location in a Content-Location header. This can be used to detect redirects in all browsers. There were some early issues with Content-Location being used as base href, but I think no browser implements this any more.

Option 1: Status 302 & location header, but return the HTML; rely on client not following redirect

When encountering an internal redirect (same wiki, possibly wikimedia project), RESTBase will respond with a 302 temporary redirect status & a location header pointing to the destination page. It will also return the HTML body & etag as with current responses. In this model, the client receives both redirect information and the body data. This means that it can choose to follow the redirect, or edit the page using the returned body.

At present, the redirect destination is only available inline as a meta tag. This means that we'd need to extract this meta tag from the HTML content, which is certainly quite hacky. In the longer term, it would be desirable for Parsoid to return information about the redirect target in accompanying page metadata (JSON blob). There is some discussion of this in T105845: RFC: Page components / content widgets.

Issue: Browser clients cannot access redirect responses

Sadly, this solution is basically unusable for browser clients, as they can't access redirect responses for security reasons (see Issues above).

Option 2: Status 302 for normal requests, method to avoid redirects (URL param or header), `Content-Location` header

As in Option 1, but provide a means to avoid being redirected by either

an URL parameter, for example ?redirect=false, or
a custom HTTP header, for example Redirect: false.

Option 1) is more discoverable, and does not conflict with CORS restrictions. The downside is that responses would be hard to purge, so we'd disable caching in those cases. However, the request volume with this parameter set is expected to be low, which means that the performance impact should be negligible. With Xkey support in Varnish we could make the no-redirect response cacheable, but hit rates would very likely be low, as most accesses would be to the redirected variant.

For clients like VisualEditor, the request flow would be:

Perform a regular request & get a cached response.
Check if Content-Location changed. If it changed, a redirect was followed.
If the goal is to edit the redirect, repeat request from 1) with ?redirect=false; get an un-cached response.

Alternatively, the first request could specify ?redirect=false, at the cost of foregoing caching in the normal (non-redirect) case.

Option 3: Status 302 for normal requests, URL parameter to request not following temporary redirects

Like Option 2, but with a URL parameter only.

Additionally, implement Varnish logic to

match the query parameter, strip it & remember it as a flag on the request, and
if this flag is set, massage 302 responses post-cache by
- replacing the status with 200, and
- stripping the location header.

Advantages

Simple yet performant: Clients like VisualEditor would always ask for redirects to be disabled, but would still get to share regular cached responses.
No need to detect that a redirect happened.
Query strings can be reliably preserved across unrelated redirects (ex: title normalization).

Disadvantages

Some extra (but generic) Varnish logic.

Related Objects
Search...

Status	Assigned	Task
Resolved	• Mholloway	T124193 [Bug] Clicking images on Main Page with content service enabled throws 404 rather than opening gallery
Resolved	• bearND	T130079 Cannot open file page from Commons image
Resolved	• Pchelolo	T118306 File: pages of images stored on commons result in 404s
Resolved	• AlexMonk-WMF	T130757 Deal with RB responding with a 302 rather than a 404 for non-extant File: pages from foreign file repos
Invalid	None	T130758 Follow RB 301s (used for requests in non-canonical form)
Resolved	• bearND	T131570 Audit MediaWiki redirect handling in summary, mobileapps, definition end points
Resolved	• bearND	T132507 Move redirect handling from mobile service to RESTBase
Resolved	• Pchelolo	T118548 Support following MediaWiki redirects when retrieving HTML revisions
Resolved	• GWicke	T134464 Make RB ?redirect=false cache-efficient

Event Timeline

• GWicke created this task.Nov 13 2015, 6:14 AM

• GWicke raised the priority of this task from to Medium.

• GWicke updated the task description. (Show Details)

• GWicke added a project: RESTBase.

• GWicke added subscribers: • GWicke, • Pchelolo, • mobrovac and 3 others.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 13 2015, 6:14 AM

• GWicke raised the priority of this task from Medium to High.Nov 17 2015, 5:01 AM

• GWicke set Security to None.

• GWicke added a project: Services-next.Jan 29 2016, 4:42 PM

I think for clients is good to obtain always a response, and flag it as redirected or not redirected in JSON.

I think a most common API is action=query to fetch the article.
In this case I would like to have the redirected handled by default, and return a JSON like (pseudo-code here, not json syntax):
[query][pages]:{ [7955] : {[page] : {DNA html is here}, [redirect] : false }
[query][pages]:{ [7956] : {[page] : {DNA html is here}, [redirect] : true }
if not redirect is specified:
[query][pages]:{ [7956] : {[page] : {Dna is redirect}, [redirect] : false } }, [redirects] : { [7956] : { Final redirected Page goes here, or alternatively a whole list of redirect pointing to final redirect } }

@Gg4u, I don't think we want to wrap all HTML responses in JSON. This would be a drastic change, and would introduce some non-trivial overheads. The proposal is to use regular HTTP semantics by sending a HTTP redirect, along with the regular response body. This means that the full HTML of redirects *is* included.

Clients can decide whether to follow the redirect automatically or not, by setting the corresponding flag in curl / fetch / urllib / $HTTP_LIB. Many will want to, but others like VisualEditor will disable automatic following to edit the redirect itself.

• GWicke updated the task description. (Show Details)Mar 25 2016, 6:07 AM

• GWicke updated the task description. (Show Details)Apr 1 2016, 8:25 PM

• GWicke added projects: Mobile-Content-Service, Parsoid, VisualEditor.Apr 1 2016, 8:30 PM

• GWicke edited subscribers, added: Esanders, Catrope, Arlolra, cscott; removed: Aklapper.

• GWicke added a subscriber: • Mholloway.

• GWicke added subscribers: • AlexMonk-WMF, Jdforrester-WMF.

• GWicke mentioned this in T118306: File: pages of images stored on commons result in 404s.Apr 1 2016, 8:40 PM

• GWicke updated the task description. (Show Details)Apr 1 2016, 11:43 PM

• GWicke added a subscriber: BBlack.

• GWicke updated the task description. (Show Details)Apr 1 2016, 11:54 PM

Jdforrester-WMF added a parent task: T130757: Deal with RB responding with a 302 rather than a 404 for non-extant File: pages from foreign file repos.Apr 3 2016, 8:12 AM

Jdforrester-WMF added a parent task: T130758: Follow RB 301s (used for requests in non-canonical form).

Jdforrester-WMF moved this task from To Triage to TR0: Interrupt on the VisualEditor board.

• GWicke added a parent task: T131570: Audit MediaWiki redirect handling in summary, mobileapps, definition end points.Apr 4 2016, 11:27 PM

Some implementation notes based on a conversation @Pchelolo & I had in the office earlier today:

The MediaWiki API already returns a boolean 'redirect' flag indicating that the *latest* revision of this title is a redirect (it's only stored in the page table). We already store this, so can use it to find redirects in *current* revisions.
- We can't really trust this for old revisions, so might need to check the content for now.
We need to add a per-revision redirect boolean in the upcoming restrictions table.
We discussed whether we should redirect requests asking for a specific revision as well. On balance, it seems more consistent to do so (and support ?redirect=no).

• bearND moved this task from Incoming to Tracking on the Mobile-Content-Service board.Apr 5 2016, 6:37 AM

@BBlack, what is your take on the Varnish part of option 3? Our goal is to share cache entries between redirected responses (302, location header, body) & no-redirect variants (200, no location header, body). Concretely, we would do something like the following for requests to /api/rest_v1/:

Match and strip ?redirect=no in the URL; set a no_redirect flag on the request.
In the response (cached or not) path, if status 302 and no_redirect flag is set:
- Replace status with 200
- Strip Location header

Does this sound sane to you?

• GWicke added a project: Traffic.Apr 5 2016, 9:50 PM

Restricted Application added a project: SRE. · View Herald TranscriptApr 5 2016, 9:50 PM

• GWicke mentioned this in T131570: Audit MediaWiki redirect handling in summary, mobileapps, definition end points.Apr 7 2016, 11:44 PM

• Pchelolo added a parent task: T132507: Move redirect handling from mobile service to RESTBase.Apr 12 2016, 8:59 PM

I hate it, but I don't see any better way to solve the problem at present.

To reword to be sure I have the intent:

On the vcl_recv side, for /api/rest_v1 requests, strip and note (via req.http.Foo as an internal request flag) the ?redirect=no argument. We'll need to support that in all its forms without screwing up other arguments - we've done that before elsewhere in VCL. I think the best example is in templates/varnish/analytics.inc.vcl.erb, in function analytics_provenance_recv_ where we strip the wprov argument from any position in the argument list by using a pair of regexen.

On the vcl_deliver side, if the flag from above is set && status == 302, we'll overwrite the status as "200 OK" and delete the Location header.

@BBlack: Yes, that captures the idea very well.

I'm also not too fond of the VCL part of this, but couldn't come up with anything better.

We'll initially deploy this without caching for the ?redirect=no requests, so there is no rush to implement the VCL bits. We'll probably look into that later this quarter.

• MZMcBride subscribed.Apr 15 2016, 3:55 AM

In T118548#2201380, @GWicke wrote:

We'll initially deploy this without caching for the ?redirect=no requests, so there is no rush to implement the VCL bits. We'll probably look into that later this quarter.

The mailing list post mentioned ?redirect=false. Will any falsey value work with this parameter?

I have some vague memory that the value of some URL parameters, when passed to MediaWiki's api.php, will change the response behavior regardless of the parameter's value. The URL parameter being present in the request URL is all that's needed, as I recall. I'm wondering if matching that behavior here would be great or terrible.

In T118548#2209015, @MZMcBride wrote:

The mailing list post mentioned ?redirect=false. Will any falsey value work with this parameter?

We've added redirect=no support to RESTBase to match MW syntax about redirects, but basically it's a false.

In RESTBase false is a string false, FALSE, 1 and now no. However, we don't want to advertise all of these options, as for caching it's better to have just one, so we will probably stick with false in all the documentation/mailing lists/etc.

In MW api the parameters which just need to be present have a semantics of true with a false default, however here we have the opposite, so it's better to be explicit here.

In T118548#2209015, @MZMcBride wrote:

I have some vague memory that the value of some URL parameters, when passed to MediaWiki's api.php, will change the response behavior regardless of the parameter's value. The URL parameter being present in the request URL is all that's needed, as I recall. I'm wondering if matching that behavior here would be great or terrible.

Right... from https://en.wikipedia.org/w/api.php#main.2Fdatatypes:

Boolean parameters work like HTML checkboxes: if the parameter is specified, regardless of value, it is considered true. For a false value, omit the parameter entirely.

What happens with ?redirect=yes or other truthy values?

What happens with ?redirect=yes or other truthy values?

It's considered to be true and the redirect happens. However here redirect is the default behaviour, so there's no point to be explicit.

I guess the question remains, though: if the Varnish redirect-stripper sees ?redirect with a non-falsey value, should it still strip the parameter?

In T118548#2209336, @BBlack wrote:

I guess the question remains, though: if the Varnish redirect-stripper sees ?redirect with a non-falsey value, should it still strip the parameter?

Yes. We care only about the false-y case.

The new redirect handling code has been deployed in production.

BBlack mentioned this in T134464: Make RB ?redirect=false cache-efficient.May 5 2016, 4:11 AM

BBlack created subtask T134464: Make RB ?redirect=false cache-efficient.

• GWicke closed subtask T134464: Make RB ?redirect=false cache-efficient as Resolved.May 9 2016, 7:13 PM

Jdforrester-WMF set the point value for this task to 0.May 10 2016, 1:16 AM

• GWicke mentioned this in T53736: Consider changing wikipage redirects to be proper HTTP redirects.Jun 14 2017, 8:21 PM

• GWicke mentioned this in T148645: Content service doesn't handle URL fragments when redirecting.Aug 7 2017, 5:41 PM

Support following MediaWiki redirects when retrieving HTML revisionsClosed, ResolvedPublic0 Estimated Story PointsActions