For clients like mobile apps and API-driven frontends, it would be useful if RESTBase supported following MediaWiki's #redirects. However, VisualEditor in particular also needs the continued ability to retrieve the redirect page itself, so that it can be edited to change the redirect target, or converted into a regular content page by removing the redirect.
Issues
Poor redirect support in browsers
Browsers have traditionally not exposed redirect information to JavaScript, which led some sites to rely on this for XSS protection. For this reason, even the newer fetch spec hides the response body from JS, even for same-origin requests with the redirect: 'manual' option set.
Newer browsers do expose a [responseURL attribute](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/responseURL), which can be used to detect that a redirect happened. However, older but still supported browsers and IE lack support for this, so relying on it is problematic.
An alternative is to indicate the response location in a Content-Location header. This can be used to detect redirects in all browsers. There were some early issues with Content-Location being used as base href, but I think no browser implements this any more.
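As a rough illustration of the Content-Location approach, a client could compare the path it requested with the path reported in the header. This is a minimal sketch, not RESTBase client code: the function name `was_redirected` and the example URLs in the usage note are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, unquote


def was_redirected(requested_url: str, headers: dict) -> bool:
    """Return True if Content-Location names a different resource than requested."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_location = lowered.get("content-location")
    if content_location is None:
        # No header present: there is nothing to compare against.
        return False
    # Compare decoded paths only, so host and percent-encoding differences are ignored.
    requested_path = unquote(urlsplit(requested_url).path)
    actual_path = unquote(urlsplit(content_location).path)
    return requested_path != actual_path
```

For example, a request for a hypothetical `https://example.org/page/html/Foo` that comes back with `Content-Location: https://example.org/page/html/Bar` would be reported as redirected, while a response whose Content-Location matches the requested path would not.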
Option 1: Status 302 & location header, but return the HTML; rely on client not following redirect
When encountering an internal redirect (same wiki, possibly wikimedia project), RESTBase will respond with a 302 temporary redirect status & a location header pointing to the destination page. It will also return the HTML body & etag as with current responses. In this model, the client receives both redirect information and the body data. This means that it can choose to follow the redirect, or edit the page using the returned body.
At present, the redirect destination is only available inline as a meta tag. This means that we'd need to extract this meta tag from the HTML content, which is certainly quite hacky. In the longer term, it would be desirable for Parsoid to return information about the redirect target in accompanying page metadata (JSON blob). There is some discussion of this in T105845: RFC: Page components / content widgets.
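To make the "extract it from the HTML" step concrete, here is a minimal sketch using only Python's standard `html.parser`. The exact markup is an assumption: the marker value `mw:PageProp/redirect` and the `href`/`content` attribute names are illustrative guesses at how the redirect target is encoded, and would need to be checked against real Parsoid output.

```python
from html.parser import HTMLParser
from typing import Optional

# Assumed marker value; the text above only says the target is available inline as a meta tag.
REDIRECT_MARKER = "mw:PageProp/redirect"


class RedirectTargetParser(HTMLParser):
    """Remember the target of the first redirect marker element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only inspect <meta> and <link> elements, and stop after the first match.
        if self.target is not None or tag not in ("meta", "link"):
            return
        attributes = dict(attrs)
        marker = attributes.get("rel") or attributes.get("property")
        if marker == REDIRECT_MARKER:
            # Assumed: the target is carried in href (link) or content (meta).
            self.target = attributes.get("href") or attributes.get("content")


def extract_redirect_target(html: str) -> Optional[str]:
    """Return the redirect target found in the HTML, or None if there is none."""
    parser = RedirectTargetParser()
    parser.feed(html)
    return parser.target
```

Scanning the whole document for one tag on every request is part of why this approach is hacky; returning the target in accompanying page metadata would avoid the parse entirely.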
Issue: Browser clients cannot access redirect responses
Sadly, this solution is basically unusable for browser clients, as they can't access redirect responses for security reasons (see Issues above).
Option 2: Status 302 for normal requests, method to avoid redirects (URL param or header), Content-Location header
As in Option 1, but provide a means to avoid being redirected by either
- a URL parameter, for example ?redirect=false, or
- a custom HTTP header, for example Redirect: false.
The URL parameter is more discoverable, and does not conflict with CORS restrictions. The downside is that responses to requests carrying the parameter would be hard to purge, so we'd disable caching in those cases. However, the request volume with this parameter set is expected to be low, which means that the performance impact should be negligible. With Xkey support in Varnish we could make the no-redirect response cacheable, but hit rates would very likely be low, as most accesses would be to the redirected variant.
For clients like VisualEditor, the request flow would be:
- Perform a regular request & get a cached response.
- Check if Content-Location changed. If it changed, a redirect was followed.
- If the goal is to edit the redirect, repeat the request from the first step with ?redirect=false; get an un-cached response.
Alternatively, the first request could specify ?redirect=false, at the cost of forgoing caching in the normal (non-redirect) case.
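The three-step flow can be sketched as follows. This is an illustrative model rather than VisualEditor code: the function name `fetch_for_editing`, the injected `fetch` callable, and the assumption that it returns lower-cased header names are all hypothetical, chosen so the logic can run without a network.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit, unquote

# A fetcher takes a URL and returns (headers, body). It is injected here so the
# flow can be exercised offline; a real client would wrap its HTTP library.
# Assumption: header names in the returned dict are lower-cased.
Fetcher = Callable[[str], Tuple[Dict[str, str], str]]


def fetch_for_editing(url: str, fetch: Fetcher) -> str:
    """Return the body of the page at `url` itself, even if it is a redirect."""
    # Step 1: regular request, normally answered from cache.
    headers, body = fetch(url)
    # Step 2: a Content-Location that differs from the request means a redirect was followed.
    content_location = headers.get("content-location", url)
    redirected = unquote(urlsplit(content_location).path) != unquote(urlsplit(url).path)
    if not redirected:
        return body
    # Step 3: re-request the redirect page itself; this response is un-cached.
    separator = "&" if urlsplit(url).query else "?"
    _, redirect_body = fetch(url + separator + "redirect=false")
    return redirect_body
```

The second request is only issued when a redirect was actually followed, so the common non-redirect case costs a single cached request.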
Option 3: Status 302 for normal requests, URL parameter to request not following temporary redirects
Like Option 2, but with a URL parameter only.
Additionally, implement Varnish logic to
- match the query parameter, strip it & remember it as a flag on the request, and
- if this flag is set, massage 302 responses post-cache by replacing the status with 200 and stripping the location header.
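The real implementation would be written in VCL; the Python below only models the two steps to make the intended behaviour explicit. The function names are hypothetical, and the parameter is assumed to be `redirect=false` as in the Option 2 example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_redirect_flag(url: str):
    """Pre-cache step: remove redirect=false and report whether it was present."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in params if (key, value) != ("redirect", "false")]
    no_redirect = len(kept) != len(params)
    # The cache lookup then uses the stripped URL, so both kinds of request
    # share one cache entry. Note that urlencode re-encodes remaining parameters.
    return urlunsplit(parts._replace(query=urlencode(kept))), no_redirect


def massage_response(status: int, headers: dict, no_redirect: bool):
    """Post-cache step: turn a 302 into a 200 without Location when flagged."""
    if no_redirect and status == 302:
        headers = {k: v for k, v in headers.items() if k.lower() != "location"}
        return 200, headers
    # Unflagged requests and non-302 responses pass through untouched.
    return status, headers
```

Because the parameter is stripped before the cache lookup and the rewrite happens after it, the cached object is the same 302 response with a body in both cases; only the status and headers delivered to the client differ.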
Advantages
- Simple yet performant: Clients like VisualEditor would always ask for redirects to be disabled, but would still get to share regular cached responses.
- No need to detect that a redirect happened.
- Query strings can be reliably preserved across unrelated redirects (ex: title normalization).
Disadvantages
- Some extra (but generic) Varnish logic.
See also
- T118306: File: pages of images stored on commons result in 404s. Similar issue for shared image description pages. Clients like VisualEditor need a
- https://github.com/wikimedia/restbase/pull/365: WIP redirect support based on page renames, but ignoring MediaWiki's #redirects.
- Support for preserving custom request headers seems to be spotty in browsers:
- https://bugzilla.mozilla.org/show_bug.cgi?id=216828#c4 (for all headers, in Firefox)
- https://bugzilla.mozilla.org/show_bug.cgi?id=401564 (for Accept)
- https://fetch.spec.whatwg.org/#http-redirect-fetch: It sounds as if fetch will preserve headers, but this comment suggests otherwise. Best will probably be to test in a few implementations.