Add an API method for resolving links
Open, Needs Triage, Public, Feature

Description

Problem:

While a w.wiki link can often be resolved programmatically by following the redirect, this fails in a browser context because w.wiki does not set a sufficient CORS policy. Running the following JavaScript from localhost will, for example, fail:

fetch('https://w.wiki/37o', {
    method: 'HEAD',
    redirect: 'follow'
}).then(response => {
    return response.url;
});

Proposal

Add a getshorturl API method to the UrlShortener extension for resolving URLs.

Alternatives considered

Adding a permissive CORS policy to w.wiki (fine, but more dependent on web-server configuration).

Event Timeline

Adding a permissive CORS policy to w.wiki (fine, but more dependent on web-server configuration).

Both could be done independently, but that should be filed as a different task.

Change 1005186 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/UrlShortener@master] Add API module to resolve short urls to longer ones

https://gerrit.wikimedia.org/r/1005186

Not actually a duplicate of T228781: Create API to retrieve list of short urls; the patch is more to get a list of all short urls, not resolve specific ones

A query like "action=query&list=shorturl&suurl=https://w.wiki/1" would however only return the URL behind the one given? From my point of view that would resolve the use case.
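A client-side call against such a module could look roughly like the sketch below. The list=shorturl module and its suurl parameter are only the query string proposed above and do not exist yet, the endpoint is an assumption for illustration, and the response shape is not defined anywhere in this task. format=json and origin=* are existing Action API parameters (the latter allows anonymous cross-origin requests).

```javascript
// Hypothetical: the "shorturl" list module and its "suurl" parameter are the
// proposal from this task, not an existing API. The endpoint is assumed too.
const API_ENDPOINT = 'https://meta.wikimedia.org/w/api.php';

// Build the Action API request URL for resolving one short URL.
function buildResolveRequest(shortUrl, endpoint = API_ENDPOINT) {
    const params = new URLSearchParams({
        action: 'query',
        list: 'shorturl',
        suurl: shortUrl,
        format: 'json',
        origin: '*' // anonymous CORS access to the Action API
    });
    return `${endpoint}?${params}`;
}

// The response shape is not specified in this task, so the parsed JSON is
// handed back to the caller as-is.
function resolveViaApi(shortUrl, fetchImpl = fetch) {
    return fetchImpl(buildResolveRequest(shortUrl)).then(response => response.json());
}
```

Because the request goes to the regular Action API rather than to w.wiki, no redirect is involved and the usual CORS handling of the API applies.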

Is the input here meant to be the full short URL or just the code? If the former, should it support both the normal and alt forms?

I would somehow expect it to be the full URL, as it fits with the mental model of a URL shortener, but in practice it probably makes little to no difference.

Change 1009853 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/UrlShortener@master] SpecialUrlRedirector: Enable CORS via Access-Control-Allow-Origin header

https://gerrit.wikimedia.org/r/1009853

Is this a discoverable API? Is it faster and/or more cacheable for end-users? Is it easier to maintain?

It seems to me the answer to all three is no. Perhaps addressing the CORS issue would suffice?

Is this a discoverable API? Is it faster and/or more cacheable for end-users? Is it easier to maintain?

It could be more flexible (e.g. accept multiple short URLs in a batch, or accept just the short code on its own, or return a QR code, etc.), but for the use case of single lookups it does seem that the existing API is fine (inspecting the Location header of the response to a request to the normal short URL). It seems that adding Access-Control-Allow-Origin: * is the simplest change to help here — but adding an Action API would be fine too I'd say (it'd be more discoverable, as it'd be listed in ApiSandbox etc.).

Change #1009853 merged by jenkins-bot:

[mediawiki/extensions/UrlShortener@master] SpecialUrlRedirector: Enable CORS via Access-Control-Allow-Origin header

https://gerrit.wikimedia.org/r/1009853

Adding a permissive CORS policy to w.wiki

Note: this will not always receive the exact target of the short URL. E.g. if the target is a redirect, it will receive the target of the redirect instead of the redirect itself.

Note: this will not always receive the exact target of the short URL. E.g. if the target is a redirect, it will receive the target of the redirect instead of the redirect itself.

I'm not sure that's correct. UrlShortener just stores the URL as given, doesn't it? It doesn't check to see if the URL is a redirect and then fetch the target.

For example, https://en.wikipedia.org/wiki/Stevedore redirects to https://en.wikipedia.org/wiki/Dockworker but the shortened URL of https://w.wiki/9YdJ redirects to the former (i.e. it's a double redirect).

For example https://w.wiki/9Yfs is the shortened URL for https://en.wikipedia.org/wiki/WP:AN. When this code is run (after CORS is enabled):

fetch('https://w.wiki/9Yfs', {
    method: 'HEAD',
    redirect: 'follow'
}).then(response => {
    return response.url;
});

So the returned URL is https://en.wikipedia.org/wiki/Wikipedia:AN. However, this is not the exact target of the short URL, which is http://en.wikipedia.org/wiki/WP:AN

I'm a bit confused as to why https://en.wikipedia.org/wiki/Wikipedia:AN is not a redirect. How does it get redirected?

But anyway, I'm not sure UrlShortener can do much here. I do think adding an API is a reasonable idea, but it'd still only return http://en.wikipedia.org/wiki/WP:AN when https://w.wiki/9Yfs (or 9Yfs or _tVNA) is requested. It seems that this example short URL should've just been https://w.wiki/bFt and avoided the double redirect.

I wouldn't expect the API to interact with actual redirects at all, and if it's decided that the CORS-webserver option is better, I wouldn't expect that to affect anything other than the initial redirect either. A user could thus still get a CORS error further down the chain, as intended by the downstream target(s).

Is there anything here actually blocking? To me it seems like the patch solves the use case outlined in the task description.

Adding a permissive CORS policy to w.wiki

Note: this will not always receive the exact target of the short URL. E.g. if the target is a redirect […]

It appeared to break on indirect redirects because the above snippet instructs fetch() to follow redirects, which is wrong.

It appears to work on single redirects presumably because you happened to run the JavaScript snippet from the browser console while viewing an article on the same domain as the destination.

[…]

fetch('https://w.wiki/9Yfs', {
    method: 'HEAD',
    redirect: 'follow'
}).then(response => {
    return response.url;
});
  1. To request the redirect, you need to set redirect: 'manual' to disable the native following of redirects. You want to download the redirect, not the Wikipedia article HTML. The latter does not permit CORS access.
  2. The redirect destination will be available as response.headers.get('Location'). The response.url field is something different. It would either return the short URL (which you already had), or the url of the downloaded article (which is denied in most cases).
  3. To allow API access across domains, we use CORS. This was already enabled for w.wiki. But, to use it you need to set mode: 'cors' in the fetch() call. Even though this was left out, it happened to work because of issue 1 and issue 2.

Redirects inside redirects are imho unrelated to this task. Reading the redirect destination should be reliable no matter whether the destination happens to be another redirect. Of course, if follow is enabled, then this becomes a problem due to security restrictions on the target domain. But as long as that is turned off, this isn't a problem.

Either way, it seems preferable to only unwrap w.wiki redirects back to their original long URL. Anything beyond that is outside the jurisdiction for which w.wiki can provide access, and would also introduce cache problems because then the destination is no longer fixed but can externally change at any time. This is true via fetch() and Access-Control-Allow-Origin, but also when building a custom API. Indirect redirects may also be temporary, dynamic, personal, or security-sensitive in nature (e.g. Special:MyLanguage, Special:UserLogin, or Special:MyContributions), and thus should not be expanded or returned from APIs. All of these will work fine as long as we don't try to follow or expand URLs.

Ultimately Short URLs are a key-value database, and all we're doing is returning the value as it was stored. Nothing more, nothing less.
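As a rough illustration of that key-value model (an in-memory sketch, not how UrlShortener actually stores its data), a lookup only ever returns the stored string. The two entries are the examples quoted earlier in this thread; the Map and the lookup function are made up for illustration.

```javascript
// Illustrative in-memory store; the real extension uses a database table.
const shortUrls = new Map([
    ['9YdJ', 'https://en.wikipedia.org/wiki/Stevedore'],
    ['9Yfs', 'http://en.wikipedia.org/wiki/WP:AN']
]);

// Return the stored value untouched: no request to the target and no
// redirect following, so the result cannot change behind our back.
function lookup(code) {
    return shortUrls.has(code) ? shortUrls.get(code) : null;
}
```

Whether Stevedore or WP:AN is itself a redirect never enters the picture, which is what keeps the answer stable and cacheable.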

// From https://en.wikipedia.org/wiki/Special:Blankpage
//
// where https://w.wiki/DeCy > https://www.mediawiki.org/wiki/MediaWiki_Introduction_2023
//
fetch('https://w.wiki/DeCy', {
    method: 'HEAD',
    mode: 'cors',
    redirect: 'manual'
}).then(resp => {
    console.log(resp.headers); // Headers{0}
    return resp.headers.get('Location'); // null
});

It turns out that the Fetch API has a special rule for CORS-safelisted response headers. This means that while CORS can provide access to JSON response bodies across domains, in terms of the response headers the list is limited by default, and Location is not on that list.
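The safelist rule can be modelled roughly as below, for ordinary (non-redirect) cross-origin responses. The set of names is the CORS-safelisted response-header list from the Fetch standard; the function name is made up, and the sketch ignores credentialed requests, where * in Access-Control-Expose-Headers is not treated as a wildcard.

```javascript
// CORS-safelisted response-header names per the Fetch standard.
const SAFELISTED_RESPONSE_HEADERS = new Set([
    'cache-control', 'content-language', 'content-length',
    'content-type', 'expires', 'last-modified', 'pragma'
]);

// Simplified model: can a script read header `name` from a cross-origin CORS
// response, given the server's Access-Control-Expose-Headers value (if any)?
function isHeaderReadable(name, exposeHeaders = '') {
    const exposed = exposeHeaders
        .split(',')
        .map(header => header.trim().toLowerCase())
        .filter(Boolean);
    const lower = name.toLowerCase();
    return SAFELISTED_RESPONSE_HEADERS.has(lower) ||
        exposed.includes('*') ||
        exposed.includes(lower);
}
```

Under this model isHeaderReadable('Location') is false unless the server opts in through Access-Control-Expose-Headers.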

I wrote a patch locally that adds Access-Control-Expose-Headers: Location or Access-Control-Expose-Headers: * on the response, but that still didn't help. It exposed the next problem instead:

The Fetch API also has a special Atomic HTTP redirect handling exemption for Access-Control-Allow-Origin, which prevents it from even being considered on redirects, even when given full CORS access and full exposure to all headers. This works for all HTTP status codes, except HTTP 30x redirects.

I've raised this in the upstream issue tracker (https://github.com/whatwg/fetch/issues/601) but whatever the outcome there, for the foreseeable future this won't work. The best we could do within the current endpoint is to provide a new request header (e.g. X-No-Redirect: true or something like that), and then when you set that, we can let w.wiki respond with HTTP 200 and a response header like X-Redirect-Target: http…. But at that point you might as well add a module to our regular Action API.
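For illustration only, the opt-out idea could be sketched as a pure function like the one below. Nothing like this is implemented: the header names are the placeholders from the suggestion above, the function name is made up, and the 301 status for the normal case is an assumption.

```javascript
// Hypothetical sketch of the "X-No-Redirect" idea; not part of UrlShortener.
// `requestHeaders` is a plain object with lower-cased header names.
function buildShortUrlResponse(requestHeaders, target) {
    const noRedirect =
        String(requestHeaders['x-no-redirect'] || '').toLowerCase() === 'true';

    if (noRedirect) {
        // A 200 response is not subject to the redirect exemption, so the
        // custom header can be exposed to cross-origin scripts.
        return {
            status: 200,
            headers: {
                'Access-Control-Allow-Origin': '*',
                'Access-Control-Expose-Headers': 'X-Redirect-Target',
                'X-Redirect-Target': target
            }
        };
    }

    // Default behaviour: a plain redirect (status code assumed here).
    return { status: 301, headers: { 'Location': target } };
}
```

Note that a custom request header such as X-No-Redirect is not CORS-safelisted, so browsers would first send a preflight OPTIONS request that the server must answer with a matching Access-Control-Allow-Headers. That is further custom machinery, which supports the point that an Action API module is the simpler route.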

All that to say, I originally thought it was beneficial to let the w.wiki service be the standard API for this, since the HTTP standards and Web Platform APIs appear to provide for this already in an efficient, scalable, cacheable, and interoperable way. In terms of batching, these would be fine to do concurrently as well. But while all of this is true and works in server-side languages (e.g. PHP, Python, or Node.js), it does not work today in the browser, which provides no way to grant access to the Location header, no matter which headers are set. So it seems the only way to make this reliable today is via an Action API.
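For comparison, a server-side sketch of the same lookup, assuming Node.js 18+ with its built-in fetch. As far as I know, that implementation hands back the actual 30x response for redirect: 'manual' instead of the opaque one browsers return, so the Location header is readable. The function name and the injectable fetchImpl parameter are made up for illustration (the latter only so the function can be exercised without network access).

```javascript
// Server-side only: in a browser the same call yields an opaque redirect
// response with no readable headers.
async function readRedirectTarget(shortUrl, fetchImpl = fetch) {
    const response = await fetchImpl(shortUrl, {
        method: 'HEAD',
        redirect: 'manual' // do not follow; we want the redirect itself
    });
    // Returns the stored long URL as-is, or null if no Location header is present.
    return response.headers.get('location');
}
```

Batching would then simply be Promise.all() over several such calls, since each lookup is an independent request.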