
Update output URL in case of URL redirection or canonical URLs
Open, Needs Triage · Public · Feature


Feature summary (what you would like to be able to do and where):

We may consider updating the translation output's url value in the following cases:

  • The original URL is redirected to another URL (note we may not always want this, see T210871).
  • The resource provides a canonical URL that differs from the one requested.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Right now, we create a target Webpage object and set its url property to the original target URL. We do not update this property in cases of URL redirection or if a canonical URL is available. Then, the output citation's url is set to this target Webpage's url.
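As a rough illustration of the proposed behaviour, the following Python sketch picks an output URL from the requested URL, the URL reached after redirects, and the fetched HTML. It is not how Web2Cit or Citoid is implemented; the names `resolve_output_url`, `follow_redirects`, and `_CanonicalLinkParser` are hypothetical, and the priority order (canonical URL first, then redirected URL, then original URL) is an assumption.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def resolve_output_url(requested_url: str, final_url: str, html: str,
                       follow_redirects: bool = True) -> str:
    """Hypothetical helper: choose the URL to report in the citation.

    Assumed priority: canonical URL, then the redirected URL (only if
    follow_redirects is True, since redirects are not always wanted),
    then the originally requested URL.
    """
    parser = _CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical:
        # A relative canonical href is resolved against the fetched document's URL.
        return urljoin(final_url, parser.canonical)
    return final_url if follow_redirects else requested_url
```

The `follow_redirects` switch reflects the caveat above that redirects should not always be followed; how that decision would actually be made is left open here.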

Benefits (why should this be implemented?):

Returning the same (redirected or canonical) URL for a resource that may be accessed via different URLs is crucial for things such as understanding how many times a web resource has been cited across Wikimedia projects.

Event Timeline

Alternatively, we may also support defining translation procedures for the URL field.

Such a feature could protect the originally intended URL in cases where web servers redirect to intermediate pages (robot detection, cookie agreement, etc.). See for example T310001 and T290834.

However, letting users define procedures for the URL field may become a vulnerability. If an attacker maliciously changes the procedures for this field, the change may be difficult to detect for users inserting references, and the originally referenced URL will be lost.
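One possible mitigation for this risk, sketched below in Python, is to accept a URL produced by a user-defined procedure only if it passes a simple check against the requested URL. The same-host rule and the names `is_safe_url_override` and `choose_url` are assumptions made for illustration; they are not part of this task or of any existing Web2Cit API, and such a rule would also reject legitimate canonical URLs hosted on a different domain.

```python
from urllib.parse import urlsplit


def is_safe_url_override(requested_url: str, candidate_url: str) -> bool:
    """Hypothetical check: allow the override only for http(s) URLs
    that stay on the same host as the requested URL."""
    requested = urlsplit(requested_url)
    candidate = urlsplit(candidate_url)
    if candidate.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases hostname, so the comparison is case-insensitive.
    if not candidate.hostname:
        return False
    return candidate.hostname == requested.hostname


def choose_url(requested_url: str, candidate_url: str) -> str:
    """Fall back to the requested URL when the candidate fails the check."""
    if is_safe_url_override(requested_url, candidate_url):
        return candidate_url
    return requested_url
```

With this fallback, a rejected override leaves the originally requested URL in the citation rather than silently replacing it.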

There is a problem with this domain: when we use it in automatic citations, only the main domain is retrieved, and nothing after it.
Here's an example:
I'm using this URL:
When I insert it into the automatic citation, both Citoid and Web2Cit retrieve this URL: (see image attached).

Screen Shot 2023-05-20 at 15.41.56.png (756×846 px, 111 KB)