URLs with no slashes after domain name are "invalid" but are still shortened
Closed, Resolved, Public

Description

Both https://en.wikipedia.org and https://en.wikipedia.org/ now have their own short URLs. The former generates an error message, but the server shortened it anyway.

I think the extension should automatically add a slash to the end for such URLs. This would both resolve the issue and reduce the number of effective duplicates.
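As a rough illustration of the proposed normalization (not the extension's actual code, which is written in PHP), here is a minimal Python sketch. The function name `normalize_plain_domain` is hypothetical, and the sketch assumes that "no slash after the domain name" means the URL has an empty path component.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_plain_domain(url: str) -> str:
    """Return the URL with "/" as its path if the path is empty.

    Hypothetical helper: URLs that already have a path are returned
    with their path unchanged.
    """
    parts = urlsplit(url)
    # Only touch URLs that have a host but nothing after it in the path.
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(normalize_plain_domain("https://en.wikipedia.org"))
print(normalize_plain_domain("https://en.wikipedia.org/"))
```

With this in place, both spellings from the description map to the same string before lookup, so only one short URL would be created for them.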

Event Timeline

I just ran into the same problem with https://de.wikipedia.org (= https://w.wiki/49) vs. https://de.wikipedia.org/ (= https://w.wiki/Ga).

I support the solution of enforcing a trailing slash on all URLs that have nothing after the domain name.

I also ran into a related issue in the process. For some reason I was able to submit https://de.wikipedia.org once, but every later attempt to repeat that is now blocked with "Not a valid URL". I guess I was too fast and submitted the form before the JavaScript had fully loaded. And indeed, I can reproduce the issue when I disable JavaScript. So the next issue is: the frontend attempts to block URLs that the backend does not block.

Change 512615 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/extensions/UrlShortener@master] Normalize plain domains to have a trailing slash after them

https://gerrit.wikimedia.org/r/512615

When the patch is merged and deployed, we will have a surprising number of entries in the database that are no longer accessible. This is not strictly a problem, just a bit sad.

  • In case both URLs, with and without the trailing slash, have been submitted before, only the entry with the trailing slash will be used in the future.
    • If the entry with the trailing slash comes first in the database, this is fine.
    • But if the entry without the slash comes first, its nicer, possibly shorter ID is gone.
  • In case only the URL without the slash was submitted, its ID is gone.

I suggest that we:

  • Update the database and add the slash to all URLs that are still unique after doing this.
  • If both URLs are already in the database, but the one without the slash comes first, swap the two.

I'm not sure if this is worth it, or if it is even possible.
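To make the two suggested steps concrete, here is a minimal in-memory sketch in Python. It is not a real maintenance script: the table is modelled as a hypothetical dict from integer ID to URL, "comes first" is assumed to mean "has the lower ID", and the names `add_trailing_slash` and `migrate` are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url: str) -> str:
    """Give a URL with a host but an empty path the path "/"."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def migrate(entries: dict[int, str]) -> dict[int, str]:
    """Return a migrated copy of a hypothetical {id: url} table.

    Assumes every URL appears at most once in the input.
    """
    result = dict(entries)
    id_by_url = {url: short_id for short_id, url in entries.items()}
    for short_id, url in sorted(entries.items()):
        normalized = add_trailing_slash(url)
        if normalized == url:
            continue  # already has a path, nothing to do
        other_id = id_by_url.get(normalized)
        if other_id is None:
            # Step 1: the slashed form is not in the table, so rewrite in place.
            result[short_id] = normalized
            id_by_url[normalized] = short_id
        elif short_id < other_id:
            # Step 2: both forms exist and the slashless one comes first,
            # so swap them to give the slashed URL the earlier ID.
            result[short_id], result[other_id] = normalized, url
    return result


table = {
    1: "https://de.wikipedia.org",
    2: "https://de.wikipedia.org/",
    3: "https://example.org",
}
print(migrate(table))
```

In a real deployment the same logic would have to run against the extension's database table inside a tested maintenance script; this sketch only shows the decision rule for each row.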

I suppose it's possible: we would need to write a script for it, test the script, and then run it. I don't think that's worth it, but I'm not going to stop anyone from working on it if they want to.

Change 512615 merged by jenkins-bot:
[mediawiki/extensions/UrlShortener@master] Normalize plain domains to have a trailing slash after them

https://gerrit.wikimedia.org/r/512615