I think the extension should automatically add a slash to the end for such URLs. This would both resolve the issue and reduce the number of effective duplicates.
|mediawiki/extensions/UrlShortener : master||Normalize plain domains to have a trailing slash after them|
|Open||None||T108602 Equivalent URLs are not canonicalized and deduplicated|
|Resolved||Legoktm||T220718 URLs with no slashes after domain name are "invalid" but are still shortened|
I support the solution to enforce a slash on all URLs that don't have anything after the top-level domain.
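To make the proposed rule concrete, here is a minimal sketch in Python. It is not the extension's actual code (UrlShortener is written in PHP), and the function name `add_trailing_slash` is made up for illustration. It assumes that "nothing after the domain" means an empty path component, so a query string or fragment directly after the host still gets a slash inserted in front of it.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url: str) -> str:
    """Return url with "/" as its path if it has a host but an empty path.

    Hypothetical helper for illustration only; URLs that already have a
    path are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        # Only the path is touched; query and fragment are preserved.
        return urlunsplit(parts._replace(path="/"))
    return url


if __name__ == "__main__":
    for example in (
        "https://example.org",
        "https://example.org?x=1",
        "https://example.org/wiki/Foo",
    ):
        print(example, "->", add_trailing_slash(example))
```

With this rule, `https://example.org` and `https://example.org/` map to the same stored string, which is what makes the two submissions deduplicate.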
When the patch is merged and deployed, we will have a surprising number of entries in the database that are no longer accessible. This is not strictly a problem, just a bit sad.
- If both the URL with the trailing slash and the one without it have been submitted before, only the entry with the trailing slash will be used in the future.
- If the entry with the trailing slash comes first in the database, this is fine.
- But if the entry without the slash comes first, its nicer, possibly shorter ID is gone.
- If only the URL without the slash was submitted, its ID is gone.
I suggest that we:
- Update the database and add the slash to all URLs that are still unique after doing this.
- If both URLs are already in the database, but the one without the slash comes first, swap the two.
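The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table is modelled as an in-memory `id -> URL` mapping, lower IDs are taken to be the earlier (and shorter) ones, and the names `add_trailing_slash` and `plan_migration` are hypothetical. A real maintenance script would have to work against the extension's actual schema.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url: str) -> str:
    """Hypothetical helper: give a host-only URL the path "/"."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        return urlunsplit(parts._replace(path="/"))
    return url


def plan_migration(rows: dict[int, str]) -> dict[int, str]:
    """Return a new id -> URL mapping with the two suggested steps applied.

    Assumes every URL appears at most once in rows.
    """
    id_by_url = {url: id_ for id_, url in rows.items()}
    result = dict(rows)
    for id_, url in sorted(rows.items()):
        slashed = add_trailing_slash(url)
        if slashed == url:
            continue  # already has a path, nothing to do
        other = id_by_url.get(slashed)
        if other is None:
            # Step 1: the slashed form is still unique, so rewrite in place.
            result[id_] = slashed
            id_by_url[slashed] = id_
        elif id_ < other:
            # Step 2: both forms exist and the bare one has the lower ID,
            # so swap them and let the slashed URL keep the nicer ID.
            result[id_], result[other] = slashed, url
        # Otherwise the slashed entry already has the lower ID: leave as is.
    return result


if __name__ == "__main__":
    before = {
        1: "https://a.example",
        2: "https://a.example/",
        3: "https://b.example",
    }
    for id_, url in plan_migration(before).items():
        print(id_, url)
```

In the example, ID 1 ends up pointing at `https://a.example/`, ID 2 keeps the now unreachable bare form, and ID 3 is rewritten to `https://b.example/`.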
I'm not sure if this is worth it, or if it is even possible.
I suppose it's possible: we would need to write a script for it, test the script, and then run it. I don't think that's worth it, but I'm not going to stop anyone from working on it if they want to.