
Already prefixed wikilinks may get duplicate prefixes like [[:de:de:…]]
Closed, Declined, Public

Description

When the source wiki is the German Wikipedia, and the file description contains a self-link like

[[:de:S-Bahn Mitteldeutschland|S-Bahn Mitteldeutschland]]

… the replacement becomes

[[:w:de:de:S-Bahn Mitteldeutschland|S-Bahn Mitteldeutschland]]

These links work just fine, but clutter the wikitext a bit. Possible solutions and workarounds I can think of at the moment:

  • Before adding any prefix to a link, check if a prefix exists (e.g. de:). If it represents the source wiki itself, remove it.
  • After adding all prefixes, "implode" duplicates, e.g. de:de: becomes de:. Note that this is technically a hack, as it assumes that every prefix has the same meaning on (possibly) different wikis. However:
    • This replicates what a user would do.
    • To my knowledge it should work 100% of the time on the Wikimedia cluster.
  • …?
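
The two options above can be sketched roughly as follows. This is a minimal illustration in Python, not the FileImporter code (which is PHP): the function names, the simplified link regex, and the assumption that the last part of the added prefix (here de) identifies the source wiki are all hypothetical.

```python
import re

# Simplified wikilink pattern: [[target]] or [[target|label]].
# Assumption: real wikitext needs more care (files, categories, nesting).
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(\|[^\[\]]*)?\]\]")


def add_prefix(target, prefix_parts, strip_self=True):
    """Prefix a link target with e.g. ['w', 'de'] to get ':w:de:Title'.

    Option 1: if the target already starts with a prefix equal to the
    last prefix part (assumed to name the source wiki), drop it first.
    """
    rest = target.lstrip(":")
    if strip_self:
        head, sep, tail = rest.partition(":")
        if sep and head.strip().lower() == prefix_parts[-1].lower():
            rest = tail
    return ":" + ":".join(prefix_parts) + ":" + rest


def implode_duplicates(target):
    """Option 2: collapse directly repeated prefixes (de:de: -> de:).

    This is the hack described above: it cannot tell a repeated prefix
    from a title that happens to contain the same word twice.
    """
    return re.sub(r"(?<=:)([a-z-]+):(?:\1:)+", r"\1:", target, flags=re.IGNORECASE)


def prefix_links(wikitext, prefix_parts, strip_self=True):
    """Apply add_prefix to every wikilink target in the text."""
    def repl(match):
        target = add_prefix(match.group(1), prefix_parts, strip_self)
        return "[[" + target + (match.group(2) or "") + "]]"
    return WIKILINK.sub(repl, wikitext)


link = "[[:de:S-Bahn Mitteldeutschland|S-Bahn Mitteldeutschland]]"
naive = prefix_links(link, ["w", "de"], strip_self=False)
clean = prefix_links(link, ["w", "de"])
```

Here naive reproduces the duplicated :w:de:de: form from the description, while clean and implode_duplicates(naive) both yield the single :w:de: prefix.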

Example diff: https://commons.wikimedia.org/wiki/Special:Diff/364855386

Event Timeline

As this is mainly cosmetic and everything works as expected (i.e. the link resolves correctly), we do not consider this important for the small-default milestone.

Change 538638 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] [WIP] Avoid duplicating link prefix pointing to source wiki

https://gerrit.wikimedia.org/r/538638

Change 539884 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Fix WikiLinkPrefixer duplicating multi-part prefixes

https://gerrit.wikimedia.org/r/539884

Change 539884 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Fix WikiLinkPrefixer duplicating multi-part prefixes

https://gerrit.wikimedia.org/r/539884

Ok, this is not worth it, esp. since it's really only a cosmetic issue. (To be fair it also wastes a tiny bit of resources when the browser needs to follow a chain of multiple redirects.)

Our code would need to have access to a TitleParser in the context of the source wiki to be able to understand that :de: is a self-link. And that would probably need to be done recursively. What essentially needs to happen is that we first recursively resolve the interwiki link (with possibly multiple hops!) to a clean URL (ideally without any interwiki prefixes left) and then do the reverse and turn the URL back into the shortest possible interwiki link – but this time in the context of the target wiki.
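
To illustrate why this gets complicated, here is a rough Python sketch of the resolve-then-shorten round trip described above. Everything in it is an assumption made for illustration: the INTERWIKI table is a made-up toy map (not the real Wikimedia interwiki data), the wiki identifiers and function names are hypothetical, and a real implementation would need a proper title parser per wiki rather than string splitting.

```python
from collections import deque

# Toy interwiki tables (assumption): for each wiki, prefix -> wiki it points to.
INTERWIKI = {
    "commons": {"w": "enwiki"},
    "enwiki": {"de": "dewiki", "en": "enwiki"},
    "dewiki": {"de": "dewiki", "en": "enwiki"},
}


def resolve(wiki, target, max_hops=10):
    """Follow interwiki prefixes hop by hop, starting in the context of `wiki`.

    Returns the wiki the link finally points to and the remaining title.
    """
    title = target.lstrip(":")
    for _ in range(max_hops):
        head, sep, rest = title.partition(":")
        next_wiki = INTERWIKI.get(wiki, {}).get(head.strip().lower())
        if not sep or next_wiki is None:
            break  # no further interwiki prefix; the rest is the title
        wiki, title = next_wiki, rest.lstrip(":")
    return wiki, title


def shortest_prefixes(from_wiki, dest_wiki):
    """Breadth-first search for the shortest prefix chain between two wikis."""
    if from_wiki == dest_wiki:
        return []
    queue = deque([(from_wiki, [])])
    seen = {from_wiki}
    while queue:
        wiki, path = queue.popleft()
        for prefix, next_wiki in INTERWIKI.get(wiki, {}).items():
            if next_wiki == dest_wiki:
                return path + [prefix]
            if next_wiki not in seen:
                seen.add(next_wiki)
                queue.append((next_wiki, path + [prefix]))
    return None  # destination not reachable via interwiki prefixes


def relink(source_wiki, target_wiki, target):
    """Rewrite a link written on source_wiki so it works on target_wiki."""
    dest_wiki, title = resolve(source_wiki, target)
    prefixes = shortest_prefixes(target_wiki, dest_wiki)
    if prefixes is None:
        return None
    return ":" + ":".join(prefixes + [title]) if prefixes else title


result = relink("dewiki", "commons", ":de:S-Bahn Mitteldeutschland")
```

With this toy table, result is :w:de:S-Bahn Mitteldeutschland, without the duplicated de: prefix. The sketch only works because both wikis' prefix tables are available in one place, which is exactly the access the importer lacks for the source wiki.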

This is one of the many things that are better done by bots that clean up the wikitext after the file has been transferred.

Change #538638 merged by jenkins-bot:

[mediawiki/extensions/FileImporter@master] Add test cases with conflicting interwiki/namespace prefixes

https://gerrit.wikimedia.org/r/538638