
Issues when cleaning up wikilinks that point to the target wiki
Open, Needs Triage · Public · 3 Story Points

Description

In T198584#5236282 @Pikne reported a fascinating edge case where the link [[commons:User:x]] became [[:et:Project:User:x]].

  • On the source wiki, the commons: prefix is an interwiki prefix that points to Commons.
  • On Commons, there is no commons: interwiki prefix, because we are already on Commons. But there is a Commons: namespace. That's the local project namespace, which is called Wikipedia: and so on in other projects.
  • Therefore, the prefix commons: is detected as a namespace and de-localized to Project:.
  • Additionally, the :et: interwiki prefix is added to point back to the source wiki.
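
The chain of events above can be reproduced with a small sketch. This is not the FileImporter code: the function name `naive_cleanup`, the lookup table, and the regular expression are all made up for illustration, and the sketch assumes that every first prefix is tried against the target wiki's namespace names.

```python
import re

# Hypothetical lookup table: namespace names as known on the target wiki
# (Commons), mapped to canonical names. Not taken from FileImporter.
TARGET_NAMESPACES = {"commons": "Project", "user": "User"}

# Matches [[prefix:rest]]; the prefix is everything before the first colon.
LINK = re.compile(r"\[\[([^:\[\]|]+):([^\[\]]+)\]\]")


def naive_cleanup(wikitext: str, source_prefix: str) -> str:
    """Treat the first prefix of every link as a possible namespace."""

    def fix(match: re.Match) -> str:
        prefix, rest = match.group(1), match.group(2)
        # "commons" is an interwiki prefix on the source wiki, but it also
        # matches the project namespace on Commons, so it gets rewritten.
        prefix = TARGET_NAMESPACES.get(prefix.strip().lower(), prefix)
        # Point the link back to the source wiki.
        return f"[[:{source_prefix}:{prefix}:{rest}]]"

    return LINK.sub(fix, wikitext)


print(naive_cleanup("[[commons:User:x]]", "et"))
```

With the table above, the link from the report comes out as [[:et:Project:User:x]], the same result as described in this task.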

Issues with that, and possible ways forward:

  • A trivial workaround might be to make the project namespace a special case, and always exclude it from the namespace de-localization.
  • Another (fragile!) workaround might be to assume the target wiki's project namespace is identical to the interwiki prefix that represents the target wiki on the source wiki. In the example, the commons: interwiki prefix is identical to the namespace Commons: on Commons. If this is the case, we just remove it.
  • Another fix might be to check if a link contains more than one colon, and then skip namespace de-localization on the first part. De-localizing the first part would certainly be wrong, because in a link like [[A:B:C]] only the B can be a namespace, but the A can't. Note that this fix does not cover links to the main namespace. These would still break.
  • We could teach the wikitext cleaner about interwiki prefixes as they are on the source wiki. Fetching this information is technically possible. An open question is what to do with this information. Should [[commons:x]] become [[:et:commons:x]] or [[x]]? And is the use-case of "links pointing to the main namespace of the target wiki" even relevant?
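
A minimal sketch of the "more than one colon" idea from the list above, including the main-namespace case it does not cover. The function `unlocalize_title` and the lookup table are hypothetical and not the FileImporter implementation. For simplicity, the sketch leaves the whole title untouched when there is more than one colon, which also means the B in [[A:B:C]] is not de-localized here.

```python
# Hypothetical lookup table of localized namespace names and their
# canonical equivalents. Not taken from FileImporter.
LOCAL_TO_CANONICAL = {"commons": "Project", "benutzer": "User"}


def unlocalize_title(title: str) -> str:
    """De-localize the namespace of a link target, unless it has 2+ colons."""
    parts = title.split(":")
    if len(parts) != 2:
        # No colon: nothing to do. More than one colon: the first part is
        # assumed to be an interwiki prefix and is deliberately left alone.
        return title
    canonical = LOCAL_TO_CANONICAL.get(parts[0].strip().lower())
    return f"{canonical}:{parts[1]}" if canonical else title


for title in ("commons:User:x", "Benutzer:x", "commons:x"):
    print(title, "->", unlocalize_title(title))
```

The last call shows the remaining gap: commons:x has only one colon, so under these assumptions the interwiki prefix is still mistaken for the project namespace.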

Demo instructions

  • Import a test file with specially prepared links.
  • In the import preview, the enwiki link should point to the "Wikipedia" namespace, and the commons link to the "Commons" namespace.
  • The user link should still read "Benutzer".
  • Now, import from beta dewiki.
  • This file should still read "Commons:"; however, the user link will be translated to the "User" namespace.

Event Timeline

Change 514465 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Fix NamespaceUnlocalizer messing with interwiki prefixes

https://gerrit.wikimedia.org/r/514465

Change 514465 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Fix NamespaceUnlocalizer messing with interwiki prefixes

https://gerrit.wikimedia.org/r/514465

Tobi_WMDE_SW added a subscriber: Tobi_WMDE_SW.

We're checking whether the existing, merged patches fixed the bug; otherwise we will fix it in the WMDE-QWERTY-Sprint-2019-09-04.

Tobi_WMDE_SW set the point value for this task to 3.

Agreed to invest 3 SP into finding out whether the existing fix is sufficient.

thiemowmde updated the task description. (Wed, Sep 11, 2:29 PM)
thiemowmde moved this task from Sprint Backlog to Doing on the WMDE-QWERTY-Sprint-2019-09-10 board.

Suggestion #3 is already implemented via https://gerrit.wikimedia.org/r/514465. I guess this solves 99% of the issue, except for links that point to the main namespace. The main namespace on Commons is used for galleries. Links to galleries are rare.

I realized it never makes sense to normalize the project namespace. Let's say a link points to [[:en:Wikipedia:MOS]]. The purpose of such a link is to point to exactly this page in exactly this namespace, not to another project namespace somewhere else.

The project namespace is a special case. It's not translated, but configured to mirror the project name. We should not "undo" this configuration like we undo translations.
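
To illustrate the idea of treating the project namespace as a special case, here is a rough sketch. The namespace IDs 2 (User) and 4 (Project) are MediaWiki's standard values. Everything else, including the function name `unlocalize_namespace` and the lookup tables, is assumed for the example and not taken from the actual patch.

```python
NS_USER = 2
NS_PROJECT = 4

# Hypothetical lookup: namespace names (lowercased) mapped to namespace IDs.
NAME_TO_ID = {"benutzer": NS_USER, "user": NS_USER, "commons": NS_PROJECT}
CANONICAL_NAMES = {NS_USER: "User", NS_PROJECT: "Project"}


def unlocalize_namespace(prefix: str) -> str:
    """Return the canonical namespace name, but never touch the project namespace."""
    ns_id = NAME_TO_ID.get(prefix.strip().lower())
    if ns_id is None or ns_id == NS_PROJECT:
        # Unknown prefix, or the project namespace: keep it exactly as written.
        return prefix
    return CANONICAL_NAMES[ns_id]


for prefix in ("Commons", "Benutzer"):
    print(prefix, "->", unlocalize_namespace(prefix))
```

Under these assumptions "Commons" stays "Commons" while "Benutzer" becomes "User".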

Change 536017 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Exclude the Project namespace from unlocalization

https://gerrit.wikimedia.org/r/536017

thiemowmde removed thiemowmde as the assignee of this task. (Thu, Sep 12, 8:39 AM)
thiemowmde moved this task from Doing to Review on the WMDE-QWERTY-Sprint-2019-09-10 board.

Change 536017 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Exclude the Project namespace from unlocalization

https://gerrit.wikimedia.org/r/536017

awight assigned this task to thiemowmde. (Fri, Sep 13, 6:57 AM)
awight moved this task from Review to Demo on the WMDE-QWERTY-Sprint-2019-09-10 board.
awight updated the task description. (Fri, Sep 13, 1:03 PM)