
Update wikilinks to point back to source project
Closed, Resolved · Public · 3 Story Points

Description

Task

Motivation
When files are moved, all local links either become red links or point to different pages. We want links to keep pointing to where they originally led.

Acceptance Criteria

  • For all links outside the category namespace, link to the source wiki (as this is where the link was supposed to point). E.g.: [[1st Security Force Assistance Brigade]] should be changed to [[:en:1st Security Force Assistance Brigade|1st Security Force Assistance Brigade]]
  • The category namespace should stay pointing to the target wiki, as we expect the categories now to be used on the new wiki
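
A minimal sketch of the two criteria above, in Python for illustration only (the extension itself is PHP, and this is not its code). The function name `prefix_links`, the regular expression, and the default `en` interwiki code are assumptions made for this example. It ignores templates, nested brackets, and File/Media links.

```python
import re

# Matches [[target]] and [[target|label]]. Nested brackets and templates
# are deliberately not handled in this sketch.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def prefix_links(wikitext: str, interwiki: str = "en") -> str:
    """Point non-category wikilinks back at the source wiki (hypothetical helper)."""

    def rewrite(match):
        target = match.group(1).strip().lstrip(":")
        label = match.group(2)
        lowered = target.lower()
        # Category links keep pointing at the target wiki.
        if lowered.startswith("category:"):
            return match.group(0)
        # Links that already carry the interwiki prefix are left alone.
        if lowered.startswith(interwiki.lower() + ":"):
            return match.group(0)
        # Pipe the link so the visible text hides the added prefix.
        if label is None:
            label = target
        return "[[:" + interwiki + ":" + target + "|" + label + "]]"

    return WIKILINK.sub(rewrite, wikitext)


plain = prefix_links("[[1st Security Force Assistance Brigade]]")
piped = prefix_links("[[United States military beret flash|unit flash]]")
category = prefix_links("[[Category:Military flags]]")
```

Here `plain` and `piped` gain the `:en:` prefix while keeping their visible text, and `category` is returned unchanged.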

Original Report

I've been playing around with the FileImporter tool a bit, and noticed that it causes lots of redlinks because it doesn't update wikilinks to point to the originating wiki. This can be fixed manually, but it would make sense for the tool to do this automatically. For example, I imported the file :File:1st Security Force Assistance Brigade Flash.svg, and had to manually change [[1st Security Force Assistance Brigade]] to [[:en:1st Security Force Assistance Brigade|1st Security Force Assistance Brigade]], [[United States military beret flash|unit flash]] to [[:en:United States military beret flash|unit flash]], and [[User:McChizzle]] to {{User at project|McChizzle|w|en}}.

It seems like it should be a fairly easy task to add the originating wiki's interwiki code and to pipe wikilinks that now need piping to hide the code. Detecting usernames and using {{User at project}} would be useful too.
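
As a rough illustration of the username idea, here is a Python sketch that turns unpiped user links into the {{User at project}} template from the example above. The function name, the regular expression, and the defaults `w` and `en` are assumptions for this sketch. Piped user links and subpages are left untouched.

```python
import re

# Only plain [[User:Name]] links are matched; piped links, subpages and
# section anchors are skipped by excluding "|", "/" and "#" from the name.
USER_LINK = re.compile(r"\[\[\s*User:([^\[\]|/#]+?)\s*\]\]", re.IGNORECASE)


def user_links_to_template(wikitext: str, project: str = "w", lang: str = "en") -> str:
    """Replace plain user links with {{User at project}} (hypothetical helper)."""

    def rewrite(match):
        name = match.group(1)
        return "{{User at project|" + name + "|" + project + "|" + lang + "}}"

    return USER_LINK.sub(rewrite, wikitext)


converted = user_links_to_template("Created by [[User:McChizzle]].")
untouched = user_links_to_template("[[User:McChizzle|the author]]")
```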

Bonus points if the code is smart enough to only do it to redlinks.

Related Objects

Mentioned In
T225083: Issues when cleaning up wikilinks that point to the target wiki
rEFLI1cf4d3eb456e: Introduce WikiLinkParserFactory to process wikitext links
rEFLI333e3ae0f2f7: Introduce WikiLinkParserFactory to process wikitext links
rEFLIab0c4895fa16: Introduce WikiLinkParserFactory to process wikitext links
rEFLIae722b681336: Introduce WikiLinkParserFactory to process wikitext links
T224122: 20% maintenance tasks in QWERTY sprint 2019-05-15
rEFLI15101b447061: Introduce WikiLinkParserFactory to process wikitext links
rEFLI62ff552ff87b: Move license checks and wikitext cleanup to ImportPlanValidator
rEFLI0aa7a91d5e31: Make Media: namespace behave identical to File: namespace
T223290: Let File/Media links point back to the source wiki when they don't exist on Commons
rEFLI37277a130563: Introduce WikiLinkPrefixer
rEFLIaed15bca93e5: Introduce WikiLinkPrefixer
rEFLI09782366144a: Introduce WikiLinkPrefixer
rEFLIa5fe417e6cbc: Introduce WikiLinkPrefixer
rEFLI8de9bbc0e2c0: Introduce WikiLinkPrefixer
rEFLIc2d47fed7bce: Introduce WikiLinkCleaner to be used instead of callables
rEFLIe459673decab: Introduce WikiLinkPrefixer
rEFLI0cce3fd61082: Introduce WikiLinkParser
rEFLI423df1dc0018: Introduce WikiLinkCleaner to be used instead of callables
rEFLIadcf901c23ee: Introduce WikiLinkCleaner interface in favor of callables
rEFLIa20ead3a0d48: Introduce WikiLinkParser
rEFLI065b7327382a: Fix wikitext template parser failing on unballanced brackets
T213821: On the file page change localized namespace to target wiki namespace names
Mentioned Here
T225083: Issues when cleaning up wikilinks that point to the target wiki
T213821: On the file page change localized namespace to target wiki namespace names

Event Timeline

Ahecht created this task. Jul 2 2018, 3:37 AM
Lea_WMDE triaged this task as Normal priority. Jul 23 2018, 3:33 AM
Lea_WMDE updated the task description.

Some thoughts from today's story time:

For links to local wiki pages, add the corresponding interwiki description. I.e. [[1st Security Force Assistance Brigade]] should be changed to [[:en:1st Security Force Assistance Brigade|1st Security Force Assistance Brigade]]

This could probably be done with RegEx, to keep it as simple as possible. Fancy complicated cases like templates messing with links would be excluded then.

Links to user pages should link to the Commons accounts if they exist; in that case the link can simply stay a local link.

Parsing the user links could be more complicated than a simple RegEx; we would probably have to use the actual parser for this. That could lead to a rabbit hole of related issues. We should probably discuss this in a larger group or create investigation tickets to look into it in detail. @Lea_WMDE

Replying to this from the original report: "Bonus points if the code is smart enough to only do it to redlinks."

I disagree that it should be the default behavior. Suppose Brian uploads a photo to English Wikipedia, including a link to his English Wikipedia user page as part of the attribution. Brian also has a user page on Commons, but the content is different. The attribution requirement of CC licenses (as well as common courtesy) would have us respect the initial preference expressed.

While there may be cases in which it's preferable to change the link to point to the Commons user page, I believe those cases will be the exception, not the rule. Those decisions should be made by a human's conscious decision, not by algorithm default.

This doesn't need to be complicated. Just change all local links to point to the source wiki instead (as Pete suggests above). Anything else is unnecessary overkill.

Restricted Application added a subscriber: Liuxinyu970226. Mar 15 2019, 12:14 AM
Lea_WMDE updated the task description. Mar 26 2019, 2:47 PM
Lea_WMDE set the point value for this task to 8.

Change 507971 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkParser

https://gerrit.wikimedia.org/r/507971

Change 507973 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkCleaner interface in favor of callables

https://gerrit.wikimedia.org/r/507973

FYI: I started a framework for this as well as T213821, utilizing the visitor pattern. The idea is to have two classes that implement WikiLinkCleaner, one for each task.

The two should be mostly independent from each other. The only restriction I see is the requirement to skip the category namespace. This is probably easier when the namespaces are all de-localized first via T213821.
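
To illustrate the idea of independent cleaners that a parser calls for every link, here is a small Python sketch. The names WikiLinkCleaner, WikiLinkPrefixer and WikiLinkParser come from the patch titles on this task. Everything else (the `process`, `register` and `parse` methods, the `NamespaceUnlocalizer` class, the German namespace map) is assumed for illustration and is not the extension's actual PHP API.

```python
import re
from typing import Dict, List, Protocol


class WikiLinkCleaner(Protocol):
    """Interface assumed for this sketch: rewrite one link target."""

    def process(self, target: str) -> str:
        ...


class NamespaceUnlocalizer:
    """Hypothetical cleaner for T213821: map localized namespaces to canonical ones."""

    def __init__(self, localized: Dict[str, str]) -> None:
        self._localized = {k.lower(): v for k, v in localized.items()}

    def process(self, target: str) -> str:
        namespace, colon, title = target.partition(":")
        canonical = self._localized.get(namespace.strip().lower())
        return f"{canonical}:{title}" if colon and canonical else target


class WikiLinkPrefixer:
    """Cleaner for this task: add the source wiki prefix, skipping categories."""

    def __init__(self, interwiki: str) -> None:
        self._interwiki = interwiki

    def process(self, target: str) -> str:
        if target.lower().startswith("category:"):
            return target
        return f":{self._interwiki}:{target}"


class WikiLinkParser:
    """Finds simple wikilinks and hands each target to the registered cleaners."""

    _LINK = re.compile(r"\[\[([^\[\]|]+)(\|[^\[\]]*)?\]\]")

    def __init__(self) -> None:
        self._cleaners: List[WikiLinkCleaner] = []

    def register(self, cleaner: WikiLinkCleaner) -> None:
        self._cleaners.append(cleaner)

    def parse(self, wikitext: str) -> str:
        def rewrite(match: "re.Match[str]") -> str:
            target = match.group(1).strip()
            for cleaner in self._cleaners:
                target = cleaner.process(target)
            return f"[[{target}{match.group(2) or ''}]]"

        return self._LINK.sub(rewrite, wikitext)


parser = WikiLinkParser()
parser.register(NamespaceUnlocalizer({"Kategorie": "Category"}))
parser.register(WikiLinkPrefixer("de"))
result = parser.parse("[[Kategorie:Flaggen]] [[Berlin|die Stadt]]")
```

Registering the de-localizing cleaner first means the prefixer only has to check for the canonical `Category:` name, which is the ordering concern mentioned above. Without that first step, `[[Kategorie:Flaggen]]` would be treated like any other link and receive the `:de:` prefix.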

thiemowmde moved this task from Sprint Backlog to Doing on the WMDE-QWERTY-Sprint-2019-04-30 board.

Change 508332 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkPrefixer

https://gerrit.wikimedia.org/r/508332

Change 507971 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkParser

https://gerrit.wikimedia.org/r/507971

Change 507973 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkCleaner to be used instead of callables

https://gerrit.wikimedia.org/r/507973

Change 508332 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkPrefixer

https://gerrit.wikimedia.org/r/508332

Change 510436 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Make Media: namespace behave identical to File: namespace

https://gerrit.wikimedia.org/r/510436

thiemowmde changed the point value for this task from 8 to 3. May 15 2019, 2:08 PM

Change 510436 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Make Media: namespace behave identical to File: namespace

https://gerrit.wikimedia.org/r/510436

Change 510589 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Move license checks and wikitext cleanup to ImportPlanValidator

https://gerrit.wikimedia.org/r/510589

Change 510589 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Move license checks and wikitext cleanup to ImportPlanValidator

https://gerrit.wikimedia.org/r/510589

Change 509345 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkParserFactory to process wikitext links

https://gerrit.wikimedia.org/r/509345

Change 509345 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Introduce WikiLinkParserFactory to process wikitext links

https://gerrit.wikimedia.org/r/509345

Lea_WMDE closed this task as Resolved. Mon, May 27, 1:17 PM
Lea_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2019-05-15 board.
Pikne added a subscriber: Pikne. Edited Wed, Jun 5, 10:18 AM

Handling of the commons: prefix is not right; here it becomes :et:Project:.

Edit: I also checked its equivalent prefix c: (e.g. used here), which isn't turned into a broken link prefix.

That's a fascinating edge-case, thanks for reporting! I created T225083 to take care of it.