
Correctly link pages from non-English non-Wikipedia pages
Closed, Resolved | Public | 2 Story Points

Description

Motivation
As of June 5, 2019, interwiki linking does not yet work for non-English wikis of non-Wikipedia projects (e.g. de.wikivoyage, it.wikiversity). Instead, the link is interpreted as a link to the Wikipedia of that language. This affects the file page itself, but not the history.
As an example, de.wikivoyage.org has 669 files, of which 197 contain links to other pages of the wiki.

Currently, not all sister projects are set up to use FileImporter (see the existing config files here).

Acceptance Criteria

  • Links from non-English wikis of non-Wikipedia projects are also correctly prefixed.
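
To illustrate what "correctly prefixed" could mean here, the following is a minimal sketch that derives an interwiki prefix from a source host name. The function name `interwiki_prefix` and the hard-coded `PROJECT_PREFIXES` table are hypothetical and only stand in for data that would more robustly come from MediaWiki's own interwiki map. The sketch also assumes a `<language>.<project>.org` host layout, which does not hold for every wiki.

```python
# Hypothetical lookup table: common Wikimedia interwiki shortcuts per project.
# A real implementation would likely read these from the wiki's interwiki map.
PROJECT_PREFIXES = {
    "wikipedia": "w",
    "wiktionary": "wikt",
    "wikibooks": "b",
    "wikinews": "n",
    "wikiquote": "q",
    "wikisource": "s",
    "wikiversity": "v",
    "wikivoyage": "voy",
}


def interwiki_prefix(host: str) -> str:
    """Return a '<project>:<language>' prefix for hosts like 'de.wikivoyage.org'.

    Assumes the '<language>.<project>.org' layout; anything else is rejected.
    """
    parts = host.lower().split(".")
    if len(parts) != 3 or parts[2] != "org":
        raise ValueError(f"unsupported host layout: {host!r}")
    language, project = parts[0], parts[1]
    if project not in PROJECT_PREFIXES:
        raise ValueError(f"unknown project: {project!r}")
    return f"{PROJECT_PREFIXES[project]}:{language}"


if __name__ == "__main__":
    print(interwiki_prefix("de.wikivoyage.org"))
    print(interwiki_prefix("it.wikiversity.org"))
```

With only the language part (`de:`), the link would resolve to the German Wikipedia, which is the behaviour described in the motivation.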

Dev notes
We could probably get the interwiki links from MediaWiki; see also previous work and investigations:

After this is solved, we could also get rid of the artificial config that whitelisted how to resolve specific paths.

Event Timeline

Lea_WMDE updated the task description. Jun 18 2019, 1:29 PM
Lea_WMDE set the point value for this task to 8.
awight added a subscriber: awight. Jun 25 2019, 8:55 AM

One thing to try: parse the file info into HTML using restbase on the source wiki, then transform back into wikitext using commonswiki restbase.

awight added a comment. Edited Jul 1 2019, 8:59 AM

Before I burn too much more time looking for this in the wild, does anyone have examples of image file info using multi-prefix interwiki links?

Take a look here. I think one example on de.wikivoyage was an image of Berlin that linked to the Wikivoyage article on Berlin.

Following up on my comment above, I was mistaken about the task. Now I understand that we need to prefix *any* links, including plain (non-interwiki) wiki links to pages on the same site.
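
As a rough sketch of prefixing plain wiki links, the snippet below rewrites every `[[Target]]` or `[[Target|label]]` in a piece of wikitext so that it carries a given interwiki prefix. The name `prefix_links` and the regular expression are assumptions for illustration and not the FileImporter implementation. The sketch deliberately ignores namespaces such as File: or Category: and does not detect links that already have an interwiki prefix.

```python
import re

# Matches [[Target]] and [[Target|label]]; an optional leading colon is dropped.
# Links containing nested brackets are intentionally not matched.
WIKI_LINK = re.compile(r"\[\[:?([^\[\]|]+)(\|[^\[\]]*)?\]\]")


def prefix_links(wikitext: str, prefix: str) -> str:
    """Prefix every simple wiki link in `wikitext` with `prefix` (e.g. 'voy:de')."""

    def add_prefix(match: re.Match) -> str:
        target = match.group(1).strip()
        label = match.group(2) or ""  # includes the leading '|' when present
        # The leading colon forces an inline link rather than a sidebar link.
        return f"[[:{prefix}:{target}{label}]]"

    return WIKI_LINK.sub(add_prefix, wikitext)


if __name__ == "__main__":
    print(prefix_links("See [[Berlin]] and [[Berlin/Mitte|Mitte]].", "voy:de"))
```

A link without a label would then display its full prefixed target, so a fuller version might add the original title as the label.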

Andrew-WMDE moved this task from Sprint Backlog to Doing on the WMDE-QWERTY-Sprint-2019-06-26 board.

Change 520197 had a related patch set uploaded (by Andrew-WMDE; owner: Andrew-WMDE):
[mediawiki/extensions/FileImporter@master] [WIP] Correctly link pages from non-english non-Wikipedia pages

https://gerrit.wikimedia.org/r/520197

awight added a comment. Jul 5 2019, 9:42 AM

When we get to the demo, here's a nice edge case which leads to a functional stack of four interwiki prefixes: https://fr.wikisource.org/wiki/Fichier:Alfred_Adler.png . For example, https://test.wikipedia.beta.wmflabs.org/wiki/Special:ImportFile?clientUrl=https://fr.wikisource.org/wiki/Fichier:Alfred_Adler.png .

I'll have to see whether test.beta can do all the interwiki things; we might need to adjust.

Change 520197 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Correctly prefix links to non-english non-Wikipedia sites

https://gerrit.wikimedia.org/r/520197

Change 521270 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] Handle interwiki edge case

https://gerrit.wikimedia.org/r/521270

Change 521270 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Handle interwiki edge case

https://gerrit.wikimedia.org/r/521270

Change 521448 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] Make fewer assumptions about subdomain structure

https://gerrit.wikimedia.org/r/521448

Change 521448 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Make fewer assumptions about subdomain structure

https://gerrit.wikimedia.org/r/521448

Change 521472 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] Strip the "$1" before scraping for the API URL

https://gerrit.wikimedia.org/r/521472

WMDE-Fisch changed the point value for this task from 8 to 2.

Change 521472 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Strip the "$1" before scraping for the API URL

https://gerrit.wikimedia.org/r/521472
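
For readers unfamiliar with the "$1" placeholder: interwiki URL patterns typically look like `https://de.wikivoyage.org/wiki/$1`, where `$1` stands for the page title. The sketch below shows one way the stripping and scraping steps might fit together. It assumes that the API URL is discovered from the `EditURI` link element that MediaWiki pages carry in their HTML head; the helper names are hypothetical and this is not the code from the patch.

```python
from html.parser import HTMLParser
from typing import Optional


def strip_title_placeholder(url_pattern: str) -> str:
    """Remove the '$1' title placeholder from an interwiki URL pattern."""
    return url_pattern.replace("$1", "")


class _EditUriFinder(HTMLParser):
    """Collects the href of the first <link rel="EditURI"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("rel") or "").lower() == "edituri":
            self.href = attributes.get("href")


def api_url_from_html(html: str) -> Optional[str]:
    """Return the API URL advertised in the page, without its query string."""
    finder = _EditUriFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    return finder.href.split("?", 1)[0]


if __name__ == "__main__":
    print(strip_title_placeholder("https://de.wikivoyage.org/wiki/$1"))
    sample = (
        '<html><head><link rel="EditURI" type="application/rsd+xml" '
        'href="//de.wikivoyage.org/w/api.php?action=rsd"/></head></html>'
    )
    print(api_url_from_html(sample))
```

Fetching the stripped URL over HTTP is left out here so that the example stays self-contained.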

Lea_WMDE closed this task as Resolved. Jul 25 2019, 1:51 PM
Lea_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2019-06-26 board.