[Bug] Parsed summary for Wikidata changes has broken links
Closed, Resolved, Public

Description

Examples from https://de.wikipedia.org/wiki/Spezial:API-Spielwiese#action=query&list=recentchanges&format=json&rcprop=comment|parsedcomment&rclimit=max&rctype=external:

<a href=\"/wiki/Wandrille_Lef%C3%A8vre#wbsetclaimvalue:1.7C\" title=\"Wandrille Lef\u00e8vre\">\u2192</a>\u200e<span dir=\"auto\"><span class=\"autocomment\">wbsetclaimvalue:1|: </span> <a href=\"/w/index.php?title=Property:P735&amp;action=edit&amp;redlink=1\" class=\"new\" title=\"Property:P735 (Seite nicht vorhanden)\">Property:P735</a>: <a href=\"/w/index.php?title=Q12788459&amp;action=edit&amp;redlink=1\" class=\"new\" title=\"Q12788459 (Seite nicht vorhanden)\">Q12788459</a></span>

Note that the links to properties and items are broken local links, not correct links to Wikidata.

Sprachlink hinzugef\u00fcgt: <a href=\"/w/index.php?title=Specieswiki:Pogonieae&amp;action=edit&amp;redlink=1\" class=\"new\" title=\"Specieswiki:Pogonieae (Seite nicht vorhanden)\">specieswiki:Pogonieae</a>

This uses the wrong prefix, specieswiki, which creates a broken local link instead of a link to the correct wiki. (Separately, the message "Sprachlink hinzugefügt" ("language link added") is itself wrong: the link is not a language link but a link to a sister project.)

Event Timeline

Schnark set the priority of this task to Needs Triage.
Schnark updated the task description.
Schnark subscribed.
Tobi_WMDE_SW renamed this task from Parsed summary for Wikidata changes has broken links to [Bug] Parsed summary for Wikidata changes has broken links. Nov 3 2015, 2:00 PM
Tobi_WMDE_SW moved this task from Proposed to Backlog on the Wikidata-Sprint-2015-11-03 board.

The problem is that we try to get the interwiki id from the SiteStore, but only wikis in one's own site group have interwiki ids set. If no interwiki id is found, we assume the site id is the interwiki id, and that does not resolve to a working link.

https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/874f5e0139cbd4c24443af1ab41adac0b1e4b785/client/includes/SiteLinkCommentCreator.php#L249-L262
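
The fallback described above can be sketched roughly as follows. This is a simplified Python illustration, not the actual PHP in SiteLinkCommentCreator: the `Site` class, the `buggy_wikitext_link` function, and the exact wikitext shape are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    """Hypothetical stand-in for a site record loaded from the SiteStore."""
    global_id: str
    interwiki_ids: List[str] = field(default_factory=list)


def buggy_wikitext_link(site: Site, page: str) -> str:
    # Use the first interwiki id if one is set; otherwise fall back to the
    # site id, which is the behaviour this task reports as broken.
    prefix = site.interwiki_ids[0] if site.interwiki_ids else site.global_id
    return f"[[:{prefix}:{page}]]"


# A wiki in the same site group has an interwiki id, so the link resolves.
print(buggy_wikitext_link(Site("frwiki", ["fr"]), "Pogonieae"))
# A sister project has none, so the site id is used as if it were a prefix
# and the parser treats the result as a (missing) local page.
print(buggy_wikitext_link(Site("specieswiki"), "Pogonieae"))
```

The second call yields a link whose target starts with `specieswiki:`, which is the shape of the broken link quoted in the description.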

Current situation:

We don't have a nice way to reliably populate interwiki ids for sister projects (e.g. linking "voy:it:Berlin" from English Wikipedia).

Places that know about the site group prefixes include SiteMatrix and dumpInterwiki.php. Maybe we can add such a mapping in core, as an (optional) setting. We would then look at a site's site group and find the group prefix (if applicable). The "site language" code then needs to be appended to fully construct the link.

We then have sites like Wikispecies, which in some places is considered part of the "wiki" (wikipedia) group, with "species" as the subdomain / interwiki / "language code". A link from, say, Wikivoyage might be "w:species:Plasmodium_falciparum".

Then we have sites like Wikidata, which has 'd' as the interwiki prefix and no subdomain.

Possible solution:

In Wikibase, we have the 'languageLinkSiteGroup' setting, which tells us that commons is regarded as a 'wikipedia' when it comes to interwiki links. We could have a similar setting that maps a site group to its interwiki prefix. (Wikidata would have to not count as a wikipedia here.)

$wgSiteGroupToInterwiki = array( 'wikivoyage' => 'voy', ...)

Then we probably want a setting for special-case prefixes (e.g. Wikidata), which should be checked first, before the group prefix, and used if present. If a site is not special-cased here, then its interwiki prefix is the site group prefix + the "language code" / subdomain, as we have in Site objects.

$wgSiteIdToInterwiki = array( 'wikidatawiki' => 'd', ...)
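
The lookup order proposed above could look roughly like this. It is a Python sketch for illustration only: the two dictionaries mirror the proposed `$wgSiteGroupToInterwiki` and `$wgSiteIdToInterwiki` settings with just the entries mentioned in this task, and the function name `interwiki_prefix` is hypothetical.

```python
from typing import Optional

# Illustrative equivalents of the two proposed settings; only the entries
# named in this task are included.
SITE_GROUP_TO_INTERWIKI = {"wikivoyage": "voy"}
SITE_ID_TO_INTERWIKI = {"wikidatawiki": "d"}


def interwiki_prefix(site_id: str, site_group: str, language_code: str) -> Optional[str]:
    """Return the interwiki prefix for a site, or None if it is unknown."""
    # Special-cased sites (e.g. Wikidata) are checked first.
    if site_id in SITE_ID_TO_INTERWIKI:
        return SITE_ID_TO_INTERWIKI[site_id]
    # Otherwise combine the group prefix with the language code / subdomain.
    group_prefix = SITE_GROUP_TO_INTERWIKI.get(site_group)
    if group_prefix is None:
        return None
    return f"{group_prefix}:{language_code}"


print(interwiki_prefix("itwikivoyage", "wikivoyage", "it"))
print(interwiki_prefix("wikidatawiki", "wikidata", "www"))
print(interwiki_prefix("specieswiki", "species", "species"))
```

Returning `None` for an unmapped site keeps the "unknown prefix" case explicit, so a caller can decide to leave the text unlinked rather than guess. The site ids, groups, and language codes passed in the calls are assumed values chosen for the example.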

It would be nicest, imho, if the setting(s) were in core. Right now, knowledge of these site group prefixes is in SiteMatrix and dumpInterwiki and we also have $wgLocalInterwiki.

Then we could have InterwikiLinker or such code in core that constructs an interwiki wikitext link, based on site id / wiki id (are they the same?) + page title + fragment.
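
As a rough idea of what such a linker would do, here is a minimal Python sketch. No `InterwikiLinker` class exists in core as far as this task states; the function name and signature are hypothetical, and the interwiki prefix is assumed to have been resolved already.

```python
def interwiki_wikitext_link(prefix: str, title: str, fragment: str = "") -> str:
    """Build an inline wikitext link from an interwiki prefix, title and fragment."""
    target = f"{prefix}:{title}"
    if fragment:
        target += f"#{fragment}"
    # The leading colon asks the parser for an inline link, so a language
    # prefix is not turned into a sidebar language link.
    return f"[[:{target}]]"


print(interwiki_wikitext_link("voy:it", "Berlin"))
print(interwiki_wikitext_link("d", "Q12788459", "P735"))
```

The fragment `P735` in the second call is only a placeholder to show where a fragment would go.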

Short term solution:

If we don't find an interwiki id from a Site object, then don't format the link as a wikitext link and leave it unlinked.
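
In other words, something along these lines. This is a Python sketch of the idea, not the code in the actual patch; the function name and the exact plain-text form used for the unlinked case are assumptions.

```python
from typing import Optional


def format_site_link(interwiki_id: Optional[str], site_id: str, page: str) -> str:
    """Return a wikitext link if the interwiki id is known, else plain text."""
    if interwiki_id is None:
        # No known interwiki prefix: do not emit link syntax at all, so the
        # parsed comment contains no broken local link.
        return f"{site_id}:{page}"
    return f"[[:{interwiki_id}:{page}]]"


print(format_site_link("fr", "frwiki", "Pogonieae"))
print(format_site_link(None, "specieswiki", "Pogonieae"))
```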

Change 250999 had a related patch set uploaded (by Aude):
Don't make broken wikitext links for sister project links

https://gerrit.wikimedia.org/r/250999

aude moved this task from Doing to Review on the Wikidata-Sprint-2015-11-03 board.

Created T117738 to add a mechanism for formatting interwiki links to sister projects.

For now, the short term solution is simply not to emit broken links: if we don't know the interwiki prefix for a site, the text is left unlinked.

Change 251239 had a related patch set uploaded (by Hoo man):
Don't make broken wikitext links for sister project links

https://gerrit.wikimedia.org/r/251239

Change 251239 merged by jenkins-bot:
Don't make broken wikitext links for sister project links

https://gerrit.wikimedia.org/r/251239

Change 250999 merged by jenkins-bot:
Don't make broken wikitext links for sister project links

https://gerrit.wikimedia.org/r/250999