Special:EntityData returns localized <wdata:> URL, but Blazegraph updater requires it in English
Open, Low, Public

Description

As you know, the Blazegraph updater requests Special:EntityData and receives the entity data in RDF/Turtle format.
One of the prefixes is wdata, which looks like:

@prefix wdata: <https://www.wikidata.org/wiki/Special:EntityData/> .

And Blazegraph updater expects it in this form (link to source):

public WikibaseUris(String host) {
        root = "http://" + host;
        rootHttps = "https://" + host;
        entityData = root + "/wiki/Special:EntityData/";
        entityDataHttps = rootHttps + "/wiki/Special:EntityData/";
....

The problem is that MediaWiki core (Title->getCanonicalURL()) builds all links in the content language defined by $wgLanguageCode. On a site where, for example, $wgLanguageCode is set to 'ru', the wdata prefix looks like:

@prefix wdata: <https://www.example.com/wiki/%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:EntityData/> .
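The mismatch can be reproduced without Blazegraph. The following is a minimal sketch, assuming the updater effectively compares the prefix against the two strings built in the WikibaseUris constructor quoted above; the class and method names here are hypothetical and are not part of the updater.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class WdataPrefixCheck {
    private static final String PATH = "/wiki/Special:EntityData/";

    // True if the prefix equals the http or https form that the
    // WikibaseUris constructor builds for the given host.
    static boolean matchesExpected(String host, String wdataPrefix) {
        return wdataPrefix.equals("http://" + host + PATH)
            || wdataPrefix.equals("https://" + host + PATH);
    }

    public static void main(String[] args) {
        String host = "www.example.com";
        String localized = "https://www.example.com/wiki/"
            + "%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:EntityData/";

        // The localized prefix does not match either expected form.
        System.out.println(matchesExpected(host, localized));
        // Decoding shows the Russian name of the Special namespace.
        System.out.println(URLDecoder.decode(localized, StandardCharsets.UTF_8));
    }
}
```

The percent-encoded segment is the localized namespace name, so a plain string comparison against "Special:EntityData" cannot succeed.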

The link is built in WikibaseRepo.php, via $entityDataTitle->getCanonicalURL():

$entityDataTitle = Title::makeTitle( NS_SPECIAL, 'EntityData' );

$this->rdfVocabulary = new RdfVocabulary(
    $this->getVocabularyBaseUri(),
    $entityDataTitle->getCanonicalURL() . '/',
    $languageCodes,
    $this->dataTypeDefinitions->getRdfTypeUris()
);

I think Special:EntityData should build its links in English regardless of the content language, but I can't find an easy and proper way to do that.
This behavior seems to be hard-coded in core and cannot easily be changed.

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

Yes, Special:EntityData should produce the same data always. I'm not sure why exactly the wikibase has different language code - isn't wikibase supposed to be multilingual? But I'll look into how to make it use more standard URL...

I'm not sure why exactly the wikibase has different language code

Because Wikibase uses the $entityDataTitle->getCanonicalURL() function, which returns a localized URL. This is standard MW core behavior; it is hard-coded and there is no alternative :-(

isn't wikibase supposed to be multilingual?

I'm sure Wikibase should be multilingual, but the URL in the RDF schema shouldn't be.

There is a uselang parameter, but it changes all messages except the URL, because the URL is hard-coded. :-/

I temporarily fixed it by adding the following to LocalSettings.php:

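# Site language code, should be one of the list in ./languages/data/Names.php
# Hack for the Wikidata Query Service updater: use English only for requests from its IP.
# 'myserverip' is a placeholder; REMOTE_ADDR is not set when PHP runs from the command line.
if ( ( $_SERVER['REMOTE_ADDR'] ?? '' ) === 'myserverip' ) {
    $wgLanguageCode = "en";
} else {
    $wgLanguageCode = "ru";
}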
Smalyshev changed the task status from Open to Stalled. Dec 21 2017, 2:14 AM
Smalyshev triaged this task as Low priority.
Aklapper changed the task status from Stalled to Open. Nov 2 2020, 6:11 PM

The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status, as tasks should not be stalled (and then potentially forgotten) for years for unclear reasons.

(Smallprint, as general orientation for task management:
If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead.
If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks...Edit Subtasks.
If this task is stalled on an upstream project, then the Upstream tag should be added.
If this task requires info from the task reporter, then there should be instructions which info is needed.
If this task needs retesting, then the TestMe tag should be added.
If this task is out of scope and nobody should ever work on this, or nobody else managed to reproduce the situation described here, then it should have the "Declined" status.
If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)

At first glance, it seems to me that the root of the problem might be that we get the RDF data from a special page rather than from an API. Naturally, special pages (mostly meant for consumption by users) have their addresses localised. Code that is meant to be read almost solely by machines should probably use an API. Making this change just for this ticket probably wouldn't make sense, but it might be worth keeping in mind.

From mattermost

Do you have any ideas on how much work it would involve to fix it?
Adam Shorland
11:21 AM

I don't think it would be much
Somewhere we just need to make it always use Special: not a localized one

for rdf output
Thomas Arrow
11:23 AM

probably a little bit of work
11:24 AM

AFAIK \Wikibase\DataAccess\MediaWiki\EntitySourceDocumentUrlProvider::getCanonicalDocumentsUrls is where the localised URLs "start"
11:26 AM

I think that TitleFactory is what is returning the localised ones; could be replaced with something else that always gives "Special"
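The idea discussed in the chat (always emit "Special" instead of the localized namespace name for RDF output) can be illustrated outside MediaWiki. This is a minimal Java sketch and not the Wikibase patch: the class, the method names, and the two-entry namespace table are all hypothetical and only serve to contrast the two ways of building the URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical model of URL construction, for illustration only.
final class DocumentUrlSketch {
    static final String CANONICAL_SPECIAL = "Special";

    // Assumed table of localized names for the Special namespace.
    // The Russian name is written with Unicode escapes to keep the source ASCII.
    static final Map<String, String> LOCALIZED_SPECIAL = Map.of(
        "en", "Special",
        "ru", "\u0421\u043B\u0443\u0436\u0435\u0431\u043D\u0430\u044F");

    // Depends on the content language: the behaviour reported in this task.
    static String localizedUrl(String server, String languageCode) {
        String ns = LOCALIZED_SPECIAL.getOrDefault(languageCode, CANONICAL_SPECIAL);
        return server + "/wiki/"
            + URLEncoder.encode(ns, StandardCharsets.UTF_8) + ":EntityData/";
    }

    // Ignores the content language: the behaviour proposed in the chat.
    static String canonicalUrl(String server) {
        return server + "/wiki/" + CANONICAL_SPECIAL + ":EntityData/";
    }

    public static void main(String[] args) {
        String server = "https://www.example.com";
        System.out.println(localizedUrl(server, "ru"));
        System.out.println(canonicalUrl(server));
    }
}
```

With the canonical variant, the prefix is the same on every wiki, whatever $wgLanguageCode is set to, which is what the updater expects.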

Change 761711 had a related patch set uploaded (by Addshore; author: Addshore):

[mediawiki/extensions/Wikibase@master] Always output document urls with canonical namespace

https://gerrit.wikimedia.org/r/761711