
Automatic SI unit conversion not working on Commons SPARQL engine
Closed, Resolved | Public | 8 Estimated Story Points

Description

As an editor, I want to query normalized quantity values so that I can find all values regardless of the unit they were entered in.

Problem:
On Wikidata we have a configuration that tells the Wikidata Query Service how to convert all kinds of units into a set of standard units. This makes it possible to query for all Items with a length of X m regardless of whether it was entered in feet, kilometers or miles, for example. On the Commons Query Service this configuration is not set up.

This can be seen when comparing the RDF output on a Wikidata Item (e.g. https://www.wikidata.org/wiki/Special:EntityData/Q513.ttl) for quantity statements (e.g. P2044) to the RDF output on MediaInfo entities (e.g. https://commons.wikimedia.org/wiki/Special:EntityData/M101311668.ttl). The latter is missing psn:P2044 triples.

BDD
GIVEN a query to the Commons Query Service
WHEN querying for quantity values
THEN normalized (converted) values are available wherever we have a conversion factor for the unit of the original value

Acceptance criteria:

  • RDF output configuration for Wikimedia Commons includes the appropriate unit conversion configuration (WMDE's job)
  • Commons Query Service has been configured to index normalized values to take unit conversion into account in the same way that the Wikidata Query Service does. (done by the WMF once WMDE has adjusted Wikimedia Commons Wikibase configuration)

Notes:

Original report:

On Wikidata when doing a query it's possible to do automatic unit conversion to SI units. So someone might have entered height in some funky unit like feet or nautical miles, but you can still query for it in meters. Conversion table at https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/refs/heads/master/wmf-config/unitConversionConfig.json
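To illustrate what "automatic conversion to SI units" amounts to, here is a minimal Python sketch. It is not the Wikibase implementation: the `UNIT_CONVERSIONS` table, its layout, and the `normalize` helper are hypothetical stand-ins for the linked conversion table, and only the factors themselves (e.g. 1 foot = 0.3048 m) are standard.

```python
from decimal import Decimal

# Hypothetical conversion table in the spirit of unitConversionConfig.json.
# The layout is an assumption made for this sketch; the factors are standard.
UNIT_CONVERSIONS = {
    "foot": {"factor": Decimal("0.3048"), "si_unit": "metre"},
    "kilometre": {"factor": Decimal("1000"), "si_unit": "metre"},
    "metre": {"factor": Decimal("1"), "si_unit": "metre"},
}


def normalize(amount, unit):
    """Return (amount, unit) expressed in the SI unit, or None if no
    conversion factor is known for the given unit."""
    entry = UNIT_CONVERSIONS.get(unit)
    if entry is None:
        return None
    return Decimal(amount) * entry["factor"], entry["si_unit"]


if __name__ == "__main__":
    print(normalize("100", "foot"))
    print(normalize("1", "furlong"))  # no factor configured in this sketch
```

A unit without a configured factor yields no normalized value at all, which matches the "where we have a conversion factor" wording in the description.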

See, for example, https://w.wiki/3Fmq on Wikidata. A similar query on Commons returns empty fields (see https://tinyurl.com/yyfbsvgw (fixed)), so it looks like the automatic SI unit conversion is not yet working for Commons. Please enable/fix it.
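The linked queries are not reproduced here. As a rough sketch of what such a query usually looks like, the Python helper below builds a SPARQL string that reads normalized amounts through the `psn:` predicate. The function name `normalized_quantity_query` and the variable names are made up for illustration; the prefix URIs are the standard Wikibase RDF ones.

```python
# Standard Wikibase RDF prefixes, spelled out so the query is self-contained.
PREFIXES = """\
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
PREFIX wikibase: <http://wikiba.se/ontology#>
"""


def normalized_quantity_query(property_id, limit=10):
    """Build a SPARQL query that selects normalized quantity values for
    one property (hypothetical helper, for illustration only)."""
    return PREFIXES + f"""\
SELECT ?entity ?amount ?unit WHERE {{
  ?entity p:{property_id} ?statement .
  ?statement psn:{property_id} ?normalized .
  ?normalized wikibase:quantityAmount ?amount ;
              wikibase:quantityUnit ?unit .
}}
LIMIT {limit}
"""


if __name__ == "__main__":
    print(normalized_quantity_query("P2044"))
```

If the `psn:` triples are absent from the indexed data, the `?statement psn:... ?normalized` pattern matches nothing, which is consistent with the empty results reported for Commons.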

Event Timeline

This is indeed only set on wikidatawiki currently

https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/5f39c6b71bc2944423e3d4251286ce6d98961988/wmf-config/Wikibase.php#178

I suspect that also loading this on commonswiki will solve the issue in the RDF output?
Then a fresh dump and reload would be needed for the WCQS.

Note: We don't have this in testwikidata or testcommons (maybe not in beta either). Should we?

WMDE will attempt using the config bit from Wikidata for Commons and see if that fixes the issue. If it does not, the team will regroup and consider other options.

The config is already there:

ladsgroup@mwmaint1002:~$ mwscript eval.php --wiki=commonswiki
> var_dump( $wgWBRepoSettings['unitStorage'] );
array(2) {
  ["class"]=>
  string(35) "\Wikibase\Lib\Units\JsonUnitStorage"
  ["args"]=>
  array(1) {
    [0]=>
    string(51) "/srv/mediawiki/wmf-config/unitConversionConfig.json"
  }
}

I do see the triples there too:

sdcs:M101311668-FB2F8781-5BAB-4222-ADD5-54DC1CC8EECD a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P2044 "+211"^^xsd:decimal ;
	psv:P2044 sdcv:6c04f11c859b64a130337920d1694c49 .

Am I missing something obvious?

I don’t see any psn: triples in that TTL snippet. (psv: triples are regular full values; many statements have those independently of normalized units.)
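A quick way to check this mechanically, sketched in Python with a plain regular expression rather than a real Turtle parser: list which prefixes occur together with a given property ID. The helper name `prefixes_used` is made up for this example; the sample text is the statement quoted above.

```python
import re

# The statement quoted earlier in this thread.
SNIPPET = """\
sdcs:M101311668-FB2F8781-5BAB-4222-ADD5-54DC1CC8EECD a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P2044 "+211"^^xsd:decimal ;
	psv:P2044 sdcv:6c04f11c859b64a130337920d1694c49 .
"""


def prefixes_used(turtle, property_id):
    """Return the set of prefixes that appear with the given property ID.

    This is a text-level check only; it does not parse Turtle.
    """
    pattern = rf"\b(\w+):{re.escape(property_id)}\b"
    return set(re.findall(pattern, turtle))


if __name__ == "__main__":
    found = prefixes_used(SNIPPET, "P2044")
    print(sorted(found))
    print("psn present:", "psn" in found)
```

For the snippet above this finds only `ps` and `psv`, so the normalized (`psn:`) triple is indeed missing.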

Change 695547 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Wikibase@master] repo: Use Item's concept URI for prefix of unit converter

https://gerrit.wikimedia.org/r/695547

We need to confirm that, once this is deployed, an edit on a MediaInfo entity updates these values and enables us to at least query for that MediaInfo entity. Once that's confirmed, the Search Platform team needs to reload the RDF triples for WCQS.

Change 695547 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] repo: Use Item's concept URI for prefix of unit converter

https://gerrit.wikimedia.org/r/695547

Addshore updated the task description.
Addshore moved this task from Product Realm to External Realm on the [DEPRECATED] wdwb-tech board.
Addshore added a subscriber: Ladsgroup.
Addshore added a subscriber: Gehel.

@CBogen @Gehel I think this is ready for you now!

Gehel moved this task from incoming to in progress on the Wikidata board.
Gehel moved this task from Watching / Waiting to Incoming on the Wikidata-Query-Service board.
dcausse updated the task description.
dcausse subscribed.

I think it's working properly now