
[L] Determine an IRI to join commons mediainfo entities and wikidata properties referencing commons images
Open, Needs Triage, Public

Description

As a user of WDQS and WCQS, I want to be able to join a MediaInfo item with the value of a property that references a Commons image, such as P18.

There are no obvious ways to do this currently.

In Wikidata's RDF output, properties that are instances of Q18610173 (e.g. P18) use an IRI of the form:
http://commons.wikimedia.org/wiki/Special:FilePath/_filename_ while Commons entities reference their contentUrl using https://upload.wikimedia.org/wikipedia/commons/X/XY/_filename_.

These IRIs could have been used for joining, but they differ in:

  • the use of Special:FilePath
  • the use of http vs https
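To illustrate the kind of string manipulation a client currently needs, here is a minimal Python sketch that rewrites a contentUrl into the Special:FilePath form. The function name is hypothetical, and the encoding rules (underscores become spaces, then every reserved character is percent-encoded) are an assumption inferred from the Douglas Adams example in this task, not a confirmed description of what Wikidata emits for every file name.

```python
from urllib.parse import quote, unquote, urlsplit

# Prefix used by Wikidata for commonsMedia values, as quoted in this task.
FILE_PATH_PREFIX = "http://commons.wikimedia.org/wiki/Special:FilePath/"


def content_url_to_file_path_iri(content_url: str) -> str:
    """Rewrite an upload.wikimedia.org contentUrl into a Special:FilePath IRI.

    Hypothetical helper: the encoding rules are inferred from one example.
    """
    # The file name is the last path segment of the contentUrl.
    segment = urlsplit(content_url).path.rsplit("/", 1)[-1]
    # Decode any percent-escapes, then turn underscores into spaces.
    title = unquote(segment).replace("_", " ")
    # Assumption: percent-encode everything except unreserved characters,
    # so spaces become %20.
    return FILE_PATH_PREFIX + quote(title, safe="")


content_url = (
    "https://upload.wikimedia.org/wikipedia/commons/c/c0/"
    "Douglas_adams_portrait_cropped.jpg"
)
print(content_url_to_file_path_iri(content_url))
```

Doing the same rewrite inside a SPARQL query is considerably more awkward, which is the motivation for a shared IRI.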

There should exist a common IRI identifying a commons file.

Consider Commons RDF output like the following:

sdc:M10031710 a wikibase:Mediainfo,
		schema:MediaObject,
		schema:ImageObject ;
	schema:encodingFormat "image/jpeg" ;
	schema:contentUrl <https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg> ;

One approach would be to change the URL emitted by Commons MediaInfo to match the one emitted by Wikidata. This is a breaking change:

sdc:M10031710 a wikibase:Mediainfo,
		schema:MediaObject,
		schema:ImageObject ;
	schema:encodingFormat "image/jpeg" ;
	schema:contentUrl <http://commons.wikimedia.org/wiki/Special:FilePath/Douglas%20adams%20portrait%20cropped.jpg> ;

Another (preferred) approach would be to introduce a new triple, e.g. schema:url. This would increase the size of the graph, but by an acceptable amount, and would not introduce any breaking changes:

sdc:M10031710 a wikibase:Mediainfo,
		schema:MediaObject,
		schema:ImageObject ;
	schema:encodingFormat "image/jpeg" ;
	schema:contentUrl <https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg> ;
	schema:url <http://commons.wikimedia.org/wiki/Special:FilePath/Douglas%20adams%20portrait%20cropped.jpg> ;

Note on similar tickets:

  • T258769 is very similar but is worded to simplify the use of the image grid feature of the UI
  • T258776 serves similar purposes but is harder to achieve:
    • it requires "synchronization" between Wikidata and Commons to obtain the page ID of a MediaInfo item
    • the MediaInfo item may not exist yet even though the Commons image can already be referenced from Wikidata

AC:

  • A WCQS query that joins Wikidata items from WDQS via federation can be written easily, without complex string manipulation
  • A new schema:url triple is added
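As a rough sketch of what the first criterion could look like once the second is met, the Python helper below builds a federated query that joins directly on the shared IRI. The helper name and its parameters are hypothetical, the schema:url triple is the one proposed in this task (it does not exist yet), and the query itself is untested against WCQS. A real query would also need to constrain the SERVICE pattern further to stay within query limits.

```python
import re

# Public WDQS SPARQL endpoint, used here as the federation target.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_join_query(property_id: str = "P18", limit: int = 10) -> str:
    """Build a federated SPARQL query joining WCQS and WDQS on schema:url.

    Hypothetical helper; assumes the proposed schema:url triple is emitted.
    """
    # Only accept plain property IDs so nothing else ends up in the query.
    if not re.fullmatch(r"P[1-9][0-9]*", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    return f"""PREFIX schema: <http://schema.org/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?media WHERE {{
  SERVICE <{WDQS_ENDPOINT}> {{
    ?item wdt:{property_id} ?image .
  }}
  # Join on the shared IRI: no string manipulation needed.
  ?media schema:url ?image .
}}
LIMIT {int(limit)}"""


print(build_join_query("P18", 5))
```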

Event Timeline

File names make bad URIs. Files get renamed all the time (see https://commons.wikimedia.org/w/index.php?title=Special:Log&offset=&limit=500&type=move ), causing all sorts of breakage. The pageid stays the same, so the mediaid also stays the same. That's a much more stable identifier.

CBogen renamed this task from Determine an IRI to join commons mediainfo entities and wikidata properties referencing commons images to [L] Determine an IRI to join commons mediainfo entities and wikidata properties referencing commons images. Apr 7 2021, 4:43 PM