Add support for using MWAPI to search for Commons media file statement values
Closed, Resolved · Public

Description

The following query tries to find the categories of the image in a P18 statement:

# trying to use mwapi to get (license) categories of commons images
# see https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual/MWAPI
SELECT * WHERE {
  hint:Query hint:optimizer "None".
  BIND(wd:Q668563 as ?leinestr) # random item with picture
  ?leinestr wdt:P18 ?picture.

  BIND(STRAFTER(str(?picture), "Special:FilePath/") AS ?filename)
  BIND(CONCAT("File:", ?filename) AS ?file)

  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Categories".
    bd:serviceParam wikibase:endpoint "commons.wikipedia.org".
    bd:serviceParam mwapi:titles ?file.
    ?cat wikibase:apiOutput mwapi:category.
  }
}

(See also T168876 for what happens if the optimizer hint is removed.)

However, it finds no results, because the title taken from the statement is URL-encoded (?file is File:U-Bahn%20Berlin%20Leinestra%C3%9Fe.JPG), whereas the MediaWiki API expects an unencoded title (File:U-Bahn Berlin Leinestraße.JPG). This particular example can be made to work with a few REPLACE() calls, but clearly that approach doesn’t scale.
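
For illustration, the REPLACE() workaround for this particular file name might look like the following sketch. It assumes that %20 and %C3%9F are the only escapes in the name, and the ?decoded variable is introduced here only for the example. Any other escaped character would need its own REPLACE().

```sparql
# Sketch only: hand-decodes the two escapes found in this one file name.
SELECT * WHERE {
  hint:Query hint:optimizer "None".
  BIND(wd:Q668563 AS ?leinestr) # random item with picture
  ?leinestr wdt:P18 ?picture.

  BIND(STRAFTER(STR(?picture), "Special:FilePath/") AS ?filename)
  # Inner REPLACE turns %20 into a space, outer REPLACE turns %C3%9F into ß.
  BIND(REPLACE(REPLACE(?filename, "%20", " "), "%C3%9F", "ß") AS ?decoded)
  BIND(CONCAT("File:", ?decoded) AS ?file)

  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Categories".
    bd:serviceParam wikibase:endpoint "commons.wikipedia.org".
    bd:serviceParam mwapi:titles ?file.
    ?cat wikibase:apiOutput mwapi:category.
  }
}
```

Each additional percent-escape adds another nesting level, which is why this does not generalise to arbitrary file names.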

One solution would be for the MWAPI to automatically decode titles in parameters. This could happen, for instance, if the title is a full URI as saved in a Wikidata statement (http://commons.wikimedia.org/wiki/Special:FilePath/…). (We probably don’t want to do this for all values in parameters unconditionally, since they might already be decoded but contain % characters.)

Another solution would be to provide a function for URL-decoding (and probably, for symmetry, another one for URL-encoding). This would be more generally useful, but also potentially harder to discover as a solution.

Event Timeline

> One solution would be for the MWAPI to automatically decode titles in parameters.

The API does not know anything about "titles" or URL encoding: to it these are just strings, and "title" is just the name of a parameter. I don't think URL-decoding every string that is passed to the service is a good idea, so we'd need some way to know which ones need to be decoded. Maybe type support can help with that.

Smalyshev claimed this task.

This works now, using wikibase:decodeUri:

SELECT * WHERE {
  hint:Query hint:optimizer "None".
  BIND(wd:Q668563 as ?leinestr) # random item with picture
  ?leinestr wdt:P18 ?picture.

  BIND(STRAFTER(str(?picture), "Special:FilePath/") AS ?filename)
  BIND(wikibase:decodeUri(CONCAT("File:", ?filename)) AS ?file)

  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Categories".
    bd:serviceParam wikibase:endpoint "commons.wikipedia.org".
    bd:serviceParam mwapi:titles ?file.
    ?cat wikibase:apiOutput mwapi:category.
  }
}