
Querying multiple entity labels in one API call
Open, Needs Triage · Public

Description

I developed a tool called Multisearch that allows you to input a list of titles (50 max) with a wiki project and get back a link to the page (if it exists) and the QID of the Wikidata item:

https://tools.wmflabs.org/hay/multisearch/

This tool is useful when you're matching datasets and you want to quickly get pages and QIDs for a list of titles (think street names, personal names, cities, etc.).

Unfortunately, it doesn't quite work for Wikidata. The API call I use fails there because entities are not pages. I think there are two relevant API methods, but neither is sufficient. wbsearchentities can only query one thing at a time. wbgetentities has an option to query multiple labels, but you need to specify sites, which makes it effectively the same as querying a Wikipedia instance (and hence not useful).

So basically, what I need is an API method to query multiple labels and get back whether each label exists (and what its QID is).
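For illustration, here is a minimal sketch of the one-label-per-request workaround with wbsearchentities that the description calls insufficient. The function names, the User-Agent string, and the assumption that each entry in the response's "search" list carries "id" and "label" keys are mine, not part of the task; treat it as a sketch rather than how Multisearch works.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"
# Placeholder User-Agent; replace with something that identifies your tool.
USER_AGENT = "label-lookup-sketch/0.1 (placeholder contact)"


def build_search_url(label, language="en", limit=10):
    """Build a wbsearchentities URL for a single label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def exact_matches(label, response):
    """Return the QIDs of search hits whose label equals `label` exactly.

    Assumes the response has a "search" list whose entries carry
    "id" and "label" keys.
    """
    return [
        hit["id"]
        for hit in response.get("search", [])
        if hit.get("label") == label
    ]


def lookup_labels(labels, language="en"):
    """Look up every label with its own HTTP request (one request per label)."""
    found = {}
    for label in labels:
        request = urllib.request.Request(
            build_search_url(label, language),
            headers={"User-Agent": USER_AGENT},
        )
        with urllib.request.urlopen(request) as resp:
            found[label] = exact_matches(label, json.load(resp))
    return found
```

Calling lookup_labels(["Berlin", "Amsterdam", "Venice"]) would issue three separate requests, which is exactly the overhead a multi-label method would avoid.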

Event Timeline

Husky created this task. Apr 30 2018, 9:17 PM
Restricted Application added a subscriber: Aklapper. · Apr 30 2018, 9:17 PM

You forgot to mention action=query&prop=pageterms but...

> you need to specify sites

No, you only need to specify ids=.
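As a rough sketch of the ids= form mentioned here, the following builds a wbgetentities URL that fetches only labels for a list of known QIDs. The helper name is hypothetical; the parameters (ids, props, languages, format) are existing wbgetentities parameters. It only helps once the QIDs are already known.

```python
import urllib.parse

API_URL = "https://www.wikidata.org/w/api.php"


def build_getentities_url(ids, languages=("en",)):
    """Build a wbgetentities URL that requests labels for the given QIDs.

    Multiple values are joined with "|", as the MediaWiki API expects.
    """
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


print(build_getentities_url(["Q64", "Q727"], languages=("en", "nl")))
```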

Husky added a comment. May 1 2018, 7:28 PM

@matej_suchanek neither of these options is useful; I want to query by label, not by ID.

Husky added a comment. May 1 2018, 8:25 PM

@matej_suchanek I've tried that, but unfortunately searching by label using SPARQL is far too slow. Queries time out or take forever.

Hey @Husky, we've been discussing it with the team, but we don't really understand what you need.
Can you provide a concrete example of what you want to achieve and what result you would expect?

Husky added a comment. May 20 2018, 1:49 PM

@Lea_Lacroix_WMDE what I basically want is a way to use the wbgetentities method to query by label, in the same way you can query by ids or titles. I was working on matching a set of street names to WD items. I could of course run an individual query for each street using wbsearchentities, but then I can only query one item at a time. So I could imagine something like:

https://www.wikidata.org/w/api.php?action=wbgetentities&labels=Berlin|Amsterdam|Venice&languages=en|nl

And then get back a result similar to what you get when using the titles or ids parameter. It's fine if the matching is strict (so it won't return "Berlin Alexanderplatz" as well). So something like this when I run the query mentioned above:

{
  "entities": {
    "Q64": {
      "id": "Q64",
      "labels": {
        "en": {
          "language": "en",
          "value": "Berlin"
        },
        "nl": {
          "language": "nl",
          "value": "Berlijn"
        }        
      },
      "type": "item"
    },
    "Q641": {
      "id": "Q641",
      "labels": {
        "en": {
          "language": "en",
          "value": "Venice"
        },
        "nl": {
          "language": "nl",
          "value": "Venetië"
        }
      },
      "type": "item"
    },
    "Q727": {
      "id": "Q727",
      "labels": {
        "nl": {
          "language": "nl",
          "value": "Amsterdam"
        },
        "en": {
          "language": "en",
          "value": "Amsterdam"
        }        
      },
      "type": "item"
    }
  },
  "success": 1
}
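To make the intended use concrete, here is a small sketch of how a client might turn a response of the shape proposed above into a label-to-QID mapping with strict matching. The function name is hypothetical, and the labels= parameter itself does not exist in wbgetentities; the sample data is a trimmed copy of the example response.

```python
def labels_to_qids(response, wanted):
    """Map each wanted label to the QIDs that carry it in any returned language.

    Matching is strict: a label must equal the wanted string exactly.
    """
    result = {label: [] for label in wanted}
    for qid, entity in response.get("entities", {}).items():
        # Collect the distinct label values of this entity across languages.
        values = {term["value"] for term in entity.get("labels", {}).values()}
        for label in wanted:
            if label in values:
                result[label].append(qid)
    return result


# Trimmed copy of the example response from this comment.
sample = {
    "entities": {
        "Q64": {"id": "Q64", "labels": {
            "en": {"language": "en", "value": "Berlin"},
            "nl": {"language": "nl", "value": "Berlijn"}}},
        "Q641": {"id": "Q641", "labels": {
            "en": {"language": "en", "value": "Venice"},
            "nl": {"language": "nl", "value": "Venetië"}}},
        "Q727": {"id": "Q727", "labels": {
            "nl": {"language": "nl", "value": "Amsterdam"},
            "en": {"language": "en", "value": "Amsterdam"}}},
    },
    "success": 1,
}

print(labels_to_qids(sample, ["Berlin", "Amsterdam", "Venice"]))
# {'Berlin': ['Q64'], 'Amsterdam': ['Q727'], 'Venice': ['Q641']}
```

A label with no matching entity maps to an empty list, which covers the "does the label exist" part of the request.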
Vvjjkkii renamed this task from Querying multiple entity labels in one API call to xxdaaaaaaa. Jul 1 2018, 1:14 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Mainframe98 renamed this task from xxdaaaaaaa to Querying multiple entity labels in one API call. Jul 1 2018, 7:37 AM
Mainframe98 raised the priority of this task from High to Needs Triage.
Mainframe98 updated the task description.
Mainframe98 added a subscriber: Aklapper.