
Concept mapping for Google Cloud Vision -> Wikidata
Closed, Resolved · Public

Description

In order to (potentially) use Google Cloud Vision as an image labeling provider, we need a concept mapping from its internal entity IDs to Wikidata IDs.

The GCV labeling API returns an AnnotateImageResponse with each entity represented by an EntityAnnotation object. According to the spec, entity IDs (mids) are opaque, but in practice these are Freebase IDs. (That said, there's technically no guarantee that they will remain stable.)
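
As a rough illustration of the point above, the following sketch pulls the mids out of a label-detection response that has already been decoded from JSON. The `labelAnnotations` and `mid` field names follow the Cloud Vision REST response; the helper names (`extract_mids`, `mid_to_dump_key`) and the sample values are hypothetical, and the `m.xxxx` key form is an assumption about how the Freebase dumps spell the same ID.

```python
from typing import Any, Dict, List


def extract_mids(response: Dict[str, Any]) -> List[str]:
    """Collect the mid of every label annotation in one decoded response."""
    mids = []
    for annotation in response.get("labelAnnotations", []):
        mid = annotation.get("mid")
        if mid:  # skip annotations with a missing or empty mid
            mids.append(mid)
    return mids


def mid_to_dump_key(mid: str) -> str:
    """Rewrite '/m/01yrx' as 'm.01yrx' (assumed dump-style spelling)."""
    return mid.strip("/").replace("/", ".")


# Illustrative response fragment, not real API output.
sample_response = {
    "labelAnnotations": [
        {"mid": "/m/01yrx", "description": "cat", "score": 0.98},
        {"description": "no mid here", "score": 0.51},
    ]
}

mids = extract_mids(sample_response)
keys = [mid_to_dump_key(m) for m in mids]
```

Because the spec treats mids as opaque, anything built on this should tolerate IDs that are missing or that do not look like Freebase IDs at all.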

A Freebase->Wikidata mapping from 2013 is provided by Google. We need to create a script to identify any missing Wikidata IDs in the mapping and investigate what's happened to them. Once we're confident in the mapping, we can store the data in a MySQL table for use in MediaWiki.
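
A minimal sketch of the checking script described above, under stated assumptions: the mapping is assumed to be an N-Triples file with one `owl:sameAs` triple per line linking a Freebase URI to a Wikidata entity URI, and the function names and sample lines are hypothetical. It only reports which mids have no mapping; investigating why is still a manual step.

```python
import re
from typing import Dict, Iterable, List

# Assumed line shape of the mapping dump (one owl:sameAs triple per line).
TRIPLE = re.compile(
    r"<http://rdf\.freebase\.com/ns/(m\.[^>]+)>\s+"
    r"<http://www\.w3\.org/2002/07/owl#sameAs>\s+"
    r"<http://www\.wikidata\.org/entity/(Q\d+)>"
)


def load_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Build a {freebase_key: wikidata_id} dict, ignoring lines that do not match."""
    mapping = {}
    for line in lines:
        match = TRIPLE.search(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def find_unmapped(mids: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Return the mids (in '/m/xxxx' form) that have no Wikidata ID in the mapping."""
    return [m for m in mids if m.strip("/").replace("/", ".") not in mapping]


# Illustrative input; a real run would iterate over the opened dump file instead.
sample_lines = [
    "<http://rdf.freebase.com/ns/m.01yrx> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://www.wikidata.org/entity/Q146> .",
    "# a comment or malformed line is skipped",
]

mapping = load_mapping(sample_lines)
unmapped = find_unmapped(["/m/01yrx", "/m/0zzzz"], mapping)
```

Reading the dump line by line keeps memory use proportional to the number of mappings rather than the file size, which matters if the real file is large.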

Deadline
Aug 30

Event Timeline

Mholloway created this task. Aug 2 2019, 4:59 PM
Restricted Application added a subscriber: Aklapper. Aug 2 2019, 4:59 PM
Mholloway updated the task description. Aug 2 2019, 5:01 PM

Per @Ramsey-WMF, the mid values from the Cloud Vision API often (but not always) match the Freebase ID for the entity.

Freebase data dumps, including Freebase-Wikidata mappings, are available here: https://developers.google.com/freebase/

Let's hold this one pending further discussions.

Mholloway triaged this task as High priority. Aug 15 2019, 8:33 PM
Mholloway renamed this task from Create concept mappings for Google Cloud Vision <-> Wikidata to Concept mapping for Google Cloud Vision -> Wikidata. Aug 19 2019, 3:37 PM
Mholloway updated the task description.

Change 531309 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MachineVision@master] Add Freebase->Wikidata ID conversion table

https://gerrit.wikimedia.org/r/531309

Change 531309 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Add Freebase->Wikidata ID conversion table

https://gerrit.wikimedia.org/r/531309

Mholloway moved this task from In development to Done on the MachineVision board. Aug 23 2019, 5:34 PM
Mholloway closed this task as Resolved. Aug 23 2019, 5:39 PM