[EPIC] MediaSearch should use a dedicated service/query for doing its concept-lookup instead of the wikidata search API
Open, High, Public

Description

As a CirrusSearch maintainer, I want MediaSearch to use a dedicated dataset built from Wikidata that does not rely on the existing Wikidata search APIs, so that I can improve one without impacting the other.

Sub-tickets will be created as needed, but the plan is roughly:

  • Import the Commons MediaInfo dump to HDFS
  • Write a Spark job that joins Commons and Wikidata and outputs a dedicated dataset for concept lookups
  • Determine the mapping, possibly experimenting with better techniques (not one field per language) to support multiple languages
  • Write a custom Elasticsearch query to do query expansion and rewrite
  • Adapt MediaSearch to use this query expansion in place of the Wikidata search API
  • Optional, but good to have: provide completion for Wikidata items using this same dataset instead of the Wikidata completion API
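
As a rough illustration of the join step in the plan above, here is a minimal pure-Python sketch standing in for the Spark job. Everything about the record shapes is an assumption made for the example: the flattened `id`/`labels`/`depicts` inputs, the output field names (`statement`, `labels`, `commons_uses`), and the choice to keep only items that are actually used in "depicts" (P180) statements on Commons. The labels are emitted as a list of language-tagged entries to hint at an alternative to one field per language, not as a decided mapping.

```python
from collections import Counter

DEPICTS = "P180"  # Wikidata property "depicts"

# Hypothetical, heavily simplified records; the real dumps are far richer.
WIKIDATA_SAMPLE = [
    {"id": "Q146", "labels": {"en": ["house cat", "cat"], "fr": ["chat"]}},
    {"id": "Q144", "labels": {"en": ["dog"]}},
]
MEDIAINFO_SAMPLE = [
    {"id": "M1", "depicts": ["Q146"]},
    {"id": "M2", "depicts": ["Q146", "Q146"]},  # duplicate counted once per file
    {"id": "M3", "depicts": []},
]


def count_depicts(mediainfo_entities):
    """Count how many Commons files depict each Wikidata item."""
    counts = Counter()
    for entity in mediainfo_entities:
        counts.update(set(entity.get("depicts", [])))
    return counts


def build_concept_docs(wikidata_items, mediainfo_entities):
    """Inner-join Wikidata items with Commons usage counts."""
    counts = count_depicts(mediainfo_entities)
    docs = []
    for item in wikidata_items:
        uses = counts.get(item["id"], 0)
        if uses == 0:
            continue  # item is never depicted on Commons, so skip it
        labels = [
            {"lang": lang, "text": text}
            for lang, texts in sorted(item["labels"].items())
            for text in texts
        ]
        docs.append({
            "id": item["id"],
            "statement": f"{DEPICTS}={item['id']}",
            "labels": labels,
            "commons_uses": uses,
        })
    return docs
```

Calling `build_concept_docs(WIKIDATA_SAMPLE, MEDIAINFO_SAMPLE)` yields a single document for Q146, since Q144 is not depicted by any sample file. In a Spark job the same shape would presumably be an aggregation followed by a join, but that translation is left open here.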

Acceptance criteria (AC):

  • The MediaSearch query builder no longer uses the Wikidata search API
  • A single request is made to Elasticsearch
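
To make the target of these criteria more concrete, the sketch below shows what an expanded query body could look like once concept lookup no longer goes through the Wikidata search API. It performs the expansion in plain Python purely for illustration; the ticket's plan is for a custom Elasticsearch query to do this inside the cluster. The function `expand_query`, the concept document shape, the field names `text` and `statement_keywords`, the boost value, and the ID `Q999999999` are all assumptions made for the example.

```python
# Hypothetical concept documents, as a dedicated lookup dataset might hold them.
CONCEPTS_SAMPLE = [
    {"id": "Q999999999", "statement": "P180=Q999999999",
     "labels": [{"lang": "en", "text": "cat"}], "commons_uses": 3},
    {"id": "Q146", "statement": "P180=Q146",
     "labels": [{"lang": "en", "text": "cat"}, {"lang": "fr", "text": "chat"}],
     "commons_uses": 120},
]


def expand_query(term, concept_docs, max_concepts=3, concept_boost=2.0):
    """Rewrite a search term into one bool query: full text OR matching concepts."""
    needle = term.casefold()
    matches = [
        doc for doc in concept_docs
        if any(label["text"].casefold() == needle for label in doc["labels"])
    ]
    # Prefer the concepts most used on Commons.
    matches.sort(key=lambda doc: doc["commons_uses"], reverse=True)

    should = [{"match": {"text": {"query": term}}}]
    for doc in matches[:max_concepts]:
        should.append({
            "term": {
                "statement_keywords": {
                    "value": doc["statement"],
                    "boost": concept_boost,
                }
            }
        })
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

With the sample data, `expand_query("Cat", CONCEPTS_SAMPLE)` produces one request body containing the full-text clause plus two concept clauses, with Q146 ranked first because of its higher usage count. How matching and ranking would really work (exact versus analyzed label matching, per-language weighting) is not specified in this task.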

Event Timeline

Restricted Application added a subscriber: Aklapper.
EBernhardson moved this task from needs triage to [epic] on the Discovery-Search board.
TJones renamed this task from MediaSearch should use a dedicated service/query for doing its concept-lookup instead of the wikidata search API to [EPIC] MediaSearch should use a dedicated service/query for doing its concept-lookup instead of the wikidata search API. Nov 30 2020, 4:28 PM
TJones added a project: Epic.
CBogen raised the priority of this task from Medium to High. Dec 10 2020, 3:08 PM