EPIC: Investigate adding aliases from Wikidata into the search index so that they can enhance the results and reduce the zero results rate
Closed, Declined · Public

Event Timeline

Deskana created this task. Sep 22 2015, 5:06 PM
Deskana raised the priority of this task to Medium.
Deskana updated the task description.
Deskana added a subscriber: Deskana.
Restricted Application added a subscriber: Aklapper. Sep 22 2015, 5:06 PM
dcausse added a subscriber: dcausse. Oct 9 2015, 8:30 AM

According to T114867, this does not seem worth trying.

We should maybe look at another solution. Issuing a query against Wikidata seems to be a better idea.
This feature is enabled on itwiki, and it would cover the following use cases:

  • query with an alias missing in the wiki index
  • query with a name in a different language
  • query with small transliteration variations
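The use cases above can be sketched against the public Wikidata API. This is a minimal sketch, not how the itwiki integration is implemented: it builds a `wbsearchentities` request URL and picks out the entities that matched through an alias rather than a label. The function names are hypothetical, and the response layout (a `search` list whose items carry an `id` and a `match` object with `type` and `text`) is assumed from the API's JSON output.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_entity_search_url(query, language, limit=5):
    """Build a wbsearchentities request URL (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def alias_matches(response):
    """Return (entity id, matched text) pairs for hits found via an alias.

    Assumes each item in response["search"] has an "id" and a "match"
    object with "type" and "text" keys.
    """
    hits = []
    for item in response.get("search", []):
        match = item.get("match", {})
        if match.get("type") == "alias":
            hits.append((item["id"], match.get("text")))
    return hits
```

Fetching the URL and decoding the JSON is left out on purpose, so the sketch stays independent of any particular HTTP client.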

The problem with the current Wikidata integration is that it's not optimal: results from Wikidata should be integrated seamlessly into the search results, all the more so when the Wikidata entity has a link to the user's wiki.
This involves:

  • UI design/work
  • maybe some backend work (performance considerations: I'm not sure that our cluster will support doubling the number of queries)
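One way to keep the extra load down, given the concern about doubling the number of queries, is to query Wikidata only when the local index returns nothing. The sketch below illustrates that idea. It is an assumption made for illustration, not a design agreed in this task, and `search_with_fallback`, `local_search` and `wikidata_lookup` are hypothetical names for callables that take a query string and return page titles.

```python
def search_with_fallback(query, local_search, wikidata_lookup):
    """Query Wikidata only when the local index has no results.

    Both callables are hypothetical: each takes a query string and
    returns an iterable of page titles.
    """
    results = list(local_search(query))
    if results:
        # The local index answered, so no second query is issued.
        return results

    # Zero local results: fall back to Wikidata and drop duplicate titles
    # while preserving the order in which they were returned.
    seen = set()
    merged = []
    for title in wikidata_lookup(query):
        if title not in seen:
            seen.add(title)
            merged.append(title)
    return merged
```

With this gating, the second query is paid only by searches that would otherwise show an empty result page.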

Having looked at how we index Wikidata content, it does not look optimal: structured data seems to be flattened into the content field (https://www.wikidata.org/wiki/Q216092?action=cirrusDump). I've heard that the Wikidata team plans to use Cirrus prefixsearch for wbsearchentities. Maybe we could start thinking about what a good mapping for Wikidata docs would be?
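As a starting point for that discussion, here is a minimal sketch of a document shape that keeps labels, aliases and descriptions in per-language fields instead of one flattened content field. The output field names are hypothetical and this is not an actual CirrusSearch mapping. The input is assumed to follow the Wikidata entity JSON layout, with `labels`, `descriptions`, `aliases` and `sitelinks` keyed by language or site.

```python
def to_search_doc(entity):
    """Turn Wikidata entity JSON into a document with per-language fields.

    The output layout is a hypothetical illustration only.
    """
    labels = {
        lang: term["value"]
        for lang, term in entity.get("labels", {}).items()
    }
    descriptions = {
        lang: term["value"]
        for lang, term in entity.get("descriptions", {}).items()
    }
    # Aliases hold a list of terms per language, unlike labels.
    aliases = {
        lang: [term["value"] for term in terms]
        for lang, terms in entity.get("aliases", {}).items()
    }
    # Sitelinks map a site id (for example "itwiki") to a page title.
    sitelinks = {
        site: link["title"]
        for site, link in entity.get("sitelinks", {}).items()
    }
    return {
        "id": entity.get("id"),
        "labels": labels,
        "aliases": aliases,
        "descriptions": descriptions,
        "sitelinks": sitelinks,
    }
```

Keeping aliases separate per language would let a query match an alias in one language and still resolve, through the sitelinks, to the page on the user's wiki.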

I think this is related to T89733.

ksmith moved this task from On Sprint Board to Search on the Discovery board.
ksmith moved this task from Search to Product Epics on the Discovery board.
Deskana closed this task as Declined. Dec 23 2015, 5:24 AM
Deskana claimed this task.

The analysis performed in T114867: Evaluate the benefits of adding wikidata aliases to cirrus indices indicated that this avenue is not worth pursuing at present. Therefore, I am declining this epic. It can be reopened if we decide to pursue this in the future.