
[XL] Create commons-specific elasticsearch query builder that emulates current MediaSearch behaviour
Closed, Resolved · Public

Description

The new MediaSearch prototype on Commons uses a new API call that combines several calls to the search API, so that searches can make use of structured data and category data.

If we query Elasticsearch directly instead of constructing a number of separate queries to the search API, we'll get better performance, get all of the syntax support already in CirrusSearch, and be able to tune our search results better.

This basically involves subclassing CirrusSearch's FullTextQueryBuilder, and then using the registerSearchProfiles hook to load a search profile that uses the new subclass for the File namespace.

Ensure that the new search profile is not used by default. Instead, we need a URL param that loads the new profile so we can A/B test it.
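As a rough illustration of the opt-in behaviour described above, the selection logic could look something like the sketch below. It is written in Python for brevity rather than in the extension's own PHP, and the profile names and the `choose_profile` helper are hypothetical; only the `mediasearch=1` param and the File-namespace restriction come from this ticket.

```python
from typing import Iterable, Mapping

NS_FILE = 6  # MediaWiki's namespace id for File

# Hypothetical profile names, used only for this illustration.
DEFAULT_PROFILE = "default"
MEDIASEARCH_PROFILE = "mediasearch"


def choose_profile(namespaces: Iterable[int], params: Mapping[str, str]) -> str:
    """Pick the query builder profile for a search request.

    The new profile is used only when the search is restricted to the
    File namespace AND the request explicitly opts in via mediasearch=1.
    Everything else keeps the default profile, so nothing changes for
    users who are not part of the A/B test.
    """
    file_only = set(namespaces) == {NS_FILE}
    opted_in = params.get("mediasearch") == "1"
    return MEDIASEARCH_PROFILE if file_only and opted_in else DEFAULT_PROFILE
```

Keeping the check explicit (both conditions must hold) means a mixed-namespace search or a missing param falls back to the existing behaviour.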

There may need to be an initial query to Wikidata to get entity IDs for the search terms, so that we can search images' structured data. Extra data from this query may also need to be passed to Elasticsearch to help with scoring, but that's out of scope for this initial ticket.
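A minimal sketch of that two-step flow, in Python for illustration only: look up entity IDs through Wikidata's `wbsearchentities` API module, then add them as extra clauses next to the plain full-text match. The field names `text` and `statement_keywords`, the `P180=Q…` value format, and the helper names are assumptions made for this example, not something this ticket specifies.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entity_search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Build the URL for the initial entity lookup on Wikidata."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def entity_ids(response: Dict[str, Any]) -> List[str]:
    """Extract entity IDs (e.g. 'Q146') from a wbsearchentities response."""
    return [hit["id"] for hit in response.get("search", [])]


def build_query(term: str, ids: List[str], prop: str = "P180") -> Dict[str, Any]:
    """Combine a full-text clause with one statement clause per entity.

    Field names here are assumed for illustration; a document matches if
    it satisfies the text clause or any of the statement clauses.
    """
    should: List[Dict[str, Any]] = [{"match": {"text": term}}]
    should += [{"match": {"statement_keywords": f"{prop}={qid}"}} for qid in ids]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

The network call itself is left out on purpose: fetch `entity_search_url(term)`, decode the JSON, and pass it to `entity_ids` before calling `build_query`.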

Acceptance criteria:

  • there exists a URL param that causes searches on Commons to use an Elasticsearch query that emulates the behaviour of the various search API calls that MediaSearch currently makes
  • search syntax is supported, including keywords/filters (e.g. gold intitle:gym), quotes, and booleans. We will need to implement this piece by piece; see https://phabricator.wikimedia.org/T257304

How to test:
Do a search only in the File namespace, and add mediasearch=1 to the URL. The search results will look the same as normal search, but the actual result set should be different (and more similar to the search results of MediaSearch).
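For comparing the two result sets side by side, a small helper can generate both URLs. This is only a sketch: it assumes the standard `search` and `ns6=1` (File namespace) params on Commons' `index.php`, and the helper name is made up for this example.

```python
from urllib.parse import urlencode

COMMONS_INDEX = "https://commons.wikimedia.org/w/index.php"


def search_test_url(term: str, mediasearch: bool = True) -> str:
    """Build a File-namespace-only search URL, optionally opting in to the new profile."""
    params = {"search": term, "ns6": "1"}  # ns6=1 restricts the search to File
    if mediasearch:
        params["mediasearch"] = "1"
    return f"{COMMONS_INDEX}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the opt-in URL and the baseline URL for the same search term.
    print(search_test_url("gold intitle:gym"))
    print(search_test_url("gold intitle:gym", mediasearch=False))
```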

Event Timeline

Cparle created this task. May 13 2020, 4:13 PM
Restricted Application added a subscriber: Aklapper. May 13 2020, 4:13 PM
EBernhardson updated the task description. May 13 2020, 4:16 PM
CBogen updated the task description. May 13 2020, 4:37 PM
Cparle updated the task description. May 13 2020, 4:38 PM
CBogen renamed this task from User elasticsearch directly in Special:MediaSearch on commons to Use elasticsearch directly in Special:MediaSearch on commons. May 13 2020, 4:44 PM
CBogen renamed this task from Use elasticsearch directly in Special:MediaSearch on commons to [XL] Use elasticsearch directly in Special:MediaSearch on commons. Jun 3 2020, 4:25 PM
Cparle renamed this task from [XL] Use elasticsearch directly in Special:MediaSearch on commons to [XL] Create commons-specific elasticsearch query builder that emulates current MediaSearch behaviour. Jun 3 2020, 4:39 PM
Cparle updated the task description.

Change 606187 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] WIP: use WMBI specific search query builder

https://gerrit.wikimedia.org/r/606187

Change 608904 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] Use WMBI specific search query builder.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/608904

Change 608904 abandoned by Cparle:
[mediawiki/extensions/WikibaseMediaInfo@master] Use WMBI specific search query builder.

Reason:
Duplicate of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/606187

https://gerrit.wikimedia.org/r/608904

Cparle updated the task description. Jul 7 2020, 11:33 AM
Cparle moved this task from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Cparle updated the task description. Jul 7 2020, 3:55 PM

Change 606187 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Use WMBI specific search query builder.

https://gerrit.wikimedia.org/r/606187

CBogen added a subscriber: CBogen.

Noting that this can be merged independently of the MediaSearch vue.js port (T251940).

Change 617678 had a related patch set uploaded (by Cparle; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Normalize statements & fulltext scores relative-ish to eachother

https://gerrit.wikimedia.org/r/617678

Change 617677 had a related patch set uploaded (by Cparle; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Make MediaSearch query scoring more similar to Cirrus defaults

https://gerrit.wikimedia.org/r/617677

Change 617677 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Make MediaSearch query scoring more similar to Cirrus defaults

https://gerrit.wikimedia.org/r/617677

Change 619985 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] Adjust scoring mechanism for media search

https://gerrit.wikimedia.org/r/619985

Change 617678 abandoned by Cparle:
[mediawiki/extensions/WikibaseMediaInfo@master] Normalize statements & fulltext scores relative-ish to eachother

Reason:
Abandoning in favour of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/619985

https://gerrit.wikimedia.org/r/617678

Change 619985 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Adjust scoring mechanism for media search

https://gerrit.wikimedia.org/r/619985

Change 621518 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] [WIP] Normalize statements & fulltext scores relative-ish to eachother

https://gerrit.wikimedia.org/r/621518

matthiasmullie closed this task as Resolved. Sep 16 2020, 2:50 PM