Unable to find a file by filename while adding a Commons media file statement
Closed, Resolved | Public | 5 Estimated Story Points | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

The file cannot be found.

Screenshot of the results:

Bildschirmfoto_2023-12-19_02-47-56.png (532×665 px, 80 KB)

What should have happened instead?:

The file should be the first result, because the name is an exact match.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Searching for the filename with the namespace prefix (File:Es-us-espanol.ogg) doesn't work either, nor does entering the full URL (https://commons.wikimedia.org/wiki/File:Es-us-espanol.ogg).

Event Timeline

Gehel set the point value for this task to 5. (Jan 8 2024, 4:36 PM)
dcausse added subscribers: Cparle, dcausse.

Selecting only namespace=6 triggers the MediaSearch query profile, which does not include the all_near_match field. That field is the one that helps most in ranking near-perfect title matches at the top.
I believe the fix is to make the MediaSearch query builder include an optional clause on the all_near_match field.
@Cparle do you remember whether all_near_match was left out on purpose, and whether adding it would break any existing use cases?
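
For illustration, here is a rough sketch of what an optional near-match clause could look like in Elasticsearch query DSL, built as a plain Python dict. This is not the actual MediaSearch or CirrusSearch code: the function name `build_media_query`, the `multi_match` placeholder and its field names are assumptions made up for the example. Only the all_near_match field, namespace 6 and the 3.5 weight come from this task.

```python
def build_media_query(term, near_match_weight=3.5):
    """Build a bool query with an optional (should) clause on all_near_match.

    Hypothetical sketch: the real MediaSearch query is considerably larger.
    """
    # Placeholder for the existing full-text part of the query (assumed fields).
    full_text = {"multi_match": {"query": term, "fields": ["title", "text"]}}

    # Optional clause: it never excludes documents, it only adds score
    # to documents whose all_near_match field matches the search term.
    near_match = {
        "match": {
            "all_near_match": {"query": term, "boost": near_match_weight}
        }
    }

    return {
        "bool": {
            "must": [full_text],
            "should": [near_match],
            # Restrict to the File namespace, as in the report (namespace=6).
            "filter": [{"term": {"namespace": 6}}],
        }
    }


query = build_media_query("Es-us-espanol.ogg")
```

Because the clause sits under `should` next to a `must` clause, it cannot remove results that matched before. It can only reorder them, which is why adding it seems unlikely to break existing use cases.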

Don't remember for sure, but it seems unlikely that we excluded it on purpose. @matthiasmullie do you remember? Shouldn't be a huge deal to add it anyway

Same; I don't remember this having been excluded on purpose.

Change 995195 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Add NearMatchFieldQueryBuilder

https://gerrit.wikimedia.org/r/995195

Change 995219 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/WikibaseMediaInfo@master] search: add a nearmatch clause to help with title matches

https://gerrit.wikimedia.org/r/995219

Change 995195 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add NearMatchFieldQueryBuilder

https://gerrit.wikimedia.org/r/995195

Change 995219 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] search: add a nearmatch clause to help with title matches

https://gerrit.wikimedia.org/r/995219

The new builder moved the result to #4, which is better but still not enough: it is beaten by 3 other images because of other criteria:

  • weighted_tags:image.linked.from.wikipedia.lead_image/Q458
  • statement_keywords:p180=q458

Moving back to in-progress to fine-tune the weight (probably bumping from 3.5 to 10).

Change 1008901 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/WikibaseMediaInfo@master] Move the logistic to the top-level and fix boot_mode

https://gerrit.wikimedia.org/r/1008901

Changed the layout of the query a bit by moving the logistic function introduced in T271799 to the top level, so that it wraps the new nearmatch clause.
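
To make the effect of that layout change concrete, here is a small numeric sketch. It assumes a standard logistic function and assumes the clause scores are simply added together. The function names, the parameters `midpoint` and `steepness`, and the sample scores are all invented for the example and are not taken from the patch.

```python
import math


def logistic(raw_score, midpoint=0.0, steepness=1.0):
    """Standard logistic function: maps any raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (raw_score - midpoint)))


def score_near_match_outside(text_score, near_match_score):
    """Logistic wraps only the text clauses; near match is added afterwards."""
    return logistic(text_score) + near_match_score


def score_near_match_inside(text_score, near_match_score):
    """Logistic at the top level: it wraps the near-match clause as well."""
    return logistic(text_score + near_match_score)


# Hypothetical raw scores, chosen only to show the shape of the behaviour.
text_score = 2.0
near_match_score = 10.0

outside = score_near_match_outside(text_score, near_match_score)
inside = score_near_match_inside(text_score, near_match_score)
no_match = score_near_match_inside(text_score, 0.0)
```

With the logistic at the top level, the final score stays between 0 and 1 whatever weight the near-match clause carries, and a document with a near match still scores higher than the same document without one.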

Change #1008901 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Fix logistic boot_mode and increase default near match weight

https://gerrit.wikimedia.org/r/1008901