
[M] [wmf.27] MediaSearch - duplicate files displayed in search results
Closed, Resolved (Public)

Description

  1. Search for hamar daban in Images (type or paste the search terms). When the results are displayed, select the Medium image size.

Warning: just pasting the URL won't reproduce the bug - https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=hamar+daban&imageSize=500%2C1000

  2. One image (https://commons.wikimedia.org/wiki/File:Bockk%C3%A4fer_Murino.jpg) will be displayed, but there will be a "No more results found" warning.
  3. Click on the image to see it in the QuickView - the image is displayed correctly. Close the QuickView - there will be two identical images displayed.
  4. Click either of the two images to open it in QuickView - both images will display the blue outline indicating that they are selected for QuickView. Close the QuickView - there will be three identical images displayed.

Reloading the page will display one image and the "No more results found" warning, and the issue will no longer be reproducible.

The gif below illustrates the following steps:

  • entering hamar daban search terms
  • selecting the bug photo for the QuickView (this step is not necessary)
  • selecting a filter Image size - Medium
  • one image is displayed - selecting it for the QuickView triggers duplicate files display

Event Timeline

CBogen renamed this task from [wmf.27] MediaSearch - duplicate files displayed in search results to [M] [wmf.27] MediaSearch - duplicate files displayed in search results. Feb 9 2021, 5:40 PM

Moriel found what might be another instance of this in a larger results set. Query: https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=puppy

Result:

I think we have 2 separate problems here:

Elena's original observation is probably caused by the combination of two things: the intersection observer kicking in (closing QuickView reshuffles the contents of the page), and the API call's offset param falling back to 0 if it has a falsy value (in this case it's null, because there was no offset).
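
To illustrate the first problem, here is a minimal sketch in Python (not the extension's actual code). All names (`fetch`, `BuggyResults`, `GuardedResults`, `PAGE_SIZE`, the example title) are hypothetical. It assumes the backend returns a null next offset once the result set is exhausted, and shows how treating that null as 0 refetches the first page each time "load more" fires, while remembering the exhausted state avoids the extra request.

```python
from typing import List, Optional, Tuple

# Hypothetical single-result index standing in for the search backend.
INDEX = ["File:Example.jpg"]
PAGE_SIZE = 40  # assumed page size, for illustration only


def fetch(offset: int) -> Tuple[List[str], Optional[int]]:
    """Return one page of titles and the next offset (None when exhausted)."""
    page = INDEX[offset:offset + PAGE_SIZE]
    has_more = offset + PAGE_SIZE < len(INDEX)
    return page, (offset + PAGE_SIZE if has_more else None)


class BuggyResults:
    """Treats a missing offset as 0, so each "load more" refetches page one."""

    def __init__(self) -> None:
        self.titles: List[str] = []
        self.next_offset: Optional[int] = None

    def load_more(self) -> None:
        # None is falsy, so "no more results" silently becomes offset 0.
        page, self.next_offset = fetch(self.next_offset or 0)
        self.titles.extend(page)


class GuardedResults(BuggyResults):
    """Remembers that the result set is exhausted and skips further requests."""

    def __init__(self) -> None:
        super().__init__()
        self.exhausted = False

    def load_more(self) -> None:
        if self.exhausted:
            return  # already known to have no more results
        super().load_more()
        self.exhausted = self.next_offset is None
```

Calling `load_more()` three times on `BuggyResults` leaves three copies of the same title in `titles`; the same three calls on `GuardedResults` leave one.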

Moriel's observation is probably caused by inconsistencies in search scoring across multiple replicas: scores will have minor deviations across queries, which make it possible for a certain result to rise above or fall below another if their scores are extremely similar. It would probably be best to keep track of what titles are already being rendered, and simply ignore duplicate results coming in.
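
The title-tracking idea could look roughly like the sketch below, again in Python rather than the extension's own code. `ResultList` and `add_page` are hypothetical names, and it assumes each result can be identified by its title string.

```python
from typing import Iterable, List, Set


class ResultList:
    """Keeps rendered titles in order and ignores titles seen before."""

    def __init__(self) -> None:
        self.titles: List[str] = []
        self._seen: Set[str] = set()

    def add_page(self, page: Iterable[str]) -> int:
        """Append unseen titles from a page; return how many were added."""
        added = 0
        for title in page:
            if title in self._seen:
                continue  # duplicate caused by score drift between queries
            self._seen.add(title)
            self.titles.append(title)
            added += 1
        return added


results = ResultList()
results.add_page(["File:A.jpg", "File:B.jpg"])
# The second page overlaps because "File:B.jpg" shifted rank between queries.
results.add_page(["File:B.jpg", "File:C.jpg"])
```

Dropping duplicates on the client does not recover a result that slipped the other way (ranked on page two in the first query and on page one in the second), so this hides repeats rather than making paging fully consistent.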

Change 667125 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Prevent searches that are already known to have no results

https://gerrit.wikimedia.org/r/667125

Change 667126 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Omit duplicate results

https://gerrit.wikimedia.org/r/667126

Change 667125 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Prevent searches that are already known to have no results

https://gerrit.wikimedia.org/r/667125

Change 667126 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Omit duplicate results

https://gerrit.wikimedia.org/r/667126