
Duplicate images from other wikis
Closed, Resolved | Public | BUG REPORT

Description

From T402966#11232544:

  1. Go to https://24f17e0e4a.catalyst.wmcloud.org/wiki/London?imageBrowsing=1&useskin=minerva#/imagebrowsing/File:London_Skyline_(125508655).jpeg

Observed: I am seeing a few duplicates in testing.

Screenshot 2025-10-01 at 11.40.28.png (1×990 px, 673 KB)

Expected: No duplicates
(I'll flesh out this ticket with more detail tomorrow)

Requirement

When viewing the Image Browsing feature, no duplicate images from other wikis should appear in the carousel or detail view. The image results must be unique and sorted primarily by frequency of reuse across Wikimedia projects. The feature must correctly merge and order results even when different services (e.g., search relevance and globalusage) return items in varying sequences.

BDD

Feature: Prevent duplicate images and ensure proper sorting in Image Browsing

Scenario: Image Browsing displays unique images across multiple wiki sources
  Given the user opens an article with Image Browsing enabled
  When the carousel loads images from multiple sources (including globalusage)
  Then each image should appear only once
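The scenario above could be checked automatically along these lines. This is only a minimal sketch: it assumes the carousel's contents can be obtained as a list of file titles, and `find_duplicates` and the sample titles are hypothetical, not taken from ReaderExperiments.

```python
from collections import Counter


def find_duplicates(titles):
    """Return the titles that occur more than once, in first-seen order."""
    counts = Counter(titles)
    return [title for title in counts if counts[title] > 1]


# Hypothetical carousel contents, for illustration only.
carousel = [
    "File:London_Skyline_(125508655).jpeg",
    "File:Example_A.jpg",
    "File:London_Skyline_(125508655).jpeg",
]
duplicates = find_duplicates(carousel)
```

An empty list from `find_duplicates` would correspond to the "each image should appear only once" step passing.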

Test Steps

Test Case 1: Verify no duplicate images appear in Image Browsing

  1. Navigate to https://24f17e0e4a.catalyst.wmcloud.org/wiki/London?imageBrowsing=1&useskin=minerva#/imagebrowsing/File:London_Skyline_(125508655).jpeg
  2. Observe the image carousel and detail view
  3. AC1: No duplicate images appear in the carousel or detail view

QA Results - Patchdemo

AC | Status | Details
1 | | T406139#11265354

Event Timeline

Change #1194187 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/extensions/ReaderExperiments@master] Fix fetching media from other wikis

https://gerrit.wikimedia.org/r/1194187

There were some issues where the code assumed results would arrive in relevance order, but that was not the case (and to make things worse, globalusage results were arriving in yet another order).
This was not only causing duplicates (caused by the first problem), but also dropping perfectly valid results (caused by the second).

Anyway, there's a patch that should fix this. It also changes the sort to be based more heavily on how extensively media is reused, which is more in line with how we present this feature (and subjectively yields much better results for the articles I tested on).
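As a rough illustration of the idea described above (not the code from the patch), an order-independent merge might look like the sketch below. The function name `merge_media`, the `title` and `usage_count` fields, and the sample data are all assumptions made for the example.

```python
def merge_media(search_results, usage_results):
    """Merge two result lists into unique items sorted by reuse.

    Each item is a dict with a "title" and, optionally, a "usage_count".
    Neither input list is assumed to be in any particular order.
    """
    merged = {}
    for item in list(search_results) + list(usage_results):
        title = item["title"]
        count = item.get("usage_count", 0)
        # Keying by title removes duplicates regardless of arrival order;
        # keep the highest usage count seen for that title.
        merged[title] = max(merged.get(title, 0), count)
    # Most widely reused first; ties broken by title for a stable order.
    return sorted(merged.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical inputs: the two sources return overlapping items in
# different orders.
search = [{"title": "File:B.jpg"}, {"title": "File:A.jpg"}]
usage = [
    {"title": "File:A.jpg", "usage_count": 12},
    {"title": "File:C.jpg", "usage_count": 40},
    {"title": "File:B.jpg", "usage_count": 3},
]
ranked = merge_media(search, usage)
# ranked == [("File:C.jpg", 40), ("File:A.jpg", 12), ("File:B.jpg", 3)]
```

Because items are keyed by title rather than compared by position, no result is duplicated or dropped when the sources disagree on ordering.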

Change #1194187 merged by jenkins-bot:

[mediawiki/extensions/ReaderExperiments@master] Fix fetching media from other wikis

https://gerrit.wikimedia.org/r/1194187

Edtadros subscribed.

Test Result - Patchdemo

Status: ✅ PASS
Environment: patchdemo (https://24f17e0e4a.catalyst.wmcloud.org)
OS: macOS Sequoia 15.5
Browser: Chrome Canary (latest as of test date)
Device: MS
Emulated Device: NA

Test Case 1: Verify no duplicate images appear in Image Browsing

  1. Navigate to https://24f17e0e4a.catalyst.wmcloud.org/wiki/London?imageBrowsing=1&useskin=minerva#/imagebrowsing/File:London_Skyline_(125508655).jpeg
  2. Observe the image carousel and detail view
  3. AC1: No duplicate images appear in the carousel or detail view

screenshot 109.mov.gif (450×946 px, 3 MB)

screenshot 111.mov.gif (1×710 px, 3 MB)

Note: the patchdemo link in the comment above still exhibits the old duplicate behavior, but the issue is resolved in more recent instances (e.g. https://97d2d44553.catalyst.wmcloud.org/wiki/London?imageBrowsing=1&useskin=minerva#/imagebrowsing/File:London_Skyline_(125508655).jpeg)

Change #1204725 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Populate more missing channels

https://gerrit.wikimedia.org/r/1204725