Image browsing: ensure exclude tags are respected
Closed, Resolved | Public | 2 Estimated Story Points

Description

Background

We would like to make sure we are not including any images in the carousel that editors are purposefully excluding.

Acceptance criteria

  • Ensure that mediaviewer exclude tags are respected in the carousel and new ToC image experience
  • Document findings

Event Timeline

I believe @bvibber is already checking for the common CSS classes used to exclude images from MultimediaViewer, but it would be good to know if there's anything else we should look out for.

HSwan-WMF set the point value for this task to 2. (Aug 19 2025, 5:27 PM)
HSwan-WMF moved this task from Needs Refinement to Sprint 4 on the Reader Growth Team board.

We can also consider excluding some images from the end of the article. Based on the Wikipedia article template and manual (https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Layout#%22See_also%22_section), it would be safe to exclude images from these sections found at the end of the article:

  • See Also
  • Notes
  • References
  • Further reading
  • External links

Note: If you scroll to the end of the Carousel or VisualTableOfContents, you'll notice some odd images being included (most likely from the sections mentioned).

Example from Paris article:
{F65794503}

mfossati changed the task status from Open to In Progress. (Aug 21 2025, 2:48 PM)
mfossati claimed this task.
mfossati moved this task from Committed to Doing on the Reader Growth Team (Sprint 4) board.

Per @Jdlrobson's suggestion, let's look at what the PageImages extension does to see if we can incorporate any of that logic.

Change #1182198 had a related patch set uploaded (by Marco Fossati; author: Marco Fossati):

[mediawiki/extensions/ReaderExperiments@master] ImageBrowsing: update image file extension allow-list

https://gerrit.wikimedia.org/r/1182198

I've submitted a small patch that updates the allowlist of image file extensions.

The thumbExtractor already filters out the following images:

  • class="metadata" or class="noviewer", which should exclude most navigational icons
  • SVG images that appear in an infobox, as they're often flags and small maps we don't want to show in the photo carousel
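
The two rules above, plus the extension allowlist from the patch, can be sketched roughly as follows. This is an illustration in Python, not the actual thumbExtractor code: the `Image` record, the `is_browsable` name, and the contents of `ALLOWED_EXTENSIONS` are hypothetical, and it assumes each image carries the CSS classes of itself and its ancestors.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical allowlist; the real one is defined in the ReaderExperiments patch.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"}
# Classes named in this thread as excluding an image.
EXCLUDED_CLASSES = {"metadata", "noviewer"}


@dataclass
class Image:
    file_name: str
    # Assumption: classes of the image and all of its ancestors, flattened.
    classes: frozenset = frozenset()


def is_browsable(image: Image) -> bool:
    """Return True if the image passes all three filters."""
    ext = PurePosixPath(image.file_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False
    if image.classes & EXCLUDED_CLASSES:
        return False
    # SVGs inside an infobox are usually flags or small maps.
    if ext == ".svg" and "infobox" in image.classes:
        return False
    return True
```

Under these assumptions, an SVG outside an infobox still passes, while the same file inside an infobox is dropped.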

Related resources

  • The global image denylist should prevent the transclusion of its images. I assume this is a no-op here, as it's already done by MediaWiki: see https://en.wikipedia.org/wiki/MediaWiki:Bad_image_list & https://en.wikipedia.org/wiki/MediaWiki_talk:Bad_image_list
  • The PageImages extension enables community-curated denylists that can be edited by administrators in any wiki. From a random check of a few large Wikipedias, English only has 10 images, French has five, Spanish one, Chinese five. Integrating these denylists here would require reading them from each Wikipedia, which seems out of scope
  • PageImages determines the best possible page image on Wikimedia wikis as follows: one of the first four images in an article, with a width/height between 400 and 600px and a height/width twice the value of the other dimension. This is about inclusion, not exclusion, so it is also out of this task's scope.

We can also consider excluding some images from the end of the article.
Note: If you scroll to the end of the Carousel or VisualTableOfContents, you'll notice some odd images being included (most likely from the sections mentioned).

From the articles I've seen so far, I think that thumbExtractor does a good job of filtering those images. That said, there might be leaks due to exotic use of templates, like the last carousel image for Paris, which is a PNG icon.
Excluding those sections in all languages isn't a trivial task, as it requires aligning section names across languages, e.g., References in English aligns to Note in Italian. I think we can live with the edge cases that thumbExtractor doesn't handle.
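
To make the alignment cost concrete, here is a minimal sketch of what section-based exclusion could look like. It is not part of the patch. The `TRAILING_SECTIONS` table and the `drop_trailing_section_images` function are hypothetical; only the English headings and the Italian "Note" come from this thread, and every other wiki would need its own curated entry.

```python
# Trailing-section headings keyed by language code.
# Only the entries below are taken from this thread; the rest would have to be
# curated per wiki, which is the non-trivial part.
TRAILING_SECTIONS = {
    "en": {"see also", "notes", "references", "further reading", "external links"},
    "it": {"note"},
}


def drop_trailing_section_images(images, lang):
    """Filter out images whose section heading is a known trailing section.

    images: list of (file_name, section_heading) pairs.
    lang: language code of the wiki; unknown codes exclude nothing.
    """
    excluded = TRAILING_SECTIONS.get(lang, set())
    return [
        (name, heading)
        for name, heading in images
        if heading.strip().casefold() not in excluded
    ]
```

A wiki without an entry in the table passes every image through unchanged, so the filter only works where someone has curated the headings.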

Change #1182198 merged by jenkins-bot:

[mediawiki/extensions/ReaderExperiments@master] ImageBrowsing: update image file extension allow-list

https://gerrit.wikimedia.org/r/1182198

There was some talk about excluding images from tables (from things like the Clade template for example) – @bvibber @lwatson thoughts?

The variety of wikitext templates that render tables or lists is so broad that handling such a long tail is far from trivial, and no set of filters could be exhaustive. Every Wikipedia can have specific ways to display content.
Anyway, I believe that our goal is to show as many informative images as possible, thus focusing on recall rather than precision. Implementing ad-hoc filters for every wikitext template isn’t realistic and would improve precision at the cost of recall.

In the patch, I excluded only images from Clade templates, not all table images. An example is this cladogram found at https://en.wikipedia.org/wiki/Zebra#Taxonomy

Zebra_cladogram.png (477×436 px, 65 KB)
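
A rough sketch of a Clade-only exclusion over rendered HTML follows, using Python's standard `html.parser`. This is not the code from the patch. It assumes the Clade template renders a `<table class="clade">`, and the `CladeAwareImageCollector` and `extract_images` names are hypothetical.

```python
from html.parser import HTMLParser


class CladeAwareImageCollector(HTMLParser):
    """Collects <img> sources, skipping images nested in an excluded table."""

    # Assumption: the Clade template renders a <table class="clade">.
    EXCLUDED_TABLE_CLASSES = {"clade"}

    def __init__(self):
        super().__init__()
        self.sources = []
        # One bool per currently open <table>: True if that table is excluded.
        self._table_stack = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "table":
            classes = set((attributes.get("class") or "").split())
            self._table_stack.append(bool(classes & self.EXCLUDED_TABLE_CLASSES))
        elif tag == "img" and not any(self._table_stack):
            src = attributes.get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "table" and self._table_stack:
            self._table_stack.pop()


def extract_images(html):
    """Return image sources in document order, minus those inside Clade tables."""
    collector = CladeAwareImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

Images in other tables are kept, which matches the recall-first goal stated above.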

@mfossati / @lwatson / @bvibber do you all feel ok signing this off? Looks done to me.

Happy to close if there are no objections.

mfossati updated the task description.

Resolved in T406991: Bug bash 3 by excluding navbox images and small images from image browsing results.