
Calculate pollution rate of illustrated infoboxes in article image suggestions
Closed, Resolved · Public

Description

As a contributor, I don't want to receive article-level image suggestions for pages that already have an image, specifically one within an infobox.

Context
Currently, when we send article image notifications to experienced contributors, we don't filter out articles with infoboxes (see ticket for explanation). Infoboxes often contain an image, so we'd expect those articles to count as illustrated and therefore not get image suggestions. However, in the SD code, images that are used very widely on a wiki are classified as icons, and an icon doesn't count as an illustration.

As such we are sending image suggestions for articles that already have an image within an infobox.

Requirements
To decide whether to exclude articles with infoboxes entirely, as suggested in https://phabricator.wikimedia.org/T321785, we need to understand the magnitude of the problem: are there a lot of articles with illustrated infoboxes for which we are sending image suggestions?

  • What’s the share of articles with infoboxes in the image suggestion dataset?
  • Are a lot of infoboxes illustrated?

AC

  • Gather the above stats. Approximation is enough as we are interested in the magnitude rather than exact numbers.
  • Share them with product @AUgolnikova-WMF to decide on the next steps

Note that suggesting content for infoboxes is a separate use case and is outside the scope of article image notifications for experienced contributors.

Event Timeline

OK, so for the wikis we send notifications to: id has 40522 articles with suggestions, and 15108 of them have infoboxes. None of the others (pt, ru, ca, no, fi, hu) has any values set for GEInfoboxTemplates in https://<wiki>.wikipedia.org/wiki/MediaWiki:GrowthExperimentsConfig.json, so there are no pages-with-infoboxes to exclude.
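For a sense of magnitude, the id figures above work out to a bit over a third of articles with suggestions having an infobox. A minimal sketch of that arithmetic (the variable and function names are illustrative only):

```python
# Figures for idwiki, as quoted in the comment above.
articles_with_suggestions = 40522
articles_with_infoboxes = 15108


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


infobox_share = share(articles_with_infoboxes, articles_with_suggestions)
print(f"{infobox_share}% of idwiki articles with suggestions have an infobox")
```

This only covers the first question (share of articles with infoboxes); it says nothing about how many of those infoboxes are illustrated.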

@Cparle thanks Cormac! two questions:

  • For PT wiki: can we roughly estimate how many of those infoboxes have images?
  • For other wikis of focus - is it possible that the absence of values for GEInfoboxTemplates also means that we cannot detect whether an article has an infobox or not?

Yes - GEInfoboxTemplates is how we detect infoboxes. The idea is that the community decides which templates count as infoboxes and adds them to MediaWiki:GrowthExperimentsConfig.json, and then we exclude articles that use those templates.
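A rough sketch of that detection logic, assuming the config page parses to a JSON object whose GEInfoboxTemplates key holds a list of template names. The function names, the sample config, and the template titles below are hypothetical and are not taken from the GrowthExperiments code:

```python
import json


def infobox_templates(config_json: str) -> set[str]:
    """Return the template names listed under GEInfoboxTemplates (empty if unset)."""
    config = json.loads(config_json)
    return set(config.get("GEInfoboxTemplates") or [])


def has_infobox(page_templates: list[str], infoboxes: set[str]) -> bool:
    """True if any template used on the page is a configured infobox template."""
    return any(name in infoboxes for name in page_templates)


# Hypothetical config for a wiki whose community has listed infobox templates...
configured = infobox_templates('{"GEInfoboxTemplates": ["Infobox person"]}')
# ...and for a wiki where the key is not set at all.
unconfigured = infobox_templates("{}")

# Hypothetical list of templates used on one article.
page = ["Infobox person", "Reflist"]
print(has_infobox(page, configured))
print(has_infobox(page, unconfigured))
```

With the key unset, nothing can match, which is consistent with there being no pages-with-infoboxes to exclude on the wikis other than id.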

> For PT wiki: can we roughly estimate how many of those infoboxes have images?

Did you mean ID wiki? For PT we don't know which templates are infoboxes.

Upd: there's not really any way of seeing how many infoboxes have images other than going through them one by one. We decided not to prioritise this now. This ticket can be closed, and https://phabricator.wikimedia.org/T321785 has been put in the backlog.