
[L] Exclude certain sections from having generated image suggestions
Closed, Resolved (Public)

Description

As a contributor, I want to ensure I do not receive image suggestions for sections that should not have an image.

Requirements
Based on section level image suggestions requirements from Growth, exclude the following sections from generating image suggestions:

  • Sections with an infobox (I think this is covered by "sections where an image already exists" below)
  • References sections
  • External links
  • Further reading
  • Sections where an image already exists
  • Lead section
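A minimal sketch of how the exclusion rules above could be expressed as a filter. The `Section` model, its field names, the assumption that index 0 is the lead section, and the English title list are all hypothetical and only for illustration; the actual pipeline's data model is not described in this task.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical section model, not the pipeline's real representation."""
    index: int               # assumption: 0 denotes the lead section
    title: str
    has_image: bool = False  # assumption: also true for infobox or gallery images


# Illustrative English titles only; each wiki would need its own list.
EXCLUDED_TITLES = {"references", "external links", "further reading"}


def is_eligible(section, excluded_titles=EXCLUDED_TITLES):
    """Return True if the section may receive image suggestions."""
    if section.index == 0:      # lead section
        return False
    if section.has_image:       # an image already exists in the section
        return False
    # Compare titles case-insensitively, ignoring surrounding whitespace.
    return section.title.strip().casefold() not in excluded_titles


def eligible_sections(sections):
    """Keep only the sections that pass every exclusion rule."""
    return [s for s in sections if is_eligible(s)]
```

Treating "has an image" as a single flag matches the note that infoboxes are probably covered by the "image already exists" rule, but that is an assumption about how images are detected.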

TBA:

  • Do we suggest images for infoboxes in the data pipeline?
  • Do we suggest images for lead section in the data pipeline? No

(if we do, then we can suggest images and then Growth can filter them out later on, similar to article img suggestions)

Note: the original items below were removed. We don't suggest images for empty sections, and a section with a gallery is already excluded by the code that excludes sections with images.

  • Sections without content/empty sections
  • Sections with a gallery

AC

  • Users don't get image suggestions for these sections
  • We include this in the algorithm/data pipeline documentation about image suggestion generation

Out of scope: Excluding suggestions based on custom community configuration (like excluding articles with certain categories or templates). Can be done on the frontend. It's unrelated to the API, specific to the Growth use case, and easy to implement within the frontend's search query construction logic.
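For illustration, a rough sketch of what such query construction might look like, assuming the frontend searches with CirrusSearch and uses its `hastemplate:` and `incategory:` keywords with a `-` prefix for negation. The function name, the `base_query` parameter, and the simplified quote escaping are assumptions, not the Growth implementation.

```python
def build_search_query(base_query, excluded_templates=(), excluded_categories=()):
    """Append negated template/category filters to a search query string."""

    def quote(title):
        # Simplified escaping (assumption): only double quotes are escaped.
        return '"' + title.replace('"', '\\"') + '"'

    terms = [base_query] if base_query else []
    terms += [f"-hastemplate:{quote(t)}" for t in excluded_templates]
    terms += [f"-incategory:{quote(c)}" for c in excluded_categories]
    return " ".join(terms)


# Hypothetical usage with made-up template and category names.
query = build_search_query("example", ["Infobox person"], ["Living people"])
```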

Event Timeline

AUgolnikova-WMF updated the task description.

Note: We could utilise the existing "image-recommendation" "excludedCategories" and "excludedTemplates" data from the NewcomerTasks community configuration (https://cs.wikipedia.org/wiki/MediaWiki:NewcomerTasks.json).

Note that in MediaWiki:NewcomerTasks.json the field is "excludedSections" (https://cs.wikipedia.org/wiki/MediaWiki:NewcomerTasks.json for an example), and this is a manually curated list of sections that links should not be recommended in. In practice, this is the "References" and "Citations" sections. Using that information on wikis where it is available is probably going to work pretty well.

AUgolnikova-WMF updated the task description.

As a follow-on from @kostajh's comment above: I think this data is just what we need. If a section should be excluded from getting link recommendations, it's probably a safe bet that it shouldn't get image suggestions either. Perhaps we could grab the data from there via an API call for each relevant wiki at the start of the data pipeline? It'd be easy to grab the JSON from a call like this and parse it to get the sections we want to exclude.
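A rough sketch of the parsing step, assuming the config is a JSON object whose top-level entries are per-task-type objects, some of which carry an "excludedSections" list. The function name and the sample config are hypothetical, and fetching the page content from each wiki is left out.

```python
import json


def excluded_sections_from_config(config_text):
    """Collect normalised section titles from every "excludedSections" list."""
    config = json.loads(config_text)
    titles = set()
    if not isinstance(config, dict):
        return titles
    # Assumption: each top-level value is a per-task-type settings object.
    for task_config in config.values():
        if not isinstance(task_config, dict):
            continue
        sections = task_config.get("excludedSections", [])
        if isinstance(sections, list):
            titles.update(
                s.strip().casefold() for s in sections if isinstance(s, str)
            )
    return titles


# Made-up sample; a real NewcomerTasks.json may be laid out differently.
sample = """
{
  "links": {"excludedSections": ["References", " Citations "]},
  "image-recommendation": {"excludedTemplates": [], "excludedCategories": []}
}
"""
excluded = excluded_sections_from_config(sample)
```

Scanning every top-level entry avoids hard-coding a task-type key, at the cost of also picking up section lists meant for other task types.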

Out of scope: Excluding suggestions based on custom community configuration (like excluding articles with certain categories or templates). Can be done on the frontend. It's unrelated to the API, specific to the Growth use case, and easy to implement within the frontend's search query construction logic.

I don't understand why this is out of scope. If there's community config, it's not that hard to grab it; it'd be a definitive list for each wiki, and we'd avoid spending time coming up with our own lists.

Discussed with @AUgolnikova-WMF and we agreed that what we have already is adequate for this stage of the project, and that we can revisit the community config after the MVP stage.

CBogen renamed this task from Exclude certain sections from having generated image suggestions to [L] Exclude certain sections from having generated image suggestions. Dec 14 2022, 5:39 PM