[S] Image algorithm shouldn't suggest disambiguation pages
Closed, Resolved · Public · Bug Report

Description

The image algorithm shouldn’t suggest disambiguation pages. We've updated Add a Link to avoid disambiguation pages (T279128), and we should make a similar update for Add an Image.

  • Example of an Add an Image suggested edit on a disambiguation page on arwiki. Reported by @Dyolf77_WMF

Steps to replicate the issue:

  • As a newcomer viewing Suggested Edits
  • When I view Add an Image suggestions
  • Then I'm sometimes given suggestions to add an image to a disambiguation page

What should have happened instead?
Growth tools should never suggest adding an image to a disambiguation page.

Event Timeline

kostajh subscribed.

Do we want to exclude disambiguation pages at the data pipeline level? If so, tagging Structured-Data-Backlog. If no one has a use case for needing image suggestions for disambiguation pages, then I think that's where the fix should be made.

If there is a use case for images on disambiguation pages and we just want to hide those on Special:Homepage, then we can implement our own change in GrowthExperiments. I don't believe we are able to add a search query to filter articles based on the 'disambiguation' page property, but after we have the TaskSet, we could potentially check the disambiguation property just before showing results to the user.

My understanding is that disambiguation pages are already excluded at the data pipeline level, so this is either a regression or some disambiguation pages are still slipping through for some reason. @Cparle or @mfossati, can you take a quick look?

We exclude any pages that are instances of disambiguation pages in Wikidata when we're compiling the data. The page in the example above is an instance of a 'Wikimedia human name disambiguation page' (Q22808320) rather than a plain old disambiguation page, so we missed it.

Could we just add Q22808320 to the list of exclusions? This would be very quick but probably won't fix everything, because it depends on Wikidata coverage.
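To make the exclusion-list idea concrete, here is a minimal sketch. It assumes each candidate page carries the Wikidata item IDs from its "instance of" (P31) statements. The `Candidate` type, the `drop_disambiguation` function, and the contents of the list are hypothetical illustrations, not the pipeline's actual code.

```python
from dataclasses import dataclass, field

# Item IDs treated as disambiguation classes (assumed list for illustration).
# Q4167410 is the plain "Wikimedia disambiguation page" class; Q22808320 is
# the human name variant discussed above. The real pipeline's list may differ.
EXCLUDED_INSTANCE_OF = frozenset({"Q4167410", "Q22808320"})


@dataclass
class Candidate:
    """Hypothetical candidate page for an image suggestion."""
    title: str
    instance_of: set = field(default_factory=set)  # P31 item IDs, if any


def drop_disambiguation(candidates, excluded=EXCLUDED_INSTANCE_OF):
    """Keep only candidates with no excluded "instance of" value."""
    return [c for c in candidates if not (c.instance_of & excluded)]
```

A page with no Wikidata item, or with an unlisted disambiguation subclass, has nothing in common with the exclusion list and still passes through, which is the coverage limitation mentioned above.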

@Tgr's patch for T279128 excludes disambiguation pages based on the page_props table rather than Wikidata, which would take more work but would be more comprehensive.
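For comparison, a rough sketch of a page-property check done through the MediaWiki Action API rather than Wikidata. It assumes the `disambiguation` page property is exposed via `prop=pageprops` and that the response uses the default JSON layout keyed by page ID. The helper names are hypothetical, and this is not how the T279128 patch itself is implemented.

```python
def pageprops_query_params(titles):
    """Build Action API parameters asking only for the 'disambiguation' property."""
    return {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "titles": "|".join(titles),
        "format": "json",
    }


def disambiguation_titles(api_response):
    """Return titles whose pageprops include 'disambiguation'.

    Assumes the default response shape:
    {"query": {"pages": {"<pageid>": {"title": ..., "pageprops": {...}}}}}
    """
    pages = api_response.get("query", {}).get("pages", {})
    return {
        page["title"]
        for page in pages.values()
        if "disambiguation" in page.get("pageprops", {})
    }
```

Because this check reads the property set on the wiki page itself, it does not depend on the page having a correctly classified Wikidata item.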

I wonder how the user ends up on disambiguation pages? In theory, Elasticsearch shouldn't return those.

We can just add Q22808320 to the list of exclusions?

We're going to go ahead with this approach, and if it's still a problem we can try a different way to solve it.

CBogen renamed this task from Image algorithm shouldn't suggest disambiguation pages to [S] Image algorithm shouldn't suggest disambiguation pages. Sep 21 2022, 4:36 PM

@Cparle please let us (Growth) know if you need anything, otherwise we'll just keep an eye on this task. Thanks!

Merged & deployed, closing.