Add Image: Allow excluding templates/categories per task type
Closed, Resolved (Public)

Description

The on-wiki configuration for suggested edit task types includes a single list of excluded categories and a single list of excluded templates that apply to all task types. We want to exclude articles with infoboxes from Add Image tasks but not from other types of tasks, so we need per-task-type exclusion lists instead.

This might affect the Special:EditGrowthConfig form generation logic as well.
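A minimal sketch of the per-task-type exclusion described above, for illustration only. The task type IDs, the `TaskTypeConfig` class, and the `is_eligible` helper are all hypothetical names, not the actual GrowthExperiments configuration schema or code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTypeConfig:
    """Hypothetical per-task-type exclusion lists (not the real schema)."""
    excluded_templates: frozenset = frozenset()
    excluded_categories: frozenset = frozenset()


# Assumed example configuration: only the image task excludes infoboxes.
TASK_TYPES = {
    "image-recommendation": TaskTypeConfig(
        excluded_templates=frozenset({"Infobox"}),
    ),
    "copyedit": TaskTypeConfig(),
}


def is_eligible(task_type, page_templates, page_categories, task_types=TASK_TYPES):
    """Return True if the page may be suggested for the given task type."""
    config = task_types[task_type]
    # A page is excluded if it uses any excluded template ...
    if config.excluded_templates & set(page_templates):
        return False
    # ... or belongs to any excluded category.
    if config.excluded_categories & set(page_categories):
        return False
    return True
```

With this shape, the same article with an infobox is filtered out of the image task but stays available for copyediting, which is the behavior the task asks for.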

Event Timeline

kostajh triaged this task as Medium priority. Aug 25 2021, 7:04 AM

Change 714696 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] [WIP] NewcomerTasks: Allow excluded templates/categories per task type

https://gerrit.wikimedia.org/r/714696

Change 714696 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] NewcomerTasks: Excluded templates / categories per task type

https://gerrit.wikimedia.org/r/714696

Change 723053 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Improvements to task configuration handling

https://gerrit.wikimedia.org/r/723053

Change 723053 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Improvements to task configuration handling

https://gerrit.wikimedia.org/r/723053

@Tgr - it seems that on enwiki betalabs it's surprisingly easy to encounter the following cases:

(1) Two issues

  • an article has an infobox and yet is included in the Add Image pool
  • the suggested image is the same as the one in the infobox
[Screenshots from testwiki wmf.7 and enwiki betalabs]

(2) A disambiguation article is included in the Add Image feed

[Two screenshots]

There is another group of articles - e.g. https://en.wikipedia.beta.wmflabs.org/wiki/Miloslav - that are about a name; it's probably not worth including such articles in the Add Image pool, but that's debatable and outside the scope of this iteration.

We didn't have infobox filtering back when content was imported to beta and testwiki, so it's not set up there. (I could set it up now, but then we'd probably be left with very little test content.) Infobox filtering should work on the production wikis though. (A few infoboxes might slip through there, because it's hard to come up with an accurate list of infoboxes, but it should be rare.)

Whether we want to exclude disambiguation pages (or rather whether we want Research to exclude them) should be a question for @MMiller_WMF.

Disambiguation pages actually should be excluded per T276137: Exclude unillustrated articles that should not have images. If we see them in a production wiki, that's a bug.

The page Elena mentioned, Alex_Moffat (recommendation), is definitely a disambiguation page (and has been for a long time).

This is probably due to the algorithm using Wikidata ("instance of: Wikipedia disambiguation page"), which is unreliable. E.g. in this case, Wikidata has two Alex Moffat items, Q30708313 linked to the enwiki biography and Q27837145 linked to the disambig page; those should be merged. In general, Wikidata items for disambiguation pages are a somewhat strained concept - Wikidata binds together the same entity across languages, but disambiguation pages depend on the language and don't usually make sense as a cross-language concept; so I don't think a lot of effort goes into keeping these "fake" Wikidata items accurate.

Checking whether the page uses the __DISAMBIG__ magic word (which is recorded in the page_props table) is much more reliable, as disambiguation templates add it (and various on-wiki features rely on it, so editors notice if it breaks).
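A rough sketch of the page_props check suggested above, done through the MediaWiki Action API rather than a direct database query. It assumes the `__DISAMBIG__` magic word is exposed as a page property named `disambiguation` and that the response uses `formatversion=2`; the helper names are made up for illustration, and this is not how the recommendation pipeline is actually implemented:

```python
from urllib.parse import urlencode


def build_pageprops_query(title):
    """Build the query string for an api.php request about one page.

    Assumes the page property is named "disambiguation".
    """
    return urlencode({
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    })


def is_disambiguation(api_response):
    """Return True if any page in the parsed JSON response has the property.

    With formatversion=2, "pages" is a list; the property value is an
    empty string, so only the presence of the key is checked.
    """
    pages = api_response.get("query", {}).get("pages", [])
    return any("disambiguation" in page.get("pageprops", {}) for page in pages)
```

The query string would be appended to the wiki's `api.php` URL; a page without the property simply has no `pageprops` key in the response, so the check falls through to False.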

@Tgr - does any additional work need to be done for this task?

No, the task is about excluding categories or templates (per community configuration), which is done.