Context
This has come up in the context of link recommendations and in the community conversations about images. There are some types of articles that we don’t want to make recommendations for, because either (a) it is very easy to make the wrong decision about images on those articles, or (b) those kinds of articles don’t need images:
- Disambiguation pages: should never have images.
- Years (e.g. “640 CE”): usually hard to pick the right image.
- Lists (e.g. “List of people named Jazmin”): usually hard to pick the right image.
- Redirect Pages
Currently, the algorithm outputs two sets: (1) unillustrated articles with image suggestions, and (2) a list of unillustrated articles for MediaSearch to map image suggestions to. We are confident that set (1) contains no articles of the types listed above. For set (2), we don't have a way to guarantee that the data excludes those article types.
Acceptance Criteria
- As a contributor OR bot writer, I want to be sure I do not receive the following types of articles, so that I can spend my time productively on the types of articles that do need images and where images are much easier to apply without scrutiny:
- Disambiguation pages: should never have images.
- Years (e.g. “640 CE”): usually hard to pick the right image.
- Lists (e.g. “List of people named Jazmin”): usually hard to pick the right image.
- Redirect Pages
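
One way the exclusion could work is sketched below. This is a minimal illustration, not the existing implementation: the `Page` record, its `is_redirect` and `is_disambiguation` flags, and the title patterns for years and lists are all assumptions made for the example. In particular, matching years and lists by title is only a heuristic and would need checking against real data.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical record shape; the real dataset's schema may differ.
@dataclass(frozen=True)
class Page:
    title: str
    is_redirect: bool = False        # assumed to be supplied upstream
    is_disambiguation: bool = False  # assumed to be supplied upstream


# Title heuristics (assumptions): "640", "640 CE", "44 BC", "AD 640".
YEAR_TITLE = re.compile(r"^(?:AD\s+)?\d{1,4}(?:\s+(?:CE|BCE|BC|AD))?$")
# Title heuristics (assumptions): "List of ..." or "Lists of ...".
LIST_TITLE = re.compile(r"^Lists? of\s", re.IGNORECASE)


def excluded_reason(page: Page) -> Optional[str]:
    """Return why a page should be excluded, or None if it can stay."""
    if page.is_redirect:
        return "redirect"
    if page.is_disambiguation:
        return "disambiguation"
    if YEAR_TITLE.match(page.title):
        return "year"
    if LIST_TITLE.match(page.title):
        return "list"
    return None


def filter_unillustrated(pages: Iterable[Page]) -> List[Page]:
    """Keep only pages that match none of the excluded types."""
    return [p for p in pages if excluded_reason(p) is None]
```

Returning a reason string rather than a boolean makes it possible to log how many pages were dropped for each type.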
Subtasks
- Write tests to cover these scenarios
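
A possible shape for these tests is sketched below, assuming the filter under test is a function that takes a list of page records and returns the ones to keep. The fixture records, their field names, and the helper `misclassified_titles` are hypothetical; the titles come from the examples in this task.

```python
# Hypothetical fixtures: (page record, should the filter keep it?)
CASES = [
    ({"title": "Mercury", "is_disambiguation": True}, False),
    ({"title": "640 CE"}, False),
    ({"title": "List of people named Jazmin"}, False),
    ({"title": "Old title", "is_redirect": True}, False),
    ({"title": "Eiffel Tower"}, True),
]


def misclassified_titles(filter_fn):
    """Run filter_fn over the fixtures and return the titles it got wrong.

    A title is wrong if it was kept when it should have been dropped,
    or dropped when it should have been kept.
    """
    pages = [page for page, _ in CASES]
    kept = {page["title"] for page in filter_fn(pages)}
    return sorted(
        page["title"] for page, keep in CASES if (page["title"] in kept) != keep
    )
```

A test would then assert that `misclassified_titles(the_real_filter) == []`, with one fixture per excluded article type plus at least one ordinary article that must survive.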
Open Questions
- To help us monitor data quality, should we include metadata about each page as an attribute of the unillustrated articles?
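
To make this question concrete, the sketch below shows one form such metadata could take: a `page_type` attribute on each row, which can then be counted to monitor what the dataset contains. The field names and the type labels are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical rows; "page_type" is the proposed metadata attribute.
rows = [
    {"page_title": "Eiffel Tower", "page_type": "article"},
    {"page_title": "640 CE", "page_type": "year"},
    {"page_title": "List of people named Jazmin", "page_type": "list"},
    {"page_title": "Untyped page"},  # attribute missing
]


def page_type_counts(rows):
    """Count rows per page type; rows without the attribute count as 'unknown'."""
    return Counter(row.get("page_type", "unknown") for row in rows)
```

A non-zero count for any excluded type, or a large "unknown" count, would signal a data quality problem.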