The first iteration of "add an image", as built by the Growth team, will operate from a static file of image suggestions that is generated once.
Requirements
- Users should have access to all the suggestions (i.e. unillustrated articles with image matches) currently available via the Image Matching Algorithm.
- Suggestions via MediaSearch should be excluded.
- If a given article has multiple candidate image suggestions, they should all be available.
- Articles that fall into the following groups should be filtered out, using the filters already developed in T276137 (Exclude unillustrated articles that should not have images):
- Disambiguation pages
- Years
- Lists
- Redirects
- Suggestions will not need to be regenerated or updated based on new images in Commons or new data in Wikidata; a static set of suggestions will suffice for Iteration 1.
- The Growth team will be prioritizing the following wikis, but prefers to load data for all Wikipedias if trivial:
- Arabic
- Czech
- Vietnamese
- Bengali
- Spanish
- Portuguese
- Persian
- Turkish
- After it is generated, the file should be loaded into Hadoop so that the Search team can pick it up to complete T285817 (Add an image: load static file to search index). At a minimum, the table needs to contain:
- wiki
- page_id
- namespace
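The requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual pipeline: the `Suggestion` fields `image`, `source` and `page_kind`, the source label `"mediasearch"`, and the page-kind labels are all hypothetical, and the real exclusion filters are the ones developed in T276137. Only `wiki`, `page_id` and `namespace` come from the task itself.

```python
from dataclasses import dataclass

# Hypothetical labels for the excluded groups; the real filters live in T276137.
EXCLUDED_PAGE_KINDS = {"disambiguation", "year", "list", "redirect"}

# Hypothetical label marking a suggestion that came from MediaSearch.
MEDIASEARCH_SOURCE = "mediasearch"

# Database names of the prioritized Wikipedias (Arabic, Czech, Vietnamese,
# Bengali, Spanish, Portuguese, Persian, Turkish).
PRIORITY_WIKIS = {
    "arwiki", "cswiki", "viwiki", "bnwiki",
    "eswiki", "ptwiki", "fawiki", "trwiki",
}


@dataclass(frozen=True)
class Suggestion:
    wiki: str
    page_id: int
    namespace: int
    image: str      # hypothetical field: the suggested file name
    source: str     # hypothetical field: where the match came from
    page_kind: str  # hypothetical field: "article", "list", ...


def filter_suggestions(suggestions, wikis=None):
    """Keep every candidate that passes the requirements.

    All candidates for an article are kept, so an article with several
    image matches yields several entries. Pass `wikis` to restrict the
    output to a subset; None means all wikis.
    """
    kept = []
    for s in suggestions:
        if wikis is not None and s.wiki not in wikis:
            continue
        if s.source == MEDIASEARCH_SOURCE:
            continue
        if s.page_kind in EXCLUDED_PAGE_KINDS:
            continue
        kept.append(s)
    return kept


def to_index_rows(suggestions):
    """Reduce suggestions to the minimal columns: (wiki, page_id, namespace)."""
    return sorted({(s.wiki, s.page_id, s.namespace) for s in suggestions})
```

Calling `filter_suggestions(candidates, wikis=PRIORITY_WIKIS)` would limit the output to the prioritized wikis, while leaving `wikis` unset covers all Wikipedias.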
Timing
The Growth team would like this file to be generated (and indexed) close to August 17. That date is late enough for the file to reflect recent data, but early enough for the data to be available in Search while the team develops and tests against it.