Context
The current algorithm implementation occasionally suggests placeholder images instead of actual content.
As a consumer of the API I would like to receive suggestions only for actual content.
This task follows from the investigation at
https://phabricator.wikimedia.org/T277828.
Acceptance Criteria
- As a user of the Image Suggestion API, when I make a request for image suggestions, I expect that all images detected as a "placeholder image" have been filtered out
- @Miriam's validation query (see https://phabricator.wikimedia.org/T277828#6957015) has been re-applied, and the results show 0 "placeholder images" found for representative wikis
- A static list of "placeholder images" has been generated and stored in HDFS
- The algorithm notebook has been updated to filter out "placeholder images"
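The filtering step and the validation check above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes each suggestion is a record with an `image` file name and that the static placeholder list is a set of such names. In practice the list would be loaded from HDFS; here it is a hypothetical in-memory set with illustrative names, and the helpers `filter_placeholders` and `count_placeholders` are hypothetical.

```python
from typing import Iterable

# Hypothetical stand-in for the static placeholder list stored in HDFS.
# The file names below are illustrative only, not the real list.
PLACEHOLDER_IMAGES = frozenset({
    "Placeholder.png",
    "No_image_available.svg",
})


def filter_placeholders(suggestions: Iterable[dict], placeholders: frozenset) -> list:
    """Drop every suggestion whose image name appears in the placeholder list."""
    return [s for s in suggestions if s["image"] not in placeholders]


def count_placeholders(suggestions: Iterable[dict], placeholders: frozenset) -> int:
    """Validation check: number of placeholder images remaining (expected to be 0)."""
    return sum(1 for s in suggestions if s["image"] in placeholders)


if __name__ == "__main__":
    # Hypothetical sample suggestions for two wikis.
    suggestions = [
        {"wiki": "enwiki", "page_id": 1, "image": "Eiffel_Tower.jpg"},
        {"wiki": "enwiki", "page_id": 2, "image": "Placeholder.png"},
        {"wiki": "arwiki", "page_id": 3, "image": "No_image_available.svg"},
    ]
    filtered = filter_placeholders(suggestions, PLACEHOLDER_IMAGES)
    print(len(filtered))                                    # 1
    print(count_placeholders(filtered, PLACEHOLDER_IMAGES))  # 0
```

Using a set for the static list keeps each membership check constant-time, so the filter stays linear in the number of suggestions. At the scale of the real pipeline the same idea would more likely be a join against the stored list than an in-memory set.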
Notes
- Data has been generated for the 2021-03 snapshot.
- We won't be able to provide unit tests, but integration tests will be coordinated with Research.