When gathering lead image data for image suggestions (and injecting it into the search index to aid image search), we use wmf_raw.mediawiki_imagelinks in Hive as a data source.
When I run the image suggestions data pipeline, if File:Image_X is the lead image on Page_Y, which corresponds to Wikidata id Qxxx, then Image_X gets Qxxx added to its document in the search index and is considered a good image suggestion for articles on other wikis corresponding to Qxxx.
If Image_X gets removed from Page_Y the day after the pipeline is run then, because wmf_raw snapshots are only monthly, Image_X will keep showing up in searches/suggestions for Qxxx for another month when perhaps it shouldn't.
This will mostly be a problem on frequently updated pages. The most obvious frequently updated page on each wiki is the main page, which has Wikidata id Q5296. We can easily exclude that Wikidata id when gathering lead image data in the data pipeline.
(Should we also consider doing the same for other frequently updated pages?)
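
A minimal sketch of the proposed exclusion, in plain Python rather than the pipeline's actual code. The `LeadImage` record, its field names, and the `filter_lead_images` helper are all hypothetical and only illustrate the idea. The only value taken from the description above is Q5296. Taking a set of excluded ids would make it straightforward to add other frequently updated pages later, if we decide to.

```python
from typing import FrozenSet, Iterable, List, NamedTuple

# Wikidata id of the main page, as stated in the task description.
MAIN_PAGE_QID = "Q5296"


class LeadImage(NamedTuple):
    """Hypothetical row shape: one lead image on one page of one wiki."""
    wiki: str
    page_title: str
    image_title: str
    wikidata_id: str


def filter_lead_images(
    rows: Iterable[LeadImage],
    excluded_qids: FrozenSet[str] = frozenset({MAIN_PAGE_QID}),
) -> List[LeadImage]:
    """Drop lead-image rows whose page maps to an excluded Wikidata id."""
    return [row for row in rows if row.wikidata_id not in excluded_qids]
```

With this in place, a lead image on the main page would never get Q5296 added to its search document, while lead images on all other pages are unaffected.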