NOTE: blocked by T299059
* gather all wikidata-ids stored in the parquet written by https://github.com/cormacparle/commons_wikidata_links/blob/main/gather_data.ipynb , plus the metadata associated with them
* gather all wikidata-ids from all commons `depicts` and `is digital representation of` statements
* merge the two sets into one collection of wikidata-ids found on commons
* then, for each relevant wiki, find all unillustrated articles with their wikidata-ids, wiki and article title ([[ https://github.com/mirrys/ImageMatching/blob/main/algorithm.ipynb | see the Image Suggestions Algorithm code for how ]]; note that certain types of pages are excluded there, and we need to replicate this)
* get the intersection of the wikidata-ids found on commons and the wikidata-ids of the unillustrated articles
* store the following in a file in HDFS:
** wiki
** article title
** suggested image
** reason the image was suggested
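
The steps above could be sketched roughly as follows. This is only an illustration in plain Python, not the actual implementation: the real job would presumably run on the cluster and read from the parquet file and the commons statements. The function names, the `(wikidata_id, image, reason)` tuple layout, the reason strings and the sample data are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Hypothetical input shapes (assumptions, not the real schemas):
#   commons link:          (wikidata_id, image, reason)
#   unillustrated article: (wiki, article_title, wikidata_id)
CommonsLink = Tuple[str, str, str]
Article = Tuple[str, str, str]


class Suggestion(NamedTuple):
    """One output row, matching the four fields listed above."""
    wiki: str
    article_title: str
    suggested_image: str
    reason: str


def merge_commons_links(
    *sources: Iterable[CommonsLink],
) -> Dict[str, List[Tuple[str, str]]]:
    """Merge several sources into {wikidata_id: [(image, reason), ...]}."""
    merged: Dict[str, List[Tuple[str, str]]] = {}
    for source in sources:
        for wikidata_id, image, reason in source:
            links = merged.setdefault(wikidata_id, [])
            if (image, reason) not in links:  # drop exact duplicates
                links.append((image, reason))
    return merged


def suggest_images(
    commons_links: Dict[str, List[Tuple[str, str]]],
    unillustrated: Iterable[Article],
) -> List[Suggestion]:
    """Keep only articles whose wikidata-id also appears on commons."""
    rows: List[Suggestion] = []
    for wiki, title, wikidata_id in unillustrated:
        for image, reason in commons_links.get(wikidata_id, []):
            rows.append(Suggestion(wiki, title, image, reason))
    return rows


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    from_parquet = [("Q1", "Universe.jpg", "commons-category")]
    from_statements = [
        ("Q1", "Universe.jpg", "depicts"),
        ("Q2", "Earth.jpg", "depicts"),
    ]
    articles = [("enwiki", "Universe", "Q1"), ("ptwiki", "Marte", "Q3")]

    merged = merge_commons_links(from_parquet, from_statements)
    for row in suggest_images(merged, articles):
        print(row)
```

Keying the merged collection by wikidata-id means the intersection step is a dictionary lookup per article, and the same image can be kept more than once when it was found for different reasons.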