Change Details

NOTE: blocked by T299059

* gather all wikidata-ids stored in the parquet written by https://github.com/cormacparle/commons_wikidata_links/blob/main/gather_data.ipynb , plus the metadata associated with them
* gather all wikidata-ids from all commons `depicts` and `is digital representation of` statements
* merge the two sets into one collection of wikidata ids on commons
* then for each relevant wiki find all unillustrated articles ([[ https://github.com/mirrys/ImageMatching/blob/main/algorithm.ipynb | see the Image Suggestions Algorithm code for how (note that certain types of pages are excluded, we need to replicate this) ]]) with their wikidata-ids, wiki and article title
* get the intersection of wikidata-ids
* store the following in a file in hdfs
** wiki
** article title
** suggested image
** reason the image was suggested
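
The merge and intersection steps above can be sketched in plain Python as follows. This is only an illustration of the intended data flow, not the actual job: the function names, the tuple layouts, the sample rows and the reason strings are all hypothetical, and the real implementation would presumably read from the parquet and write to hdfs rather than use in-memory lists.

```python
from collections import defaultdict


def merge_commons_ids(link_rows, statement_rows):
    """Merge both commons sources into one mapping.

    Each input row is a hypothetical (wikidata_id, image, reason) tuple.
    Returns {wikidata_id: {(image, reason), ...}}.
    """
    by_id = defaultdict(set)
    for wikidata_id, image, reason in list(link_rows) + list(statement_rows):
        by_id[wikidata_id].add((image, reason))
    return by_id


def suggest_images(commons_by_id, unillustrated_articles):
    """Keep only articles whose wikidata id also appears on commons.

    unillustrated_articles holds hypothetical (wiki, title, wikidata_id)
    tuples. Returns (wiki, title, image, reason) rows, i.e. the four
    fields the task asks to store.
    """
    rows = []
    for wiki, title, wikidata_id in unillustrated_articles:
        # .get() avoids creating empty entries for ids with no image
        for image, reason in sorted(commons_by_id.get(wikidata_id, ())):
            rows.append((wiki, title, image, reason))
    return rows


# Made-up sample data, for illustration only
links = [("Q1", "A.jpg", "commons-wikidata link")]
statements = [
    ("Q1", "B.jpg", "depicts"),
    ("Q2", "C.jpg", "is digital representation of"),
]
articles = [("enwiki", "Foo", "Q1"), ("dewiki", "Bar", "Q3")]

rows = suggest_images(merge_commons_ids(links, statements), articles)
for row in rows:
    print("\t".join(row))
```

Keying the merged collection by wikidata id keeps the image and the reason attached to each id, so the intersection step can emit the output rows directly without a second lookup.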