NOTE: This work will need to be done in collaboration with the [[https://www.mediawiki.org/wiki/Platform_Engineering_Team|Platform Engineering Team]] (PET), as it is their [[https://www.mediawiki.org/wiki/Platform_Engineering_Team/Data_Value_Stream/Data_Pipeline_Onboarding/|Generated Data Platform]] that we'll be using.
Now that we have good evidence, from an experimental index, that pushing Wikidata information into the `weighted_tags` field of the Commons index improves image search, we need to do the same for the production `commonswiki_file` index.
At the same time we also need to gather all the data relevant to image suggestions and push it to various persistence layers for consumption by clients.
Part 1
--
* Gather relevant data from Wikidata for Commons files
** Our original notebook, which gathers the necessary data and writes it to a Parquet file, is here: https://github.com/cormacparle/commons_wikidata_links/blob/main/gather_data.ipynb
** Subtask T299408 covers gathering additional data
** Subtask T300045 covers transforming it so it can be run by airflow
** Subtask T302095 makes it compliant with Search's update process
* ~~Push the data into the `commonswiki_file` search index~~
** {T299787}
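As a reminder of what the update step produces: CirrusSearch `weighted_tags` values are strings of the form `prefix/name|score`, with an integer score in the 1–1000 range. A minimal sketch of formatting one tag and wrapping it in an Elasticsearch bulk-update action (the tag prefix here is illustrative only, not necessarily the one the production job uses):

```python
# Sketch of formatting weighted_tags values for a CirrusSearch bulk update.
# The "prefix/name|score" format and 1-1000 score range follow the documented
# weighted_tags convention; the tag prefix below is a hypothetical example.
import json


def format_weighted_tag(prefix: str, name: str, score: float) -> str:
    """Encode one tag as "prefix/name|score", scaling a [0, 1] score to 1-1000."""
    scaled = max(1, min(1000, round(score * 1000)))
    return f"{prefix}/{name}|{scaled}"


def build_bulk_update(page_id: int, tags: list[str]) -> list[str]:
    """Build the two NDJSON lines of one Elasticsearch bulk update action."""
    return [
        json.dumps({"update": {"_id": str(page_id)}}),
        json.dumps({"doc": {"weighted_tags": tags}}),
    ]


tag = format_weighted_tag("image.linked.from.wikidata.p18", "exists", 0.87)
print(tag)  # image.linked.from.wikidata.p18/exists|870
```

The real pipeline goes through Search's update process (T302095) rather than writing to Elasticsearch directly; this only shows the shape of the data.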
Part 2
--
* Gather a list of unillustrated articles together with their suggestions
** {T299789}
Part 3
--
* Push suggestions flags to individual search indices
** {T299884}
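Because each wiki has its own search index, the flags need to be batched per wiki before being pushed. A trivial sketch of that grouping (the record fields and the `recommendation.image/exists` tag name are assumptions for illustration):

```python
# Illustrative sketch: group suggestion flags by wiki so each batch can be
# sent to that wiki's own search index. Record fields are hypothetical.
from collections import defaultdict

suggestions = [
    {"wiki": "enwiki", "page_id": 123, "tag": "recommendation.image/exists"},
    {"wiki": "frwiki", "page_id": 456, "tag": "recommendation.image/exists"},
    {"wiki": "enwiki", "page_id": 789, "tag": "recommendation.image/exists"},
]

by_wiki = defaultdict(list)
for rec in suggestions:
    by_wiki[rec["wiki"]].append(rec)

for wiki, batch in sorted(by_wiki.items()):
    print(wiki, len(batch))
# enwiki 2
# frwiki 1
```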
Part 4
--
* Push unillustrated articles, with their suggestions, suggestion reasons, and confidence scores, to Cassandra
** {T299885}
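To make the persistence step concrete, here is a sketch of what the Cassandra write might look like. The keyspace, table, and column names are hypothetical; the real schema is whatever gets agreed in T299885.

```python
# Hypothetical sketch of the Cassandra persistence step. Keyspace, table, and
# column names are illustrative only; the real schema is defined in T299885.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS image_suggestions.suggestions (
    wiki text,
    page_id int,
    image text,
    confidence float,
    reasons set<text>,
    PRIMARY KEY ((wiki, page_id), image)
)
"""


def insert_statement(row: dict) -> tuple[str, tuple]:
    """Build a parameterised CQL INSERT for one suggestion row."""
    cql = (
        "INSERT INTO image_suggestions.suggestions "
        "(wiki, page_id, image, confidence) VALUES (%s, %s, %s, %s)"
    )
    params = (row["wiki"], row["page_id"], row["image"], row["confidence"])
    return cql, params
```

Partitioning by `(wiki, page_id)` would let a client fetch all suggestions for one article in a single read, which matches the expected access pattern.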
Part 5
--
* Orchestrate all of the above scripts in Airflow: write an Airflow job that runs them **every week**
** {T302434}
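The dependency structure the weekly DAG needs to encode follows from the parts above: the Commons/Wikidata gather feeds both the `commonswiki_file` push and the suggestions gather, which in turn feeds the per-wiki flag push and the Cassandra push. A sketch using stdlib `graphlib` just to illustrate a valid run order (task names are hypothetical; in Airflow this would be a `@weekly`-scheduled DAG):

```python
# Sketch of the task dependency graph for the weekly pipeline. Task names are
# hypothetical; graphlib only illustrates that the ordering is a valid DAG.
from graphlib import TopologicalSorter

# task -> set of tasks it depends on
deps = {
    "gather_commons_wikidata_data": set(),
    "push_weighted_tags_to_commonswiki_file": {"gather_commons_wikidata_data"},
    "gather_unillustrated_articles": {"gather_commons_wikidata_data"},
    "push_suggestion_flags_to_wiki_indices": {"gather_unillustrated_articles"},
    "push_suggestions_to_cassandra": {"gather_unillustrated_articles"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)
```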