The image matching algorithm generates lists of unillustrated articles based on the mediawiki_imagelinks table, which contains monthly snapshots of the imagelinks table from the SQL replicas.
While this approach is efficient, it can end up recommending images for articles that have recently been illustrated, i.e. where someone added an image after the current snapshot date (beginning of the month) and before the next snapshot date (end of the month). To understand this limitation, we would like to estimate roughly how many unillustrated articles get illustrated within a month.
To do so, we can compute the set of unillustrated articles for one snapshot of mediawiki_imagelinks (e.g. July 2020), then calculate the percentage of those articles that are still unillustrated in the following snapshot (e.g. August 2020). We can then repeat this for different pairs of consecutive snapshots and average the results. This will give us an estimate of the percentage of unillustrated articles that are still unillustrated after one month; the complement of that figure is the share that get illustrated within a month.
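The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each snapshot is represented as a set of page IDs of unillustrated articles, the function names `still_unillustrated_pct` and `average_monthly_retention` are hypothetical, and the page IDs are toy data rather than real results. In practice the sets would be derived from mediawiki_imagelinks.

```python
from statistics import mean


def still_unillustrated_pct(current, following):
    """Percentage of articles unillustrated in `current` that are
    still unillustrated in `following` (both are sets of page IDs)."""
    if not current:
        raise ValueError("current snapshot has no unillustrated articles")
    return 100.0 * len(current & following) / len(current)


def average_monthly_retention(snapshots):
    """Average the percentage over each pair of consecutive snapshots.

    `snapshots` is a chronologically ordered list of sets.
    """
    pairs = zip(snapshots, snapshots[1:])
    return mean(still_unillustrated_pct(cur, nxt) for cur, nxt in pairs)


# Toy data (hypothetical page IDs, not real measurements).
july = {1, 2, 3, 4}
august = {1, 2, 3, 5}      # page 4 was illustrated; page 5 is a new article
september = {1, 5, 6}      # pages 2 and 3 were illustrated; page 6 is new

print(still_unillustrated_pct(july, august))                  # 75.0
print(average_monthly_retention([july, august, september]))   # 62.5
```

Note that in this sketch an article missing from the following snapshot for any other reason (for example deletion or conversion to a redirect) is also counted as no longer unillustrated, so such cases may need to be filtered out separately.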
Let's use the same wikis as in T272109.