In the past several months, @Miriam has made many improvements to the unillustrated articles algorithm and the image matching algorithm. Now that both have changed in several ways, we want to reassess the coverage: for how many articles on each wiki will we be able to propose matches?
For a set of wikis, we want to calculate six things:
- Total number of articles in the wiki
- Unillustrated articles in the wiki
- Articles with match from any source (polished): this means the count of unillustrated articles that have a match from any of the three sources, after the "polishing" steps to remove local images, etc.
- Wikidata match (polished), Commons category match (polished), Interwiki match (polished): these are the numbers of unillustrated articles with a match from each of these sources. Since an article can have a match from more than one source, these can sum to more than the "Articles with match from any source (polished)" value.
Here is a table with a sample row showing the output that we want:
| wiki | Total number of articles | Unillustrated articles | Articles with match from any source (polished) | Wikidata match (polished) | Commons category match (polished) | Interwiki match (polished) |
| --- | --- | --- | --- | --- | --- | --- |
| enwiki | 6,000,000 | 3,000,000 | 250,000 | 20,000 | 150,000 | 200,000 |
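As a rough illustration of how the six values in a row relate to each other, here is a minimal Python sketch. It assumes the polished matches for a wiki are already available as a mapping from each unillustrated article to the set of sources that produced a match; the names `SOURCES`, `CoverageRow`, `coverage_row`, and the source labels are hypothetical and not taken from the actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical labels for the three match sources; the real pipeline may differ.
SOURCES = ("wikidata", "commons_category", "interwiki")


@dataclass
class CoverageRow:
    wiki: str
    total_articles: int
    unillustrated: int
    any_match: int
    wikidata: int
    commons_category: int
    interwiki: int


def coverage_row(wiki, total_articles, polished_matches):
    """Build one output row for a wiki.

    polished_matches maps every unillustrated article title to the set of
    sources with a match after polishing (an empty set if there is none).
    """
    per_source = {source: 0 for source in SOURCES}
    any_match = 0
    for sources in polished_matches.values():
        if sources:
            # Counted once, no matter how many sources matched.
            any_match += 1
        for source in SOURCES:
            if source in sources:
                per_source[source] += 1
    return CoverageRow(
        wiki=wiki,
        total_articles=total_articles,
        unillustrated=len(polished_matches),
        any_match=any_match,
        wikidata=per_source["wikidata"],
        commons_category=per_source["commons_category"],
        interwiki=per_source["interwiki"],
    )


# Toy data, not real counts.
toy_matches = {
    "Article A": {"wikidata", "interwiki"},
    "Article B": {"commons_category"},
    "Article C": set(),
    "Article D": {"interwiki"},
}
row = coverage_row("testwiki", 10, toy_matches)
```

In the toy data, "Article A" is counted under two sources but only once in the any-source total, which is why the per-source columns can add up to more than that total.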
The list of wikis for which we want these numbers is:
- enwiki
- arwiki
- kowiki
- cswiki
- viwiki
- frwiki
- fawiki
- ptwiki
- ruwiki
- trwiki
- plwiki
- hewiki
- svwiki
- ukwiki
- huwiki
- hywiki
- srwiki
- euwiki
- arzwiki
- cebwiki
- dewiki
- bnwiki