As part of the work on improving category selection in UW (T383055), we want baselines and measurements for categories. We are interested in monitoring whether category insertion increases after the improvements to category selection, as well as after the general Describe step improvements that were done in Q4 FY23-24 as part of T358765.
Scope
- Get data on how many categories are added on upload month over month
Implementation notes
- iterate over the page creation event stream (from/to a certain date)
- find a way to retrieve file pages created with edit tag 12 (the UW one). The event stream doesn't seem to include it. See also T368167: [M] Extend logo detection metrics to tools besides Upload Wizard
- for each page creation in the Commons File namespace, make this API call for the initial revision ID: https://commons.wikimedia.org/w/api.php?action=parse&oldid=335144206&prop=categories
- filter categories using the hidden JSON key returned by the API
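
A minimal sketch of the last two steps, plus the month-over-month count from the scope, could look like the following. It rests on several assumptions that are not confirmed in this task: that "filter" means excluding hidden categories, that the response is requested with format=json, that a hidden category carries a "hidden" key (an empty string in the default JSON format, a boolean with formatversion=2), and that creation timestamps are ISO 8601 strings. The function names are hypothetical, and fetching the responses and reading the event stream are left out.

```python
from collections import Counter
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_parse_url(rev_id):
    """Build the action=parse URL for the categories of one revision."""
    params = {
        "action": "parse",
        "oldid": rev_id,
        "prop": "categories",
        "format": "json",  # assumption: JSON output is requested explicitly
    }
    return f"{API_URL}?{urlencode(params)}"


def visible_categories(response):
    """Return the names of the non-hidden categories in a parse response.

    Assumption: a hidden category has a "hidden" key that is either an
    empty string (default JSON format) or True (formatversion=2), and a
    visible one has no such key or has it set to False.
    """
    names = []
    for cat in response.get("parse", {}).get("categories", []):
        if cat.get("hidden", False) is not False:
            continue  # skip hidden categories
        # The name is under "*" by default and "category" in formatversion=2.
        names.append(cat.get("category", cat.get("*")))
    return names


def categories_per_month(uploads):
    """Sum visible categories per month.

    `uploads` is an iterable of (created_at, response) pairs, where
    created_at is an ISO 8601 timestamp such as "2024-05-03T10:00:00Z".
    """
    totals = Counter()
    for created_at, response in uploads:
        totals[created_at[:7]] += len(visible_categories(response))
    return totals
```

For example, categories_per_month could be fed one pair per file page creation, using the creation timestamp from the event and the parsed response for its initial revision ID.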