In addition to getting baselines for categories as part of T383138:
- Get data on the percentage of low-level categories (filtering out root categories) that are added month over month
- Get data on newly created one-page categories
Probable approach:
- compile a list of hidden, maintenance, and non-diffusing categories
- script to listen to the event stream, then use the API to gather the non-hidden, non-maintenance categories for each new UW upload (and record them)
- keep file uploads that have edit tag ID = 12, i.e., UW
- open question: does any non-hidden, non-maintenance category that has a Wikidata item count as a non-low-level category? Or is there a way of telling whether a category has no parent?
- instead of low-level categories, we might look at a measure like the average depth in the tree
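
A rough sketch of two pieces of the approach above, in Python: building the API query for a file's non-hidden categories, and measuring a category's depth in the tree. It makes several assumptions. The function names, the `excluded` set (the compiled maintenance/non-diffusing list), and the `parents` map (category to its parent categories, gathered separately) are all hypothetical. "Depth" is taken to mean the shortest number of parent hops to a root, which is only one possible definition. The `clshow=!hidden` filter only covers hidden categories, so the maintenance list still has to be applied by hand. Listening to the event stream and sending the request are left out.

```python
from collections import deque

# Assumed endpoint for Commons; params below would be sent here with any HTTP client.
API_URL = "https://commons.wikimedia.org/w/api.php"


def visible_category_params(file_title):
    """Build query parameters asking for a file's non-hidden categories."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "categories",
        "clshow": "!hidden",   # drop hidden categories server-side
        "cllimit": "max",
        "titles": file_title,
    }


def extract_categories(response, excluded=frozenset()):
    """Pull category titles out of a formatversion=2 response,
    skipping anything in the (hypothetical) excluded list."""
    titles = []
    for page in response.get("query", {}).get("pages", []):
        for cat in page.get("categories", []):
            if cat["title"] not in excluded:
                titles.append(cat["title"])
    return titles


def category_depth(category, parents, roots):
    """Shortest number of parent hops from `category` to any root.
    Returns None if no root is reachable (e.g. orphaned or cyclic)."""
    seen = {category}
    queue = deque([(category, 0)])
    while queue:
        current, depth = queue.popleft()
        if current in roots:
            return depth
        for parent in parents.get(current, ()):
            if parent not in seen:   # guard against cycles in the category graph
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None


def average_depth(categories, parents, roots):
    """Mean depth over the categories that can reach a root; None if none can."""
    depths = [
        d for d in (category_depth(c, parents, roots) for c in categories)
        if d is not None
    ]
    return sum(depths) / len(depths) if depths else None
```

A breadth-first search is used because a category can have several parents, so there may be more than one path to a root. Taking the shortest path keeps the measure stable when extra parent links are added higher up.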