We need to measure our success indicator about the number of communities coming from outside of Europe and North America (NA).
There are two approaches that do not require asking our users or admins directly: looking at the IP addresses of contributions, and looking at the languages used in the labels.
In this epic we will try out both approaches, note which logical and technical obstacles we encounter along the way, compare the data they produce, and decide whether the data is meaningful enough to report on this (questionable) indicator.
Here's the plan:
- We learn to count contributions from each country in a Wikibase (using the IP addresses stored in recent changes). https://phabricator.wikimedia.org/T372251
- We identify which Wikibases fall into the 'outside of NA and Europe' bucket based on this approach (using property P30 for continent, and treating Europe (Q46) and North America (Q49) as the excluded continents).
- We learn to count labels in each language in a Wikibase. https://phabricator.wikimedia.org/T372252
- We identify which Wikibases fall into the 'outside of NA and Europe' bucket based on this approach (using P37 for official language of a country).
- We compare the results of both approaches and check for correlation. Important: the IP-based approach will only return instances that had changes in the last 90 days, because of how recent changes are stored. This needs to be kept in mind when comparing.
- We decide whether either (or both) of these approaches is good enough to measure the indicator.
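
The bucketing and comparison steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the agreed method: the function names, the 0.5 threshold, and the rule that a key counts as "outside" only when none of its continents is Europe or North America are all hypothetical. The lookup from country or language to continents is assumed to have been built beforehand (for example from P30 and P37).

```python
# Continents we treat as "not outside": Europe (Q46) and North America (Q49).
EXCLUDED_CONTINENTS = {"Q46", "Q49"}


def outside_share(counts, continents_of):
    """Share of counted items attributed to keys outside Europe/NA.

    counts: dict mapping a key (country code or language code) to a count
            (contributions or labels).
    continents_of: dict mapping the same keys to a set of continent item IDs.
    Keys with no known continent are skipped. Returns None if nothing is left.
    """
    total = 0
    outside = 0
    for key, n in counts.items():
        continents = continents_of.get(key)
        if not continents:
            continue  # unknown key: leave it out instead of guessing
        total += n
        # Assumption: "outside" only if no continent of this key is excluded.
        if not (continents & EXCLUDED_CONTINENTS):
            outside += n
    return outside / total if total else None


def is_outside(share, threshold=0.5):
    """Hypothetical rule: a Wikibase is 'outside' if its share exceeds threshold."""
    return share is not None and share > threshold


def agreement(by_ip, by_language):
    """Fraction of Wikibases, present in both results, with the same bucket.

    Only the intersection is compared, because the IP-based result covers
    only Wikibases with changes in the last 90 days.
    """
    shared = by_ip.keys() & by_language.keys()
    if not shared:
        return None
    same = sum(1 for wb in shared if by_ip[wb] == by_language[wb])
    return same / len(shared)
```

One obstacle this makes visible: a language that is official on several continents (Spanish or English, say) would never count as "outside" under the rule used here, so the language-based bucket may need a different rule than the IP-based one.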