We have successfully begun a cross-team project [[ https://meta.wikimedia.org/wiki/Research:Incubator_and_language_representation_across_Wikimedia_projects | examining Incubator and language representation across Wikimedia projects ]].
The project includes the following goals:
- Develop metrics for the state of languages at Wikimedia
- Develop metrics for better understanding Incubator
- Develop knowledge gap metrics for measuring language gaps
This task addresses the goal of developing knowledge gap metrics for measuring language gaps.
For Q2/Q3, I will:
- [X] Finish integrating primary data via wrangling scripts [[ https://gitlab.wikimedia.org/repos/research/incubator-data-exploration | in GitLab repo ]]
- [ ] Acquire the needed secondary data
- [ ] Integrate secondary data via wrangling scripts [[ https://gitlab.wikimedia.org/repos/research/incubator-data-exploration | in GitLab repo ]]
- [ ] Determine home for the integrated dataset(s) (Hive? keep in GitLab?)
- [ ] QA
- [ ] Begin calculations for metrics
> This task depends on T348249, which must be resolved before the needed secondary data can be acquired.
> January 2024 update: Because the dependency is still unresolved, this task will span multiple quarters until the blocker is resolved.