We have successfully begun a cross-team project examining Incubator and language representation across Wikimedia projects.
The project includes the following goals:
- Develop metrics for the state of languages at Wikimedia
- Develop metrics for better understanding Incubator
- Develop knowledge gaps metrics for measuring language gaps
This task addresses the goal of developing knowledge gaps metrics for measuring language gaps.
As a first step, I need to complete a data needs assessment and begin compiling all primary and secondary data into usable Hive table(s):
- Determine and document data that we currently have (via MariaDB, Hive, Meta page(s), etc.)
- Determine and document data that we do not currently have, but will need in order to develop metrics for the language gap
- Develop a plan for acquiring the needed secondary data
- Acquire the needed secondary data
- Add all data-related files (source files and wrangling scripts) to the GitLab repo
- Combine data into usable Hive table(s)
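
As a rough illustration of the final combining step, here is a minimal Python sketch that merges per-language rows from several sources into one row per language before loading into Hive. Everything in it is an assumption made for illustration: the function name `combine_by_language`, the `language_code` join key, the column names, and the sample values are all hypothetical placeholders, not the actual schema or data.

```python
from collections import defaultdict


def combine_by_language(*sources):
    """Merge rows from several sources into one row per language code.

    Each source is a list of dicts that contain a "language_code" key
    (a hypothetical join key chosen for this sketch).
    """
    combined = defaultdict(dict)
    for rows in sources:
        for row in rows:
            code = row["language_code"]
            for key, value in row.items():
                if key != "language_code":
                    # Later sources overwrite earlier ones on column clashes.
                    combined[code][key] = value
    # Sort by language code so the output order is stable.
    return [
        {"language_code": code, **fields}
        for code, fields in sorted(combined.items())
    ]


# Placeholder rows only; these are not real figures.
primary = [
    {"language_code": "sw", "article_count": 100},
    {"language_code": "yo", "article_count": 50},
]
secondary = [
    {"language_code": "sw", "speakers": 1000},
    {"language_code": "ha", "speakers": 2000},
]

table = combine_by_language(primary, secondary)
```

A language that appears in only one source keeps only that source's columns, so missing fields in the combined rows also show which data still needs to be acquired.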