We have successfully begun a cross-team project examining [[ https://meta.wikimedia.org/wiki/Research:Incubator_and_language_representation_across_Wikimedia_projects | Incubator and language representation across Wikimedia projects ]].
The project includes the following goals:
- Develop metrics for the state of languages at Wikimedia
- Develop metrics for better understanding Incubator
- Develop knowledge gap metrics for measuring language gaps
This task addresses the goal of **developing knowledge gap metrics for measuring language gaps.**
As a first step, I need to complete a data needs assessment and begin compiling all primary and secondary data into usable Hive table(s):
[x] Determine and document data that we currently have (via MariaDB, Hive, Meta page(s), etc.)
[x] Determine and document data that we //do not// currently have, but will need in order to develop metrics for the language gap
[x] Develop a plan for acquiring the needed secondary data
[] Acquire the needed secondary data
[] Combine data into usable Hive table(s)