In the measurement plan (T367686) we outlined the criteria for filtering languages (to meet a minimum threshold) and the variables to use for clustering. This requires gathering the following data points:
- project type
- number of active editors in the last 3 months (by month)
- number of edits in the last 3 months (split by: content & non-content)
- language directionality
- time each language has spent on Incubator (until 30 June 2024)
  - the starting date is taken as the average timestamp of the earliest 5% of edits (the first 5th percentile)
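
The starting-date rule can be sketched as below. This is a minimal illustration, not the agreed implementation: the function names, the rounding up of the 5% cut to at least one edit, and the choice of 00:00 UTC on 30 June 2024 as the cutoff are all assumptions, and `edit_times` stands in for whatever edit timestamps are pulled for a given language.

```python
import math
from datetime import datetime, timezone

# Assumption: "until 30 June 2024" is read as 00:00 UTC on that day.
CUTOFF = datetime(2024, 6, 30, tzinfo=timezone.utc)


def estimated_start(edit_times, percent=5):
    """Return the mean timestamp of the earliest `percent`% of edits.

    `edit_times` is a list of timezone-aware datetimes (hypothetical input).
    """
    if not edit_times:
        raise ValueError("no edits to estimate a starting date from")
    ordered = sorted(edit_times)
    # Assumption: round the cut up so that at least one edit is always used.
    k = max(1, math.ceil(len(ordered) * percent / 100))
    earliest = ordered[:k]
    mean_epoch = sum(t.timestamp() for t in earliest) / k
    return datetime.fromtimestamp(mean_epoch, tz=timezone.utc)


def days_on_incubator(edit_times, cutoff=CUTOFF):
    """Whole days between the estimated starting date and the cutoff."""
    return (cutoff - estimated_start(edit_times)).days
```

Averaging the earliest edits rather than taking the single first edit makes the starting date less sensitive to one isolated early test edit, which is presumably the motivation for the percentile rule.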