Add language and incubating wikis data to canonical data
Open, In Progress, Low, Public

Description

@CMyrick-WMF has done excellent work cleaning and analyzing data on incubating wikis, which also involved substantial work with language data.

We should incorporate this work into canonical data. Proposed work sequence:

  1. T346855: Provide ISO 639 language codes in canonical wiki dataset
  2. T392951: Create a first version of the canonical language dataset
  3. T393075: Create a canonical dataset for incubating wikis
  4. T393076: Add Glottolog fields to canonical language dataset

@CMyrick-WMF has some initial drafts for the table schemas in this doc.

Potential additional work

Event Timeline

nshahquinn-wmf edited projects, added Research; removed Movement-Insights (FY25-26 H1).

I will be eagerly supporting this work, but @CMyrick-WMF has already been taking the lead!

nshahquinn-wmf lowered the priority of this task from Medium to Low. Aug 4 2025, 7:13 PM

My understanding is that Research wants to categorize this as low priority.

CMyrick-WMF changed the task status from Open to In Progress. Aug 5 2025, 3:01 PM
CMyrick-WMF changed the status of subtask T346855: Provide ISO 639 language codes in canonical wiki dataset from Open to In Progress.
CMyrick-WMF changed the status of subtask T393075: Create a canonical dataset for incubating wikis from Open to In Progress.
Miriam moved this task from Needs Sign-off to Epics on the Research board.