
Analyze activity levels for communities supported only by MinT
Closed, Resolved, Public

Description

A set of Wikipedias have received machine translation support for the first time, with MinT as their only available option. 56 Wikipedias are supported thanks to several open translation models: 42 with NLLB-200 (T326578), 13 with OpusMT (T333969), and 1 with IndicTrans2 (T337656).

This ticket proposes analyzing different statistics to identify communities with relatively high levels of activity, in order to select some of them as target wikis for research and analysis. This is an initial exploration considering aspects such as:

  • Type of contribution: edits and translations.
  • Platforms: All vs. mobile.
  • Impact: Deletion rate.
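As a rough illustration of the deletion-rate aspect, the sketch below computes, per wiki, the share of published items that were later deleted. The record layout (`Publication` with `wiki` and `deleted` fields) and the sample data are hypothetical; they do not reflect the actual queries or data tables used for this ticket.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Publication:
    """One published edit or translation (hypothetical record layout)."""
    wiki: str
    deleted: bool


def deletion_rates(pubs: list[Publication]) -> dict[str, float]:
    """Return deleted / published for every wiki with at least one publication."""
    published: dict[str, int] = defaultdict(int)
    deleted: dict[str, int] = defaultdict(int)
    for p in pubs:
        published[p.wiki] += 1
        if p.deleted:
            deleted[p.wiki] += 1
    # Wikis with no deletions get 0 from the defaultdict, giving a rate of 0.0.
    return {wiki: deleted[wiki] / n for wiki, n in published.items()}


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Publication("wiki_a", False),
        Publication("wiki_a", True),
        Publication("wiki_a", False),
        Publication("wiki_a", False),
        Publication("wiki_b", False),
    ]
    print(deletion_rates(sample))  # {'wiki_a': 0.25, 'wiki_b': 0.0}
```

A real analysis would also need to decide whether the denominator counts all published items or only those old enough to have had a chance of being deleted.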

The list of wiki codes for all languages is captured below:

  1. ace
  2. ady
  3. ang
  4. ary
  5. arz
  6. ast
  7. av
  8. awa
  9. azb
  10. ban
  11. bcl
  12. bi
  13. bjn
  14. bo
  15. br
  16. bug
  17. ce
  18. ch
  19. chr
  20. crh
  21. din
  22. dz
  23. ff
  24. fj
  25. fo
  26. fon
  27. frp
  28. fur
  29. fy
  30. gag
  31. gor
  32. guw
  33. hif
  34. jam
  35. kab
  36. kbd
  37. kbp
  38. kg
  39. ki
  40. koi
  41. krc
  42. ks
  43. kv
  44. li
  45. lij
  46. lmo
  47. ltg
  48. mdf
  49. min
  50. mnw
  51. myv
  52. new
  53. nn
  54. oc
  55. pag
  56. pam
  57. rn
  58. sat
  59. sc
  60. scn
  61. sg
  62. shn
  63. srn
  64. ss
  65. szl
  66. tcy
  67. tet
  68. tn
  69. to
  70. tpi
  71. tum
  72. ty
  73. vec
  74. wa
  75. war
  76. wo
  77. xal

Event Timeline

An initial analysis for the January to May 2023 period. Note that at this time some languages had not yet been enabled and others had been enabled only recently, so take the following analysis with a grain of salt; it is worth re-running these queries over a later period.

Top editing activity (query):

  1. Waray (war)
  2. Asturian (ast)
  3. Cantonese (yue)
  4. Egyptian Arabic (arz)
  5. Occitan (oc)

Top editing activity on mobile (query):

  1. Egyptian Arabic (arz)
  2. Cantonese (yue)
  3. Santali (sat)
  4. Kashmiri (ks)
  5. Central Bikol (bcl)

Top translation activity (query):

  1. Egyptian Arabic (arz)
  2. Central Bikol (bcl)
  3. Tswana (tn)
  4. Kashmiri (ks)
  5. Bhojpuri (bh)

Top translation activity on mobile (query):

  1. Central Bikol (bcl)
  2. Kashmiri (ks)
  3. Santali (sat)
  4. Bhojpuri (bh)
  5. Egyptian Arabic (arz)

Top non-reverted article creation (query):

  1. Tumbuka (tum)
  2. Cantonese (yue)
  3. Breton (br)
  4. Crimean Tatar (crh)
  5. Egyptian Arabic (arz)

Top non-reverted article creation on mobile (query):

  1. Kashmiri (ks)
  2. Central Bikol (bcl)
  3. Santali (sat)
  4. Fula (ff)
  5. Lombard (lmo)

Top non-reverted article translations (query):

  1. Egyptian Arabic (arz)
  2. Central Bikol (bcl)
  3. Kashmiri (ks)
  4. Asturian (ast)
  5. Bhojpuri (bh)

Top non-reverted article translations on mobile (query):

  1. Central Bikol (bcl)
  2. Kashmiri (ks)
  3. Bhojpuri (bh)
  4. Santali (sat)
  5. Pangasinan (pag)
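The rankings above come from the linked queries, which are not reproduced here. As a minimal sketch of the ranking step only, assuming a hypothetical list of article-creation (or translation) records with `wiki`, `is_mobile`, and `reverted` fields, the top wikis by non-reverted contributions, optionally restricted to mobile, could be computed like this:

```python
from collections import Counter
from typing import NamedTuple


class Creation(NamedTuple):
    """One article creation or translation (hypothetical record layout)."""
    wiki: str
    is_mobile: bool
    reverted: bool


def top_non_reverted(
    records: list[Creation], n: int = 5, mobile_only: bool = False
) -> list[tuple[str, int]]:
    """Rank wikis by the number of contributions that were not reverted."""
    counts = Counter(
        r.wiki
        for r in records
        if not r.reverted and (r.is_mobile or not mobile_only)
    )
    # most_common sorts by count, descending; ties keep first-seen order.
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    records = [
        Creation("wiki_a", True, False),
        Creation("wiki_a", True, False),
        Creation("wiki_b", True, False),
        Creation("wiki_c", False, False),
        Creation("wiki_c", False, True),
    ]
    print(top_non_reverted(records))                    # [('wiki_a', 2), ('wiki_b', 1), ('wiki_c', 1)]
    print(top_non_reverted(records, mobile_only=True))  # [('wiki_a', 2), ('wiki_b', 1)]
```

Ranking by raw counts favors larger wikis; normalizing by active editors or total creations would be an alternative design choice.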

@Pginer-WMF
Below are other metrics that might be useful in selecting pilot communities for this research. Reviewed time period: January 2023 through June 2023.

Top average number of monthly editors (query)
Defined as the average number of registered editors who make at least 1 edit in a given month.

  1. Egyptian Arabic (arz): 246 monthly editors
  2. Asturian (ast): 101 monthly editors
  3. South Azerbaijani (azb): 100 monthly editors
  4. Occitan (oc): 95 monthly editors
  5. Breton (br): 79 monthly editors

Top average number of monthly active editors (query)
Defined as the average number of registered users who make at least 5 content edits in a given month.

  1. Occitan (oc): 27 monthly active editors
  2. Egyptian Arabic (arz): 27 monthly active editors
  3. Breton (br): 22 monthly active editors
  4. Western Frisian (fy): 18 monthly active editors
  5. South Azerbaijani (azb): 17 monthly active editors
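Both definitions above share one shape: count the distinct registered users per month who meet an edit threshold (1 edit for monthly editors, 5 content edits for monthly active editors), then average over the months in the period. A minimal sketch, assuming hypothetical `(month, user)` edit records that have already been filtered to registered users (and, for active editors, to content edits); the actual queries may handle bots, namespaces, or empty months differently.

```python
from collections import Counter, defaultdict


def avg_monthly_editors(
    edits: list[tuple[str, str]], months: list[str], min_edits: int = 1
) -> float:
    """Average, over `months`, of users with at least `min_edits` edits that month.

    `edits` holds one (month, user) pair per edit, e.g. ("2023-01", "UserA").
    Passing `months` explicitly means a month with no edits counts as 0 editors.
    """
    if not months:
        raise ValueError("months must not be empty")
    per_month: dict[str, Counter] = defaultdict(Counter)
    for month, user in edits:
        per_month[month][user] += 1
    qualifying = [
        sum(1 for count in per_month[m].values() if count >= min_edits)
        for m in months
    ]
    return sum(qualifying) / len(months)


if __name__ == "__main__":
    # Made-up sample: UserA makes 5 edits and UserB 1 edit in January; UserA 1 edit in February.
    edits = [("2023-01", "UserA")] * 5 + [("2023-01", "UserB"), ("2023-02", "UserA")]
    months = ["2023-01", "2023-02"]
    print(avg_monthly_editors(edits, months))               # 1.5 (monthly editors)
    print(avg_monthly_editors(edits, months, min_edits=5))  # 0.5 (monthly active editors)
```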

Top section translation activity (query):

  1. Central Bikol (bcl)
  2. Kashmiri (ks)
  3. Santali (sat)
  4. Egyptian Arabic (arz)
  5. Pangasinan (pag)

Top content interactions (query):
(While this is not a direct measure of editing activity on these wikis, it is needed to inform the scope and baseline of the MinT key result being defined in T341182.)

  1. Egyptian Arabic (arz)
  2. Cantonese (zh-yue)
  3. Asturian (ast)
  4. Waray (war)
  5. South Azerbaijani (azb)

Other Resources

  • I'd also recommend reviewing the wiki comparison sheet, which includes a number of metrics that help identify activity levels at each wiki, such as unique devices, percentage of mobile edits, content edits, and monthly editors. Note, however, that this spreadsheet is updated only once a year, so the most recent snapshot is from January 2023 and does not contain details on recently enabled languages.
  • The content translation key metrics dashboard can be filtered by project to view published and deleted translation trends per project including translation activity by type and platform.


This is fantastic. Thanks Megan! @Easikingarmager was asking about the number of editors just yesterday.

Reading activity can also be an interesting aspect, given that we are exploring ways to expose MinT to readers.
For the Jan-Jun period:

Top reading activity (query):

  1. Egyptian Arabic (arz)
  2. Cantonese (yue)
  3. Asturian (ast)
  4. Bhojpuri (bh)
  5. South Azerbaijani (azb)

Top reading activity on desktop (query):

  1. Egyptian Arabic (arz)
  2. Cantonese (yue)
  3. Asturian (ast)
  4. Waray (war)
  5. Occitan (oc)

Top reading activity on mobile (query):

  1. Egyptian Arabic (arz)
  2. Cantonese (yue)
  3. Bhojpuri (bh)
  4. Asturian (ast)
  5. South Azerbaijani (azb)

The data captured should be helpful to decide potential pilot wikis to focus on.