Add a column to differentiate various wiki-types on canonical_data.wikis
Open, Low, Public

Description

While working with canonical_data.wikis, I often need to select only content wikis (i.e. filter out affiliate wikis, governance wikis, metawiki, foundationwiki, etc.). This is not straightforward: some affiliate wikis are publicly editable and visible, so those attributes alone do not separate them from content wikis, and I have to explicitly list the project groups for content wikis every time. It would be helpful to have a column that differentiates the various types of wikis.

I had an initial conversation with @nshahquinn-wmf and did some work on mapping various groups to types. Here is my initial proposal (open to discussing the classification & the nomenclature).

{
  "wikipedia": "content",
  "wikibooks": "content",
  "wiktionary": "content",
  "advisors": "foundation",
  "advisory": "foundation",
  "wikiquote": "content",
  "amwikimedia": "affiliate",
  "wikisource": "content",
  "apiportal": "technical_documentation",
  "arbcom-cs": "functionaries",
  "arbcom-de": "functionaries",
  "arbcom-en": "functionaries",
  "arbcom-fi": "functionaries",
  "arbcom-nl": "functionaries",
  "arbcom-ru": "functionaries",
  "arwikimedia": "affiliate",
  "wikinews": "content",
  "wikiversity": "content",
  "auditcom": "foundation",
  "azwikimedia": "affiliate",
  "bdwikimedia": "affiliate",
  "betawikiversity": "incubation",
  "bewikimedia": "affiliate",
  "wikivoyage": "content",
  "boardgovcom": "foundation",
  "board": "foundation",
  "brwikimedia": "affiliate",
  "cawikimedia": "affiliate",
  "chair": "foundation",
  "chapcom": "governance",
  "checkuser": "functionaries",
  "cnwikimedia": "affiliate",
  "collab": "foundation",
  "commons": "content",
  "cowikimedia": "affiliate",
  "dkwikimedia": "affiliate",
  "donate": "technical_infrastructure",
  "ecwikimedia": "foundation",
  "electcom": "governance",
  "etwikimedia": "affiliate",
  "exec": "foundation",
  "fdc": "foundation",
  "fiwikimedia": "affiliate",
  "foundation": "foundation",
  "gewikimedia": "affiliate",
  "grants": "foundation",
  "grwikimedia": "affiliate",
  "hiwikimedia": "affiliate",
  "id-internalwikimedia": "affiliate",
  "idwikimedia": "affiliate",
  "iegcom": "foundation",
  "ilwikimedia": "affiliate",
  "incubator": "incubation",
  "internal": "foundation",
  "labs": "technical_documentation",
  "labtest": "test",
  "legalteam": "foundation",
  "login": "technical_infrastructure",
  "maiwikimedia": "affiliate",
  "mediawiki": "technical_documentation",
  "meta": "governance",
  "mkwikimedia": "affiliate",
  "movementroles": "governance",
  "mxwikimedia": "affiliate",
  "ngwikimedia": "affiliate",
  "nlwikimedia": "affiliate",
  "noboard-chapterswikimedia": "affiliate",
  "nostalgia": "misc",
  "nowikimedia": "affiliate",
  "nycwikimedia": "affiliate",
  "nzwikimedia": "affiliate",
  "office": "foundation",
  "ombudsmen": "functionaries",
  "otrs-wiki": "governance",
  "outreach": "misc",
  "pa-uswikimedia": "affiliate",
  "plwikimedia": "affiliate",
  "projectcom": "foundation",
  "ptwikimedia": "affiliate",
  "punjabiwikimedia": "affiliate",
  "quality": "governance",
  "romdwikimedia": "affiliate",
  "rswikimedia": "affiliate",
  "ruwikimedia": "affiliate",
  "searchcom": "foundation",
  "sewikimedia": "affiliate",
  "sources": "incubation",
  "spcom": "foundation",
  "species": "content",
  "steward": "functionaries",
  "strategy": "governance",
  "sysop-it": "functionaries",
  "techconduct": "governance",
  "ten": "misc",
  "test2": "test",
  "testcommons": "test",
  "test": "test",
  "testwikidata": "test",
  "thankyou": "technical_infrastructure",
  "transitionteam": "foundation",
  "trwikimedia": "affiliate",
  "uawikimedia": "affiliate",
  "usability": "governance",
  "vewikimedia": "affiliate",
  "vote": "technical_infrastructure",
  "wbwikimedia": "affiliate",
  "wg-en": "governance",
  "wikidata": "content",
  "wikifunctions": "content",
  "wikimania2005": "wikimania",
  "wikimania2006": "wikimania",
  "wikimania2007": "wikimania",
  "wikimania2008": "wikimania",
  "wikimania2009": "wikimania",
  "wikimania2010": "wikimania",
  "wikimania2011": "wikimania",
  "wikimania2012": "wikimania",
  "wikimania2013": "wikimania",
  "wikimania2014": "wikimania",
  "wikimania2015": "wikimania",
  "wikimania2016": "wikimania",
  "wikimania2017": "wikimania",
  "wikimania2018": "wikimania",
  "wikimaniateam": "wikimania",
  "wikimania": "wikimania"
}
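
For illustration, here is a minimal Python sketch of how a mapping like the one above might be turned into a type column. It is not the actual canonical_data code: the `wiki_type` helper, the `"unclassified"` fallback, the row field names, and the sample rows (including `examplewiki`) are all assumptions made for this example, and only a small excerpt of the mapping is repeated.

```python
# Excerpt of the proposed mapping (database group -> wiki type).
GROUP_TYPE = {
    "wikipedia": "content",
    "commons": "content",
    "incubator": "incubation",
    "meta": "governance",
    "login": "technical_infrastructure",
}


def wiki_type(database_group, default="unclassified"):
    """Return the proposed type for a database group, or `default` if unmapped."""
    return GROUP_TYPE.get(database_group, default)


# Hypothetical rows; the field names are assumptions, not the real schema.
wikis = [
    {"database_code": "enwiki", "database_group": "wikipedia"},
    {"database_code": "metawiki", "database_group": "meta"},
    {"database_code": "examplewiki", "database_group": "brand-new-group"},
]

# Attach the derived column, then select content wikis with a single filter.
typed = [dict(row, wiki_type=wiki_type(row["database_group"])) for row in wikis]
content_codes = [row["database_code"] for row in typed if row["wiki_type"] == "content"]
```

The explicit fallback is a design choice for the sketch: a database group created after the mapping was written would show up as `"unclassified"` instead of silently dropping out of every filter.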

Consolidated view of the same:

| group_type | db_groups |
| --- | --- |
| affiliate | amwikimedia, arwikimedia, azwikimedia, bdwikimedia, bewikimedia, brwikimedia, cawikimedia, cnwikimedia, cowikimedia, dkwikimedia, etwikimedia, fiwikimedia, gewikimedia, grwikimedia, hiwikimedia, id-internalwikimedia, idwikimedia, ilwikimedia, maiwikimedia, mkwikimedia, mxwikimedia, ngwikimedia, nlwikimedia, noboard-chapterswikimedia, nowikimedia, nycwikimedia, nzwikimedia, pa-uswikimedia, plwikimedia, ptwikimedia, punjabiwikimedia, romdwikimedia, rswikimedia, ruwikimedia, sewikimedia, trwikimedia, uawikimedia, vewikimedia, wbwikimedia |
| content | wikipedia, wikibooks, wiktionary, wikiquote, wikisource, wikinews, wikiversity, wikivoyage, commons, species, wikidata, wikifunctions |
| foundation | advisors, advisory, auditcom, boardgovcom, board, chair, collab, ecwikimedia, exec, fdc, foundation, grants, iegcom, internal, legalteam, office, projectcom, searchcom, spcom, transitionteam |
| functionaries | arbcom-cs, arbcom-de, arbcom-en, arbcom-fi, arbcom-nl, arbcom-ru, checkuser, ombudsmen, steward, sysop-it |
| governance | chapcom, electcom, meta, movementroles, otrs-wiki, quality, strategy, techconduct, usability, wg-en |
| incubation | betawikiversity, incubator, sources |
| misc | nostalgia, outreach, ten |
| technical_documentation | apiportal, labs, mediawiki |
| technical_infrastructure | donate, login, thankyou, vote |
| test | labtest, test2, testcommons, test, testwikidata |
| wikimania | wikimania2005, wikimania2006, wikimania2007, wikimania2008, wikimania2009, wikimania2010, wikimania2011, wikimania2012, wikimania2013, wikimania2014, wikimania2015, wikimania2016, wikimania2017, wikimania2018, wikimaniateam, wikimania |
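
A consolidated view like this can be derived from the per-group mapping, which avoids maintaining the two by hand. The sketch below is one possible way to do it in Python, not necessarily how this table was produced. The `consolidate` function is a hypothetical helper, and only a small excerpt of the mapping is used.

```python
from collections import defaultdict

# Excerpt of the proposed mapping (database group -> wiki type).
GROUP_TYPE = {
    "wikipedia": "content",
    "wikibooks": "content",
    "betawikiversity": "incubation",
    "incubator": "incubation",
    "labtest": "test",
    "sources": "incubation",
    "test2": "test",
}


def consolidate(mapping):
    """Invert group -> type into type -> [groups].

    Types are sorted by name; groups keep the order they have in the mapping.
    """
    grouped = defaultdict(list)
    for group, group_type in mapping.items():
        grouped[group_type].append(group)
    return {group_type: grouped[group_type] for group_type in sorted(grouped)}


consolidated = consolidate(GROUP_TYPE)
for group_type, groups in consolidated.items():
    print(f"{group_type}: {', '.join(groups)}")
```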

Event Timeline

I am happy to work on updating the code once the classification and the naming are finalized.

KCVelaga_WMF renamed this task from Add a column differentiate various wiki-types on canonical_data.wikis to Add a column to differentiate various wiki-types on canonical_data.wikis. Mar 28 2024, 10:54 AM
KCVelaga_WMF triaged this task as Low priority.
KCVelaga_WMF updated the task description.
KCVelaga_WMF moved this task from Triage to Tracking on the Product-Analytics board.

Nice! I think I agree with 95% of this. My suggestions:

  • Merge "incubation" into "content". Essentially, those projects are just lots of small content wikis combined into one.
  • Rename "technical" to "technical documentation".
  • Merge "arbcom" into governance (alternatively, we could make a "functionaries" category and put the arbcoms there along with "steward", "sysop-it", "ombudsmen", and "checkuser").
  • Move "outreach", "quality", "ten", and "usability" to "governance".
  • Move "noboard-chapterswikimedia" to "affiliate".
  • Move "spcom" to "foundation" (it was a Foundation board committee).
  • Create a new "technical infrastructure" category containing "vote" and "login" (these are basically not supposed to have any edits, just SecurePoll votes and central logins). We could also put these in "misc".
  • Move "wg-en" to "governance" (it was an old English Wikipedia committee). We could also make a "committee" category for community committees like "chapcom", "techconduct", and "electcom".
  • Move "thankyou" to "foundation". It's an identical case to "donate" (used by the Fundraising team to host donation messages and forms), so it should be in the same place. The two could also go in "technical infrastructure".

Most of these are not strong opinions, so of course feel free to disagree.

After the two of us come to a consensus, we probably should open the door for more feedback (I'm tempted to skip it, but I think there's some chance that there's a valuable perspective we don't have.) I think asking on working-with-data would be more than enough (we should probably provide the wiki name along with the code just to make that review easier).

@nshahquinn-wmf thanks for the review and the suggestions.

Merge "incubation" into "content". Essentially, those projects are just lots of small content wikis combined into one.

I think these should be a separate category. Even though they essentially hold content, the way these projects work and are maintained is quite different from the rest of the group. Going back to my original need for this column: for many analyses (say, monitoring anti-vandalism efforts), I need to consider only the projects currently listed under content. I feel it is easier to add the incubation category when needed (which is less often) than to remove those database groups each time.
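
To make that trade-off concrete, here is a small Python sketch, assuming the type ends up as a plain lookup from database group to type. The `select_groups` helper is hypothetical and the mapping is only an excerpt. With a separate "incubation" type, the default stays content-only and opting in is one extra argument.

```python
# Excerpt of the proposed mapping (database group -> wiki type).
GROUP_TYPE = {
    "wikipedia": "content",
    "commons": "content",
    "incubator": "incubation",
    "sources": "incubation",
    "meta": "governance",
}


def select_groups(mapping, types=("content",)):
    """Return the sorted database groups whose type is one of `types`."""
    wanted = set(types)
    return sorted(group for group, group_type in mapping.items() if group_type in wanted)


# Common case: content wikis only.
content_only = select_groups(GROUP_TYPE)

# Less common case: explicitly opt in to the incubation wikis as well.
content_and_incubation = select_groups(GROUP_TYPE, types=("content", "incubation"))
```

If the two types were merged, the common case would instead need an exclusion list of database groups in every query.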

  • Rename "technical" to "technical documentation".
  • Merge "arbcom" into governance (alternatively, we could make a "functionaries" category and put the arbcoms there along with "steward", "sysop-it", "ombudsmen", and "checkuser").
  • Move "noboard-chapterswikimedia" to "affiliate".
  • Move "spcom" to "foundation" (it was a Foundation board committee).
  • Create a new "technical infrastructure" category containing "vote" and "login" (these are basically not supposed to have any edits, just SecurePoll votes and central logins). We could also put these in "misc".
  • Move "wg-en" to "governance" (it was an old English Wikipedia committee). We could also make a "committee" category for community committees like "chapcom", "techconduct", and "electcom".

Agree to all above, and made changes to mapping.
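
Since several categories were renamed or merged, a quick check that the mapping only uses agreed type names can catch leftovers such as "technical" or "arbcom". This is a minimal Python sketch: the allowed names are the ones in the consolidated view, while `unknown_types` and the `draft` entries are hypothetical and only illustrate what a stale entry might look like.

```python
# Type names taken from the consolidated view in the task description.
ALLOWED_TYPES = {
    "affiliate", "content", "foundation", "functionaries", "governance",
    "incubation", "misc", "technical_documentation",
    "technical_infrastructure", "test", "wikimania",
}


def unknown_types(mapping, allowed=ALLOWED_TYPES):
    """Return {database group: type} for entries whose type is not an agreed name."""
    return {group: group_type for group, group_type in mapping.items() if group_type not in allowed}


# Hypothetical draft that still carries two pre-rename type names.
draft = {
    "mediawiki": "technical",   # assumed leftover of the old "technical" name
    "arbcom-en": "arbcom",      # assumed leftover of the old "arbcom" name
    "vote": "technical_infrastructure",
    "spcom": "foundation",
}

leftovers = unknown_types(draft)
```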

  • Move "thankyou" to "foundation". It's an identical case to "donate" (used by the Fundraising team to host donation messages and forms), so it should be in the same place. The two could also go in "technical infrastructure".

"technical infrastructure" sounds better.

  • Move "outreach", "quality", "ten", and "usability" to "governance".

I am not sure governance is the right fit for outreach (which was more for documenting outreach activities) or for ten (which was for events and activities around the Wikipedia 10 celebrations).

After the two of us come to a consensus, we probably should open the door for more feedback (I'm tempted to skip it, but I think there's some chance that there's a valuable perspective we don't have.) I think asking on working-with-data would be more than enough (we should probably provide the wiki name along with the code just to make that review easier).

Sounds good!

@nshahquinn-wmf

What do you think of the changes?

As a next step, should we post on #working-with-data?