
Have list of wiki communities in Wikimania Scholarships app automatically update
Closed, Declined | Public

Description

Follow up from T155666: Add Tulu (tcy) to "Primary language community on wiki" list.

Should we have a follow-up bug to make this list dynamically generated from action=sitematrix or something?

I wouldn't be against it. We would still need some way to add the "small", "medium", "large" classifications that are used in the reporting however. One thing that could certainly be done would be to get rid of the static class that has duplicate data and instead pull the list from the DB. I don't remember exactly, but I think the PHP class predated the DB table. We could also add a UI for admins to add/change the DB values so things like this don't require a code deployment.

https://noc.wikimedia.org/conf/highlight.php?file=small.dblist and related could be used for that.


Event Timeline

In this task, we would need to change Communities.php: pull these values from the DB table of language communities so that the duplicate list can be removed, then add a UI for admins so that they can add language communities and their codes in the future. @bd808 Did I miss anything?

It would be interesting to go back to @eyoung and find out if the small/medium/large information that was added for reporting is actually used. If it is not, then we could gather the up-to-date list of projects from one of several APIs instead of the static class or the database. If it is used, then it would be good to learn whether the small/medium/large classifications correspond to the small.dblist, medium.dblist, and large.dblist data files that are actively maintained for the production wikis. If they do correspond, then we could use an API together with the dblists (which can be fetched from those NOC URLs) instead of the current data.
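As a rough illustration of the dblist idea, here is a minimal Python sketch that turns the text of the three dblist files into a wiki-to-size map. It assumes the files contain one database name per line and that anything after a "#" is a comment; that format is an assumption, and the function names parse_dblist and size_by_wiki are hypothetical. Fetching the files from the NOC URLs is left out.

```python
def parse_dblist(text):
    """Return the database names found in dblist-style text.

    Assumes one name per line, with '#' starting a comment.
    """
    names = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            names.append(entry)
    return names


def size_by_wiki(small_text, medium_text, large_text):
    """Map each database name to 'small', 'medium', or 'large'."""
    sizes = {}
    labelled = (
        ("small", small_text),
        ("medium", medium_text),
        ("large", large_text),
    )
    for label, text in labelled:
        for dbname in parse_dblist(text):
            sizes[dbname] = label
    return sizes
```

The app itself is PHP, so this only shows the shape of the data; the same logic would need to be ported if the approach were adopted.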

@bd808 The small/medium/large information is used to award scholarships. We allocate a certain number of scholarships to each category, and pick the top X in that category. So having that list of "which language communities are small/med/large" is super important, but the list we use right now is really out of date. I believe right now we are using stats from 2015.

In the small.dblist, medium.dblist, and large.dblist data files, what criteria are used to categorize projects as small/med/large?


That is a really good question! I tracked down the maintenance script that is used to create these lists in the WikimediaMaintenance extension. It ranks the wikis based on the total number of pages reported in the site_stats table:

  • small.dblist < 10,000 pages
  • medium.dblist < 1,000,000 pages
  • large.dblist >= 1,000,000 pages

Currently, large.dblist tracks 41 wikis, medium.dblist has 368, and small.dblist has 501.
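The thresholds above can be sketched as a small Python function. This is only an illustration of the cut-offs as described in this comment, not the maintenance script itself; the names classify_by_pages, SMALL_LIMIT, and MEDIUM_LIMIT are hypothetical.

```python
SMALL_LIMIT = 10_000       # below this page count: small
MEDIUM_LIMIT = 1_000_000   # below this page count (and not small): medium


def classify_by_pages(total_pages):
    """Return 'small', 'medium', or 'large' for a wiki's total page count."""
    if total_pages < SMALL_LIMIT:
        return "small"
    if total_pages < MEDIUM_LIMIT:
        return "medium"
    return "large"
```

For example, classify_by_pages(9_999) gives "small", while classify_by_pages(10_000) gives "medium".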

I'm not sure how good a proxy page count is for active editor count, which is likely the current ranking metric. The dblist-based ranking is also by wiki rather than by language. This means that enwiki is ranked as large, but enwikibooks is ranked as medium. Spanish language wikis appear in all three categories.
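To see how widespread the per-wiki versus per-language mismatch is, something like the following Python sketch could list the languages whose wikis fall into more than one size category. It assumes two inputs that would have to be built elsewhere: a mapping from database name to language code (for example from sitematrix data) and a mapping from database name to size label. The function names are hypothetical.

```python
from collections import defaultdict


def categories_by_language(wiki_language, wiki_size):
    """Collect the set of size labels seen for each language code."""
    result = defaultdict(set)
    for dbname, lang in wiki_language.items():
        size = wiki_size.get(dbname)
        if size is not None:  # skip wikis with no size label
            result[lang].add(size)
    return dict(result)


def mixed_languages(wiki_language, wiki_size):
    """Return, sorted, the language codes that span more than one size label."""
    per_language = categories_by_language(wiki_language, wiki_size)
    return sorted(lang for lang, sizes in per_language.items() if len(sizes) > 1)
```

Any language returned by mixed_languages would need an explicit rule (for example, "use the size of the largest wiki") before the dblists could stand in for a per-language classification.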