
[REQUEST] India specific stats
Closed, ResolvedPublic

Description

Name for main point of contact and contact preference
Vidhu Goyal, prefer follow-up over email

What teams or departments is this for?
Communications team

What are the details of your request? Include relevant timelines or deadlines

  • How many contributors/volunteers do we have from India?
  • Percentage change, along with the number of new articles that have been added on Indic Wikipedias. Also the total number of articles in Indian languages.
  • Percentage of change in editor count from previous years?
  • How many new editors joined in year 2022 from India in Indic Wikis (including English)?
  • What is the percentage of editors who are contributing regularly/repeatedly? (One time vs repeated editor).

How will you use this data or analysis?
Inform some of the upcoming press work in India.

Is this request urgent or time sensitive?
Would be grateful if I could ideally get this information by Wednesday. In case this may take more time, please let me know and I will try to work around this.

Event Timeline

mpopov triaged this task as High priority.
mpopov moved this task from Triage to Current Quarter on the Product-Analytics board.
mpopov edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
mpopov moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

Additional questions posed by Vidhu over email:

  1. There are two languages, Awadhi and Tulu, that are not official languages but have their own Wikipedia editions.... [request] for following stats for both of these:
  • when did this language become a Wikipedia?
  • how many contributors have worked on this language edition?
  • how many articles are there in this language so far.
  • can we find out how many people are reading articles in these languages?
  2. "...There are many languages in incubator phase - both active and inactive. Can you help me find some stats like the above of the below languages [Angika, Mizo, Ho, Boro, Rajasthani] while they are in incubator phases? ..."

From @cchen:
Clarifying Questions:
"For the languages of India, according to the Census of India of 2001, India has 122 major languages and 1599 other languages. The Indian constitution recognizes 22 official languages: Bengali, Hindi, Maithili, Nepali, Sanskrit, Tamil, Urdu, Assamese, Dogri, Kannada, Gujarati, Bodo, Manipuri (also known as Meitei), Oriya, Marathi, Santali, Telugu, Punjabi, Sindhi, Malayalam, Konkani, and Kashmiri. Are there other languages you are looking for in this request that are missing from this list?

Also, for comparison to the "previous year", are we comparing 2021 to 2020, or the first 3 months in 2022 to the first 3 months in 2021?"

Results:

  1. How many contributors/volunteers do we have from India?
I'm not sure if you have access to Superset, but if you do, I created a chart of the number of monthly editors from India across all our projects. In 2020, the monthly average was 65,416 editors, and in 2021 it was 65,666. These figures include both anonymous and non-anonymous editors.
  2. Percentage change, along with the number of new articles that have been added on Indic Wikipedias. Also the total number of articles in Indian languages.
For Indic Wikipedias, I used the Wikipedias of 20 official languages (excluding English): Bengali, Hindi, Maithili, Nepali, Sanskrit, Tamil, Urdu, Assamese, Kannada, Gujarati, Manipuri (also known as Meitei), Oriya, Marathi, Santali, Telugu, Punjabi, Sindhi, Malayalam, Konkani, and Kashmiri. There are 22 official languages in total, but there is no Wikipedia for Dogri or Bodo. Here is a chart of net new content pages across the 20 Indian languages. In 2020, 78,086 net new content pages were added to Wikipedias in Indian languages; in 2021, 78,104 were added, a 0.02% increase. The total number of content pages comes from Wikistats 2: by the end of 2021, there were 1,058,903 content pages in total across all 20 Indian-language Wikipedias.
  3. Percentage of change in editor count from previous years?
Compared with the average number of monthly editors in 2020, there was a 0.38% increase in 2021.
  4. How many new editors joined in year 2022 from India in Indic Wikis (including English)?
For privacy reasons, editor geolocation data is kept for only 90 days, so we only have data for February and March 2022. In addition, we don't have geolocation data related to user registration, so I counted editors who made edits from India in Indic wikis (including English) instead of editors who joined from India. Another thing to note is that one editor can have multiple accounts across different wikis, especially in India, where there are over 20 official languages, so one editor may be counted multiple times across different wikis. In February and March 2022, there were 30,963 new editors in Indic wikis (including English) editing from India. This number excludes bot editors.
  5. What is the percentage of editors who are contributing regularly/repeatedly? (One time vs repeated editor).
I used the user_editcount field from the mediawiki_user table, which is the approximate number of edits and edit-like actions the user has performed. 41.8% of editors in Wikipedias in Indian languages are one-time editors, and 58.2% are repeated editors.
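The one-time versus repeated split can be illustrated with a minimal sketch. It rests on an assumption the comment does not state: an editor counts as one-time when user_editcount equals 1 and as repeated when it is greater than 1, and accounts with no edits are skipped. The function name and the sample values are hypothetical and are not output of the actual query.

```python
from typing import Iterable, Tuple


def one_time_vs_repeat(edit_counts: Iterable[int]) -> Tuple[float, float]:
    """Return (one-time %, repeated %) among accounts with at least one edit.

    Assumed rule (not confirmed by the task): edit count == 1 is one-time,
    edit count > 1 is repeated, edit count == 0 is ignored.
    """
    counts = [c for c in edit_counts if c >= 1]
    if not counts:
        raise ValueError("no editors with at least one edit")
    one_time = sum(1 for c in counts if c == 1)
    one_time_pct = round(100 * one_time / len(counts), 1)
    return one_time_pct, round(100 - one_time_pct, 1)


# Hypothetical user_editcount values, for illustration only.
sample_edit_counts = [1, 1, 0, 7, 250, 1, 3, 12]
print(one_time_vs_repeat(sample_edit_counts))
```

Because user_editcount is only an approximate counter, a split computed this way is best read as an estimate.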
  1. The birthday of Tulu Wikipedia is August 6th, 2016, according to this page. For other wikis, https://meta.wikimedia.org/wiki/Wikimedia_projects has start dates for some languages, but I didn't see Awadhi Wikipedia in it.
  2. Angika, Mizo, and Rajasthani are active in the incubator phase (according to this page). It looks like the Boro language is in the test phase according to this list.
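The year-over-year percentages quoted in the results can be rechecked with a short sketch, assuming they are plain relative changes, (current - previous) / previous. The input figures are the ones stated in the results; the function and variable names are illustrative.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100


# Figures quoted in the results.
editors_2020, editors_2021 = 65_416, 65_666  # average monthly editors from India
net_new_2020, net_new_2021 = 78_086, 78_104  # net new content, 20 Indic Wikipedias

print(f"editors: {percent_change(editors_2020, editors_2021):.2f}%")          # editors: 0.38%
print(f"net new content: {percent_change(net_new_2020, net_new_2021):.2f}%")  # net new content: 0.02%
```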

Hope this is helpful.

A few more resources, some of them shared on the original email thread:

  1. some of the GLOW queries that could apply to this request can be found here: https://github.com/IreneFlorez/GLOW/blob/master/scripts/data_wrangling
  2. for number of articles and number of contributors, I've previously referenced this page: https://meta.wikimedia.org/w/index.php?title=List_of_Wikipedias/Table&direction=prev&oldid
  3. the requestor may be interested in looking at the interactions and devices, readers Superset dashboard
  4. This outdated India Superset dashboard can be updated and adapted for more India stats.