Page MenuHomePhabricator

[REQUEST] Number of editors across all Wikipedias, Editors across all projects per month
Closed, ResolvedPublic

Description

What's requested:
I help update the Communications department's Public communications statistics document, and I noted that we last updated a couple of statistics last year. I was wondering if you'd be able to provide two updated statistics:
Number of editors across all language editions of Wikipedia.
Number of contributors/editors across all Wikimedia projects.

Why it's requested:
Data last updated 2019-02

Metric that needs to be updated from 2019-02:
Wikipedia is edited by more than 250,000 editors every month around the world.
Wikimedia projects are edited by more than 280,000 editors every month around the world.

When it's requested:
2020-08-21

Other helpful information:
https://docs.google.com/document/d/1Q7VkjpbZYGFAZM2Cj6kotv2ASUuIhJtONWiq_jGAimo/edit#heading=h.jbq04i5w8f1

Event Timeline

SNowick_WMF created this task.

Data collected using @nshahquinn-wmf's editor_month table, updated counts here.

@SNowick_WMF can you share the query you used? According to these numbers, about 98.6% of all editors edits Wikipedia; that's definitely plausible but higher than I would have guessed, so I'm curious 😊

Hi @nshahquinn-wmf Presto query below. It does look like my percentages are higher than your past data, there might be something missing in how I differentiated all vs Wikipedia edits. I used wikis ending with regex wiki to get Wikipedia edits.

select
month,
count(*) as active_editors
from (
select
cast(month as date) as month,
user_name,
sum(edits) as all_edits,
max(bot_by_group) as bot_by_group,
cast(date_trunc('month',min(user_registration)) as date) as registration_month
from neilpquinn.editor_month
where
month > TIMESTAMP '2019-01-01 00:00:00.000' and
user_id != 0
AND regexp_like(wiki, 'wiki$')
group by month, user_name
) global_edits
where
not bot_by_group and
not regexp_like(user_name, 'bot\\b')

group by month

Thanks for sharing, @SNowick_WMF!

Yes, that regex isn't a good way to filter to Wikipedias only; in addition to Wikipedias, it catches Commons (commonswiki), Wikidata (wikidatawiki), Meta (metawiki), MediaWiki.org (mediawikiwiki), and a number of smaller wikis.

You can see the full list by running this query:

SELECT
    english_name,
    database_code
FROM canonical_data.wikis
WHERE
    REGEXP_LIKE(database_code, 'wiki$') AND
    database_group != 'wikipedia'

That also points to the best way to filter by project: join with canonical_data.wikis on database_code and then filter to `database_group = 'wikipedia'.

The rest of the query looks good, though 😊

Hi @ASvitak I will re-run this query and let you know here when data is updated, thanks @nshahquinn-wmf for the info/assist.

Data is updated on Registered editors and Wikipedia editors by month spreadsheet with corrected editor counts. Note the counts that changed were the Wikipedia editors, the average monthly Wikipedia editor count for the last 12 months is 285798. The counts for all project editors have not changed.

Query used:

select
    month,
    count(*) as active_editors
from (
    select
        cast(month as date) as month,
        user_name,
        sum(edits) as all_edits,
        max(bot_by_group) as bot_by_group,
        cast(date_trunc('month',min(user_registration)) as date) as registration_month
    from neilpquinn.editor_month as em
    LEFT JOIN canonical_data.wikis w1 ON em.wiki = w1.database_code
    where
        month > TIMESTAMP '2019-01-01 00:00:00.000' and 
        user_id != 0  
        AND  regexp_like(w1.database_group, 'wikipedia') -- For Wikipedia editors
    group by month, user_name
) global_edits
where
    not bot_by_group and 
    not regexp_like(user_name, 'bot\\b')  
group by month

Thanks, Shay! Everything looks good now 🙂