
Analyze number of productive article contributions made by "junior contributors" using talk page features
Open, Medium, Public

Description

We think that whether newer contributors learn and progress as productive Wikipedians depends on their ability to communicate with other, more experienced contributors.

To help validate this hypothesis, we would like to establish a baseline for the following measures:

  1. The average number of productive article contributions Junior Contributors who use talk pages make each month
  2. The average number of productive article contributions Junior Contributors who do not use talk pages make each month
  3. The total number of productive article contributions (not reverted within 48 hours) made by Junior Contributors who use talk pages within the same month.
  4. The proportion of all article edits that are reverted within 48 hours and are made by Junior Contributors who use talk pages within the same month.

Where we define "Number of productive article contributions," "Junior Contributors," and "use talk pages" as follows:

  • Average number of productive article contributions: the average number of edits Junior Contributors make to article pages each month that are not reverted within 48 hours of being published.
  • Junior contributors: https://www.mediawiki.org/wiki/Talk_pages_project/Glossary
  • Use talk pages: a contributor who makes at least two edits to pages in any talk namespace within a given month that are not reverted within 48 hours of being published.
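The definitions above can be expressed as a few small predicates. The following is a minimal Python sketch, not the team's implementation. Several parts of it are assumptions:

* The `Edit` record and its field names are hypothetical. They are loosely modeled on the `wmf.mediawiki_history` columns used in the queries in this task.
* The junior-contributor cutoff of fewer than 100 edits is copied from those draft queries, not quoted from the glossary.
* Talk namespaces are identified by odd namespace numbers, as the queries do.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# 48 hours in seconds: 48 * 60 * 60 = 172800, the threshold used in the queries.
REVERT_WINDOW_SECONDS = 48 * 60 * 60

# Assumption: junior cutoff copied from the draft queries (event_user_revision_count < 100).
JUNIOR_EDIT_CUTOFF = 100


@dataclass
class Edit:
    """Hypothetical edit record; the field names are illustrative only."""
    user: str
    month: str                      # e.g. "2019-09"
    namespace: int
    user_edit_count: int            # cumulative edits at the time of this edit
    is_reverted: bool = False
    seconds_to_revert: Optional[int] = None


def is_productive(edit: Edit) -> bool:
    """Productive = not reverted within 48 hours of being published."""
    reverted_fast = (
        edit.is_reverted
        and edit.seconds_to_revert is not None
        and edit.seconds_to_revert <= REVERT_WINDOW_SECONDS
    )
    return not reverted_fast


def is_talk_namespace(namespace: int) -> bool:
    """Talk namespaces are odd (1 = Talk, 3 = User talk, ...).

    Negative virtual namespaces are excluded explicitly, because in Python
    -1 % 2 == 1.
    """
    return namespace > 0 and namespace % 2 == 1


def is_junior(edit: Edit) -> bool:
    """True if the author had fewer than JUNIOR_EDIT_CUTOFF edits at the time."""
    return edit.user_edit_count < JUNIOR_EDIT_CUTOFF


def uses_talk_pages(edits: Iterable[Edit], user: str, month: str,
                    min_edits: int = 2) -> bool:
    """At least `min_edits` productive talk edits by `user` in `month`."""
    talk_edits = sum(
        1 for e in edits
        if e.user == user and e.month == month
        and is_talk_namespace(e.namespace) and is_productive(e)
    )
    return talk_edits >= min_edits


if __name__ == "__main__":
    sample = [
        Edit("Alice", "2019-09", 1, 10),
        Edit("Alice", "2019-09", 3, 11, is_reverted=True, seconds_to_revert=3600),
        Edit("Alice", "2019-09", 1, 12),
    ]
    print(uses_talk_pages(sample, "Alice", "2019-09"))  # True: two talk edits survive
```

The talk-edit threshold is the `min_edits` parameter, so the 1-edit and 2-edit definitions discussed in this task can be compared without changing the code.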

Done

  • A chart (time series?) showing how the following two metrics have changed over the past 3 years:
    • The average number of productive article contributions Junior Contributors who use talk pages make each month
    • The average number of productive article contributions Junior Contributors who do not use talk pages make each month
  • A chart showing how the following metric has changed over the past 3 years:
    • The total number of productive article contributions (not reverted within 48 hours) made by Junior Contributors who use talk pages within the same month.
  • A chart showing how the following metric has changed over the past 3 years:
    • The proportion of all article edits that are reverted within 48 hours and are made by Junior Contributors who use talk pages within the same month.

Event Timeline

kzimmerman triaged this task as Medium priority. Dec 23 2019, 5:46 PM
kzimmerman moved this task from Triage to Needs Investigation on the Product-Analytics board.
kzimmerman added a subscriber: kzimmerman.

@MNeisler to add links to existing queries that can be used by @Mayakp.wiki for this task

I'm documenting a couple of draft queries below that can be used as a reference for calculating these metrics. Some QA and modifications may be needed once the metric definitions are finalized.

Example Query 1: Finds the total number of productive article contributions (not reverted within 48 hours) made by Junior Contributors who use talk pages within the same month.

--Find all distinct talk page contributors for a given month and wiki
with talk_contributors AS (
    SELECT
        DISTINCT event_user_text as user_name,
        trunc(event_timestamp, 'MONTH') as date,
        wiki_db as wiki
    FROM wmf.mediawiki_history mwh
--restrict to Wikipedia projects
    INNER JOIN canonical_data.wikis ON
        wiki_db = database_code AND
        database_group == 'wikipedia'
    WHERE
-- adjust to desired timeframe
        event_timestamp >= '2019-09-01' and
        event_timestamp < '2019-11-01' and
        event_entity = 'revision' and
        event_type = 'create' and
    -- Junior Contributors: fewer than 100 cumulative edits at the time of the edit
        event_user_revision_count < 100 and
        snapshot = '2019-11' and
    --remove bots
        NOT ARRAY_CONTAINS(event_user_groups_historical, 'bot') and
    -- all talk namespaces have odd-numbered IDs
        page_namespace_historical % 2 == 1
)
--Find non-reverted article edits by those contributors
SELECT
    trunc(event_timestamp, 'MONTH') as date,
    COUNT(*) as article_edits,
    wiki_db as wiki
FROM wmf.mediawiki_history
INNER JOIN talk_contributors ON 
    event_user_text = talk_contributors.user_name AND
    wiki_db = talk_contributors.wiki AND
    trunc(event_timestamp, 'MONTH') = talk_contributors.date
WHERE
-- revision is not reverted within 48 hours 
    NOT (revision_is_identity_reverted and 
        revision_seconds_to_identity_revert <= 172800 ) AND
-- adjust to desired timeframe
    event_timestamp >= '2019-09-01' and 
    event_timestamp < '2019-11-01' and 
    event_entity = 'revision' and
    event_type = 'create' and
    snapshot = '2019-11' and 
-- restrict to article namespaces
    page_namespace_historical  == 0
GROUP BY trunc(event_timestamp, 'MONTH'), wiki_db;
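To make Example Query 1 easier to QA, its two steps can be mirrored in Python over a small, hypothetical list of revision rows. The sketch makes these assumptions:

* Each row is a dict keyed by the column names the query uses.
* A precomputed `month` key stands in for `trunc(event_timestamp, 'MONTH')`.
* The function names are illustrative.

Like the query, it treats a single talk edit as "using talk pages".

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

REVERT_WINDOW_SECONDS = 172800  # 48 hours, as in the query

Key = Tuple[str, str, str]  # (user, month, wiki)


def reverted_within_48h(rev: Dict) -> bool:
    """Mirror of: revision_is_identity_reverted AND seconds_to_revert <= 172800."""
    seconds = rev["revision_seconds_to_identity_revert"]
    return (bool(rev["revision_is_identity_reverted"])
            and seconds is not None
            and seconds <= REVERT_WINDOW_SECONDS)


def talk_contributors(revisions: List[Dict]) -> Set[Key]:
    """The CTE: distinct (user, month, wiki) with a talk edit by a non-bot
    user with fewer than 100 edits (one edit is enough, as in the query)."""
    return {
        (r["event_user_text"], r["month"], r["wiki_db"])
        for r in revisions
        if r["event_user_revision_count"] < 100
        and "bot" not in r["event_user_groups_historical"]
        # "> 0" mirrors Hive, where -1 % 2 is -1 rather than 1
        and r["page_namespace_historical"] > 0
        and r["page_namespace_historical"] % 2 == 1
    }


def productive_article_edits(revisions: List[Dict]) -> Dict[Tuple[str, str], int]:
    """The outer query: non-reverted namespace-0 edits by those contributors,
    counted per (month, wiki)."""
    contributors = talk_contributors(revisions)
    counts: Counter = Counter()
    for r in revisions:
        key = (r["event_user_text"], r["month"], r["wiki_db"])
        if (key in contributors
                and r["page_namespace_historical"] == 0
                and not reverted_within_48h(r)):
            counts[(r["month"], r["wiki_db"])] += 1
    return dict(counts)
```

One way to check that the joins and filters behave as intended is to run both this sketch and the Hive query on a handful of hand-picked revisions.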

Example Query 2: Finds the proportion of all article edits that are reverted within 48 hours and are made by Junior Contributors who use talk pages within the same month.

with talk_contributors as (
    SELECT
        DISTINCT event_user_text as user_name,
        trunc(event_timestamp, 'MONTH') as date,
        wiki_db as wiki
    FROM wmf.mediawiki_history mwh
    --restrict to Wikipedia projects
    INNER JOIN canonical_data.wikis ON
        wiki_db = database_code and
        database_group == 'wikipedia'
    WHERE
    -- adjust to desired timeframe
        event_timestamp >= '2019-09-01' and
        event_timestamp < '2019-11-01' and
        event_entity = 'revision' and
        event_type = 'create' and
    -- Junior Contributors: fewer than 100 cumulative edits at the time of the edit
        event_user_revision_count < 100 and
        snapshot = '2019-11' and
    --remove bots
        NOT ARRAY_CONTAINS(event_user_groups_historical, 'bot') and
    -- all talk namespaces have odd-numbered IDs
        page_namespace_historical % 2 == 1
)
SELECT
    trunc(event_timestamp, 'MONTH') as date,
    wiki_db as wiki,
    SUM(cast(revision_is_identity_reverted and 
            revision_seconds_to_identity_revert <= 172800 as int))/COUNT(*)  AS revert_rate
FROM wmf.mediawiki_history
INNER JOIN talk_contributors ON 
    event_user_text = talk_contributors.user_name AND
    wiki_db = talk_contributors.wiki AND
    trunc(event_timestamp, 'MONTH') = talk_contributors.date
WHERE
-- adjust to desired timeframe
    event_timestamp >= '2019-09-01' and 
    event_timestamp < '2019-11-01' and 
    event_entity = 'revision' and
    event_type = 'create' and
    snapshot = '2019-11' and 
-- restrict to article namespaces
    page_namespace_historical  == 0
GROUP BY trunc(event_timestamp, 'MONTH'), wiki_db;
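The ratio in Example Query 2 can likewise be sketched in Python. The following are assumptions made for illustration:

* The input is a list of dicts with hypothetical `user`, `month`, `wiki`, `reverted` and `seconds_to_revert` keys.
* The rows are already restricted to namespace 0.
* The name `revert_rate` is illustrative.

The numerator and denominator follow the query's `SUM(cast(... as int))/COUNT(*)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

REVERT_WINDOW_SECONDS = 172800  # 48 hours, as in the query


def revert_rate(article_edits: Iterable[Dict],
                talk_users: Set[Tuple[str, str, str]]) -> Dict[Tuple[str, str], float]:
    """Share of talk page contributors' article edits reverted within 48 hours,
    per (month, wiki). `article_edits` is assumed to be namespace 0 only."""
    reverted: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for e in article_edits:
        if (e["user"], e["month"], e["wiki"]) not in talk_users:
            continue  # the INNER JOIN drops everyone else
        group = (e["month"], e["wiki"])
        total[group] += 1  # denominator: COUNT(*)
        seconds: Optional[int] = e["seconds_to_revert"]
        if e["reverted"] and seconds is not None and seconds <= REVERT_WINDOW_SECONDS:
            reverted[group] += 1  # numerator: reverted within 48 hours
    return {group: reverted[group] / total[group] for group in total}
```

The denominator only counts article edits by the joined talk page contributors, so this is the revert rate within that group. If the fourth measure in the description is meant as a share of all article edits, the denominator would instead have to count every article edit on the wiki in that month.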

Some remaining questions:

  • What does it mean to "use talk page features"? The queries above define "using talk page features" as 1 edit to any talk page namespace within a given month. Should we redefine it as at least 2 edits?
  • Would it also be beneficial to review the average number of productive article contributions per Junior Contributor?
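A per-contributor average, split by talk page use (measures 1 and 2 in the description), could be sketched in Python as follows. This is a hypothetical illustration with these assumptions:

* The row shape and the `average_productive_edits` name are illustrative.
* The average is taken only over Junior Contributors with at least one article edit in the month.
* The talk-edit threshold is a parameter, because the 1-edit vs. 2-edit question above is still open.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical row: (user, month, namespace, is_productive), assumed to be
# already restricted to non-bot Junior Contributors on a single wiki.
Row = Tuple[str, str, int, bool]


def average_productive_edits(rows: Iterable[Row], min_talk_edits: int = 2
                             ) -> Dict[str, Dict[str, float]]:
    """Per month, the mean number of productive article edits per Junior
    Contributor, split into "talk" and "no_talk" groups."""
    talk: Dict[Tuple[str, str], int] = defaultdict(int)
    article: Dict[Tuple[str, str], int] = defaultdict(int)
    # Assumption: the denominator is users with any article edit that month.
    active: Set[Tuple[str, str]] = set()
    for user, month, namespace, productive in rows:
        key = (user, month)
        if namespace == 0:
            active.add(key)
            if productive:
                article[key] += 1
        elif namespace > 0 and namespace % 2 == 1 and productive:
            talk[key] += 1

    groups: Dict[str, Dict[str, List[int]]] = defaultdict(
        lambda: {"talk": [], "no_talk": []})
    for user, month in active:
        label = "talk" if talk[(user, month)] >= min_talk_edits else "no_talk"
        groups[month][label].append(article[(user, month)])

    return {
        month: {label: float(mean(values))
                for label, values in by_label.items() if values}
        for month, by_label in groups.items()
    }
```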

@ppelberg Once you finish adding a description for this task, can you reassign it to @Mayakp.wiki to work on next quarter? Thanks!

@ppelberg What's the status on this? Is this a task that will need to be completed in Q3, or should we close it out or de-prioritize it?

Thank you for the ping, Kate.

Timing
This is a task we'd like to be completed in Q3.

Todo

  • Finalize/confirm the definitions of the following: 1) "productive article contributions" and 2) "talk page features"

@Mayakp.wiki to revisit status with @ppelberg in her meeting today

Brought this up in our 1:1 today. Peter will confirm the status of T233890 by next Wednesday (03-25).

Todo

  • Finalize/confirm the definitions of the following: 1) "productive article contributions" and 2) "talk page features"

See responses to Megan, in-line below. These responses are also now reflected in the task description.


Some remaining questions:

  • What does it mean to "use talk page features"? The queries above define "using talk page features" as 1 edit to any talk page namespace within a given month. Should we redefine it as at least 2 edits?

+1. Let's start with at least 2 edits to talk pages that are not reverted within 48 hours of being posted.

  • Would it also be beneficial to review the average number of productive article contributions per Junior Contributor?

Yes. Good call, Megan. This is now reflected in the task description.


Timing
Per my conversation with @Mayakp.wiki today, we plan to work on this next quarter (Q4).

This task is going to be re-purposed to help us establish a baseline for the key results we've set for this year's (FY20-21) talk pages project work

@ppelberg to update task description with metrics that should be calculated

This task is going to be re-purposed to help us establish a baseline for the key results we've set for this year's (FY20-21) talk pages project work

Task purpose

  • Now that we've defined our OKRs for the upcoming year, I can update the task description with the metrics we will use to evaluate the impact of the work we'll be doing this year.

Task priority
Work on this task has not yet been prioritized.

ppelberg moved this task from Backlog to Analytics on the Editing-team (Tracking) board.

Moving to backlog
@MNeisler and I talked about this today. We decided this task belongs in the following category: longer-term questions that could inform the Editing Team's product strategy, where "product strategy" means something like: "What problems/opportunities should we consider addressing?"

MNeisler edited projects, added Research ideas; removed Product-Analytics.