Page MenuHomePhabricator

Calculate net non-reverted edits within 48 hours excluding bot and revert edits
Closed, ResolvedPublic

Description

Currently, number of non-reverted edits is defined as edits which were not reverted within 48 hours, excluding bot edits. It measures the number of 'good' edits by human users. To analyze the impact of Portuguese Wikipedia turning off IP editing, we also want to see the edits trend after excluding revert edits. It measures the number of edits by human users, excluding edits created due to vandalism.

Deliveries

  • Measure non-reverted edits within 48 hours, excluding bot and revert edits
  • Measure non-reverted content edits within 48 hours, excluding bot and revert edits
  • Update weekly report with newly defined metrics if needed

Event Timeline

jwang triaged this task as Medium priority.Mar 26 2021, 9:36 PM
jwang created this task.

We excluded the revert edits in order to measure the net edits by human users, excluding any edits created due to vandalism. In summary, the YoY trend of net edits is not clear. Net edits reduced in some months but also grew in some months. Net content edits decreased every month. Please find the details below.

  • Measure non-reverted edits within 48 hours, excluding bot and revert edits

Metric definition: Number of monthly edits which were made by non-bot editors, were not reverted within 48 hours, and were not revert edits
SQL

WITH t1 AS
(
SELECT date_format(event_timestamp,'yyyy-MM-01') AS y_m,
 count(revision_id) AS non_reverted_edits
FROM wmf.mediawiki_history
WHERE snapshot= '{SNAPSHOT}' AND wiki_db='ptwiki'
-- for bots size(event_user_is_bot_by) returns 1/2, IP editors return -1, registered non-bot editors return 0
AND  size(event_user_is_bot_by) <= 0 
AND substr(event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
AND event_entity = "revision" AND (NOT revision_is_identity_reverted  OR revision_seconds_to_identity_revert > 172800)
GROUP BY date_format(event_timestamp,'yyyy-MM-01')
ORDER BY y_m
LIMIT 1000000
),
t2 AS
(
SELECT date_format(h1.event_timestamp,'yyyy-MM-01') AS y_m,
 count(distinct h1.revision_id) AS revert_edits
FROM wmf.mediawiki_history AS h1 
LEFT JOIN wmf.mediawiki_history AS h2  -- h1 is revert edits, h2 is the edits being reverted
     ON h1.revision_id=h2.revision_first_identity_reverting_revision_id  
      AND h1.wiki_db = h2.wiki_db
      AND h1.snapshot= h2.snapshot 
WHERE h1.wiki_db = 'ptwiki' AND h1.snapshot='{SNAPSHOT}'
    -- for bots size(event_user_is_bot_by) returns 1 or 2, IP editors return -1, registered non-bot editors return 0
    AND size(h1.event_user_is_bot_by) <= 0 AND size(h2.event_user_is_bot_by) <= 0
    AND substr(h1.event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
    AND h1.event_entity = "revision" and h2.event_entity = "revision" 
    AND (NOT h1.revision_is_identity_reverted  OR h1.revision_seconds_to_identity_revert > 172800)
    AND h2.revision_is_identity_reverted AND h2.revision_seconds_to_identity_revert < 172800
    AND h1.revision_is_identity_revert
    AND h1.event_entity = 'revision' and  h2.event_entity = 'revision'
GROUP BY date_format(h1.event_timestamp,'yyyy-MM-01')
ORDER BY y_m
LIMIT 10000
)
SELECT t1.y_m, t1.non_reverted_edits - t2.revert_edits AS net_non_reverted_edits
FROM  t1
LEFT JOIN t2 ON t1.y_m=t2.y_m

image.png (676×1 px, 191 KB)

Month_nFY2019/20FY2020/21YoY%
72160792176430.7%
819656824571025.0%
91906422019395.9%
101834831835850.1%
11185600172090-7.3%
121711161795824.9%
1187122180340-3.6%
2176125165528-6.0%
3175591
4174153
5186408
6195112
  • Measure non-reverted content edits within 48 hours, excluding bot and revert edits

Metric definition: Number of monthly content edits which were made by non-bot editors, were not reverted within 48 hours, and were not revert edits

SQL

SELECT date_format(event_timestamp,'yyyy-MM-01') AS y_m,
    count(revision_id) AS non_reverted_edits
FROM wmf.mediawiki_history
WHERE snapshot= '{SNAPSHOT}' AND wiki_db='ptwiki'
    AND page_namespace_is_content
    -- for bots size(event_user_is_bot_by) returns 1/2, IP editors return -1, registered non-bot editors return 0
    AND  size(event_user_is_bot_by) <= 0 
    AND substr(event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
    AND event_entity = "revision" AND (NOT revision_is_identity_reverted  OR revision_seconds_to_identity_revert > 172800)
GROUP BY date_format(event_timestamp,'yyyy-MM-01')
ORDER BY y_m
LIMIT 1000000
),
t2 AS
(
SELECT date_format(h1.event_timestamp,'yyyy-MM-01') AS y_m,
    count(distinct h1.revision_id) AS revert_edits
FROM wmf.mediawiki_history AS h1 
LEFT JOIN wmf.mediawiki_history AS h2  -- h1 is revert edits, h2 is the edits being reverted
    ON h1.revision_id=h2.revision_first_identity_reverting_revision_id  
    AND h1.wiki_db = h2.wiki_db
    AND h1.snapshot= h2.snapshot 
WHERE h1.wiki_db = 'ptwiki' AND h1.snapshot='{SNAPSHOT}'
    AND h1.page_namespace_is_content
    -- for bots size(event_user_is_bot_by) returns 1 or 2, IP editors return -1, registered non-bot editors return 0
    AND size(h1.event_user_is_bot_by) <= 0 AND size(h2.event_user_is_bot_by) <= 0
    AND substr(h1.event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
    AND h1.event_entity = "revision" and h2.event_entity = "revision" 
    AND (NOT h1.revision_is_identity_reverted  OR h1.revision_seconds_to_identity_revert > 172800)
    AND h2.revision_is_identity_reverted AND h2.revision_seconds_to_identity_revert < 172800
    AND h1.revision_is_identity_revert
    AND h1.event_entity = 'revision' and  h2.event_entity = 'revision'
GROUP BY date_format(h1.event_timestamp,'yyyy-MM-01')
ORDER BY y_m
LIMIT 10000
)
SELECT t1.y_m, t1.non_reverted_edits - t2.revert_edits AS net_non_reverted_edits
FROM  t1
LEFT JOIN t2 ON t1.y_m=t2.y_m

image.png (644×1 px, 184 KB)

Month_nFY2019/20FY2020/21YoY%
7166878162227-2.8%
814897018969027.3%
91473361483070.7%
10134386132905-1.1%
11134748127174-5.6%
12127077126118-0.8%
1142205121308-14.7%
2130357110974-14.9%
3127405
4122845
5133713
6137398

Looking at all wikipedias projects, non-reverted edits (excluding bot and revert edits) increased significantly in the last year. For 20/21 Q2, the monthly average increased 1.3M (13.5%) YoY.

image.png (694×1 px, 194 KB)

Have enabled metric of net non-reverted edits in weekly dashboard.