Page MenuHomePhabricator

Measure no-reverted edits by non-bot users on ptwiki
Closed, ResolvedPublic

Assigned To
Authored By
jwang
Feb 1 2021, 6:45 PM
Referenced Files
F34141081: image.png
Mar 6 2021, 9:58 PM
F34091369: image.png
Feb 3 2021, 8:01 PM
F34086605: image.png
Feb 3 2021, 1:54 AM
F34086607: image.png
Feb 3 2021, 1:54 AM
F34086611: image.png
Feb 3 2021, 1:54 AM
Tokens
"Like" token, awarded by Chicocvenancio.

Description

We would like to measure YoY of non-reverted edits excluding bot edits to answer @Chicocvenancio's question in https://phabricator.wikimedia.org/T264940

Deliveries:

  • Measure weekly trend of non-reverted edits, excluding bot edits
  • Measure weekly trend of non-reverted content edits, excluding bot edits
  • Measure monthly trend of non-reverted edits, excluding bot edits
  • Measure monthly trend of non-reverted content edits, excluding bot edits
  • Update weekly report with newly defined metrics

Summary:
Number of edits by bots is 6% of total edits in November 2020. Though on ptwiki bot edits usually are a small portion of total edits. But we'd like to exclude bot edits in our measurement because it did create a bump in history, for example in March 2020. And the YoY changes of bot edits could mislead our conclusion on the impact on human editors after turning off IP editing.

By the 48th week,
YoY weekly non-reverted edits, excluding bot edits: decreased 7%, from 44238 to 41251.
YoY monthly non-reverted edits, excluding bot edits: decreased 10.85% YoY, from 200486 to 178726.
YoY weekly non-reverted content edits excluding bot edits: decreased 3% YoY, from 32257 to 31342.
YoY monthly non-reverted content edits, excluding bot edits: decreased 10.67% YoY, from 148675 to 132815.

Event Timeline

jwang triaged this task as Medium priority.Feb 1 2021, 11:29 PM

When we monitor the impact of tuning off IP editing, we are especially interested in the impact on edits by human editors (anonymous editors and registered editors ). The fluctuation of edits by bots possibly mislead the Year over Year comparison. We'd like to see the YoY trend of edits without the bot edits.

Meanwhile, we are interested to see the trend of content edits and whether/how it is impacted.

  • Weekly edits excluding bots

By the 48th week, YoY decreased 7%, from 44238 to 41251.
Metric definition: Number of weekly edits which were made by non-bot editors and were not reverted within 48 hours

SQL

SELECT weekofyear(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss'))) AS week_n,
 count(revision_id) AS non_reverted_edits
FROM wmf.mediawiki_history
WHERE snapshot= '{SNAPSHOT}' AND wiki_db='ptwiki'
-- for bots size(event_user_is_bot_by) returns 1/2, IP editors return -1, registered non-bot editors return 0
AND  size(event_user_is_bot_by) <= 0 
AND substr(event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
AND event_entity = "revision" AND (NOT revision_is_identity_reverted  OR revision_seconds_to_identity_revert > 172800)
GROUP BY weekofyear(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss')))

image.png (672×1 px, 158 KB)

  • Weekly content edits excluding bots

By the 48th week, YoY decreased 3%, from 32257 to 31342.
Metric definition: Number of weekly content edits which were made by non-bot editors and were not reverted within 48 hours

SQL

SELECT weekofyear(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss'))) AS week_n,
 count(revision_id) AS non_reverted_edits
FROM wmf.mediawiki_history
WHERE snapshot= '{SNAPSHOT}' AND wiki_db='ptwiki'
-- only content page
AND page_namespace_is_content 
-- for bots size(event_user_is_bot_by) returns 1/2, IP editors return -1, registered non-bot editors return 0
AND size(event_user_is_bot_by) <= 0 
AND substr(event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
AND event_entity = "revision" AND (NOT revision_is_identity_reverted  OR revision_seconds_to_identity_revert > 172800)
GROUP BY weekofyear(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss')))

image.png (662×1 px, 150 KB)

  • Monthly edits excluding bots

I also checked monthly trend as holidays might not be in the same week every year. Monthly data can exclude the possible fluctuations introduced by bots.
The monthly YoY change of edits in November is -10.85%, from 200486 to 178726.
Metric definition: Number of monthly edits which were made by non-bot editors and were not reverted within 48 hours

SQL

SELECT month(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss'))) AS month_n,
 count(revision_id) AS non_reverted_edits
FROM wmf.mediawiki_history
WHERE snapshot= '{SNAPSHOT}' AND wiki_db='ptwiki'
-- for bots size(event_user_is_bot_by) returns 1/2, IP editors return -1, registered non-bot editors return 0
AND size(event_user_is_bot_by) <= 0 
AND substr(event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
AND event_entity = "revision" AND (NOT revision_is_identity_reverted  OR revision_seconds_to_identity_revert > 172800)
GROUP BY month(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss')))
ORDER BY month_n
LIMIT 10000

image.png (662×1 px, 123 KB)

month_nnon_reverted_edits 2019non_reverted_edits 2020YoY %
1204872199750-2.50%
21808391890194.52%
3204012189839-6.95%
4191750187226-2.36%
5224115201542-10.07%
6213080210613-1.16%
72295712325941.32%
821207026061822.89%
92050112166945.70%
10198821191123-3.87%
11200486178726-10.85%
121835321857091.19%
  • Monthly content edits excluding bots

The monthly YoY change of edits in November is -10.67%, from 148675 to 132815
Metric definition: Number of monthly content edits which were made by non-bot editors and were not reverted within 48 hours

SQL

SELECT month(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss'))) AS month_n,
 count(revision_id) AS non_reverted_edits
FROM wmf.mediawiki_history
WHERE snapshot= '{SNAPSHOT}' AND wiki_db='ptwiki'
AND page_namespace_is_content 
-- for bots size(event_user_is_bot_by) returns 1/2, IP editors return -1, registered non-bot editors return 0
AND size(event_user_is_bot_by) <= 0 
AND substr(event_timestamp,1,10) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}'
AND event_entity = "revision" AND (NOT revision_is_identity_reverted  OR revision_seconds_to_identity_revert > 172800)
GROUP BY month(from_unixtime(UNIX_timestamp(event_timestamp,'yyyy-MM-dd HH:mm:ss')))
ORDER BY month_n
LIMIT 10000

image.png (672×1 px, 127 KB)

month_nnon_reverted_edits 2019non_reverted_edits 2020YoY %
11519301538521.27%
21367011422084.03%
3152898140474-8.13%
4142752134886-5.51%
5178568147529-17.38%
6165034151718-8.07%
7179298176023-1.83%
816353820341124.38%
91608391618380.62%
10148871139386-6.37%
11148675132815-10.67%
12138122131250-4.98%

In summary, for all four explored definitions of edits, we see a YoY decrease.

Have enabled in weekly report.