Measure impact of requiring login to edit articles on Persian Wikipedia
Open, Medium, Public

Description

Motivation:
Persian Wikipedia (fawiki) is implementing a change that would disallow IP editors from editing the main namespace. We should capture metrics to assess the impact of this change on the project's health, similar to ptwiki metrics.

Relevant links:

Proposed metrics and measurements:
A series of metrics were proposed by the fawiki community. Some have been implemented as SQL queries by @Jeeputer, and the queries and latest results can be found on this page. Note that some of the metric definitions may need to mature further, e.g. by becoming time dependent ("within X days"), and some rely on definitions (e.g. "newly registered users") that have to be carefully crafted into SQL queries using a mixture of features such as registration date, number of prior edits up to a specific edit, etc.

  • Proportion of all IP edits that have been patrolled within X days
  • Number of edits by newly registered users
  • Total edits per namespace per day/week
    • The restriction applies only to the main namespace, so edits in the Talk namespace are expected to increase
    • If the restriction has a deterrent effect, total edits across all namespaces may decline
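
As a rough illustration of how two of the proposed definitions could be made precise, here is a minimal Python sketch. It is not the SQL written by @Jeeputer. The function names, the input shapes, and the thresholds (30 days since registration, fewer than 10 prior edits, and combining the two with "and") are all assumptions chosen for the example; the task does not fix any of them.

```python
from datetime import datetime, timedelta


def patrolled_within(edits, max_days):
    """Share of edits patrolled within `max_days` of being made.

    `edits` is a list of (edit_time, patrol_time) pairs, where patrol_time
    is None for edits that were never patrolled (hypothetical input shape).
    Returns None when there are no edits.
    """
    if not edits:
        return None
    limit = timedelta(days=max_days)
    on_time = sum(
        1
        for edited, patrolled in edits
        if patrolled is not None and patrolled - edited <= limit
    )
    return on_time / len(edits)


def is_newcomer_edit(registered, edited, prior_edits,
                     max_age_days=30, max_prior_edits=10):
    """True if the edit counts as made by a "newly registered user".

    Assumed rule: the account is at most `max_age_days` old at edit time
    AND had fewer than `max_prior_edits` edits before this one.
    Both thresholds are placeholders, not agreed values.
    """
    young = edited - registered <= timedelta(days=max_age_days)
    return young and prior_edits < max_prior_edits
```

Keeping X and the newcomer thresholds as parameters makes it easy to compare several candidate definitions before settling on one.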

A series of metrics were used in the earlier ptwiki experience; all of them were trended over time at monthly granularity:

  • Number of new accounts
  • Number of active user editors
  • Retention rate
  • Number of edits
  • Number of reverts
  • Number of net non-reverted edits
  • Number of net non-reverted content edits
  • Number of blocks
  • Number of protected pages
  • Number of CheckUser requests

Qualitative measurements were informally done in the ptwiki experience. It may be worthwhile to solicit feedback from the users more formally, both during and after the temporary restriction.

Done

Weekly dashboard

The fawiki metrics dashboard is at https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard.html and is refreshed weekly (see T292781#7517279).

Summary by December 2021

The Persian (Farsi) Wikipedia community started blocking IP editing on content pages on October 20, 2021, the Wednesday of the 42nd week of 2021. For the week-over-week comparison below, we chose the 49th week (2021-12-06 ~ 2021-12-12), which contains no holidays, to minimize seasonal effects. We do not compare the trend year over year because too many factors affect user behavior. One of them is the pandemic, which has affected user behavior over the last two years and continues to do so.
Here are the observations 8 weeks after the change, i.e. the 49th week of 2021 compared to the 41st week. After IP editing on article pages was turned off on fawiki, we saw:

  • No significant increase in active registered editors: 1735 in week 41, 1768 in week 49.
  • No significant increase in new accounts: 1778 in week 41, 1785 in week 49. Turning off IP editing on article pages did not lead to a significant increase in new accounts.
  • A 27% decrease in total edits, from 55753 in week 41 to 40505 in week 49.
  • A 45% decrease in reverts, from 5555 in week 41 to 3031 in week 49. The decrease comes mainly from reverts on article content pages.
  • An 18% decrease in net non-reverted edits excluding bot edits, from 35796 in week 41 to 29085 in week 49.
  • A 51% decrease in blocks, from 543 in week 41 to 268 in week 49.
  • No obvious change in retention rate; more time is needed to observe it.
  • No obvious trend in CheckUser checks.
  • A 72% decrease in protected pages, from 157 in week 39 (week 41 is an abnormal week) to 44 in week 49. The decrease comes mainly from protections on article content pages.
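
The percentages in this list can be re-derived from the quoted weekly counts. The sketch below is only an arithmetic check, not the dashboard's code; the dictionary and function names are made up for the example. Results may differ from the quoted figures by a point depending on how the dashboard rounds.

```python
# Weekly counts quoted in the summary: (baseline week, week 49).
# The baseline is week 41, except for protected pages, which uses week 39.
COUNTS = {
    "active registered editors": (1735, 1768),
    "new accounts": (1778, 1785),
    "total edits": (55753, 40505),
    "reverts": (5555, 3031),
    "net non-reverted edits (excluding bots)": (35796, 29085),
    "blocks": (543, 268),
    "protected pages": (157, 44),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for metric, (before, after) in COUNTS.items():
        print(f"{metric}: {percent_change(before, after):+.1f}%")
```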

Please see 2021 report for details.

Provide measures on content page and talk page separately

T297653: this has been enabled in the weekly dashboard.

Event Timeline

Niharika triaged this task as Medium priority. Oct 7 2021, 7:44 PM
Niharika created this task.

@Niharika, just to be clear, the IP editing ban has not been implemented yet. We are hoping that you, as a member of the backport deploy team, help us implement this the right way (i.e. through proper MediaWiki means, not JavaScript hacks or abuse filters). For example, please see T291018#7363246 where a volunteer developer says frankly that he is "unwilling to volunteer" his "time to help".

@4nn1l2 thank you for clarifying that -- I was mistakenly under the impression that the ban had already been enacted. I am no longer an active member of the backport deploy team since I switched roles. However, I can reach out to others to see if we can get more eyes on that. I will keep you posted.

Huji renamed this task from Measure impact of requiring login to edit on Farsi Wikipedia to Measure impact of requiring login to edit articles on Farsi Wikipedia. Oct 11 2021, 3:12 PM
Huji updated the task description.
Huji added a subscriber: Huji.
Huji added a subscriber: Jeeputer.

@4nn1l2 I have pinged Tim Starling and he has volunteered to help with the backport and deploy. I see that @Huji has listed this for deploy tomorrow. If it doesn't get deployed then Tim can step up.

Huji updated the task description.
Huji updated the task description.

Okay, let's ping @tstarling.

Tim, could you please help us implement this?

The dashboard of fawiki metrics has now been set up. It will be refreshed every week, and the report is published at https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard.html .

So far the dashboard covers the metrics below:

  • Number of active editors
  • Number of edits
  • Number of reverts
  • Number of net non-reverted edits
  • Number of edits by non-bot vs bot registered users
  • Number of blocks
  • Number of accounts created
  • Retention rate
  • Number of Checkuser checks
  • Number of pages protected
  • Quality of edits with ORES

We plan to add some of the other suggested metrics after exploring their feasibility.

@jwang and @Niharika as I was looking at the dashboard (now at https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard_last.html) I noticed these patterns so far, approximately 2 months into the IP edit ban in the article namespace:

  • No noticeable increase in number of registered users.
  • Noticeable decrease in the total edit count
  • Noticeable decrease in the total number of reverts
    • Why is this not refreshed since last Oct?
    • The trend seems to start 1-2 weeks before the ban started, so not sure if causally related.
  • Non-reverted edit count metrics are also not refreshed throughout November/December.
  • The total edit count line for 2019 (green dotted line) looks very jumpy. I wonder if this is because it includes edits by an account that is currently not a bot but formerly was. Can you check whether those spikes in Sep-Dec 2019 come mainly from one account?
  • Noticeable decrease in number of blocks.
    • Number of blocks from 2019 is causing the Y axis to go to 12K. I am guessing we imported a lot of IP blocks in April 2019 and again in July 2019. This might be my own bot even (HujiBot). Can we exclude bot-issued edits in this metric altogether? Both of our bot admins are still bots and admins (HujiBot and Dexbot).
    • This metric is updated until early November but then not refreshed. Please check.
  • Retention rate metric is not updated for Sep (given that we are in Dec and this becomes available after 2 months, I would expect to see numbers for Oct even).
  • No noticeable change in CU volume.
  • For the number of pages protected metric, it is the 2021 pre-IP-ban data that causes the problem with the Y axis. Again, I am guessing this was a bot-related thing (without having checked, I would guess some admin bot went ahead and protected a lot of highly used templates). Can we exclude bot admins from this data?
  • Noticeable decrease in overall number of edits deemed as damaging by ORES (this is good).
  • Mild increase in number of edits by registered users deemed as damaging by ORES (this is bad, but given the above, still good).

Finally, can I ask these metrics to also be calculated and presented only for the main namespace? After all, the fawiki temporary IP ban is only in this namespace.

And can I ask for a new metric: number of edits in the "Talk" namespace? As IP editors cannot edit the articles themselves but can edit their talk pages, and since our MediaWiki:Namespaceprotected has been edited to specifically encourage IP editors to edit in article talk pages instead, we expect an increase in this metric.
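
To make the requested breakdown concrete, here is a minimal Python sketch of counting edits per namespace per ISO week. The input shape (a list of edit date and namespace ID pairs) and the function name are assumptions for illustration; the dashboard's actual queries may look quite different. In MediaWiki, namespace 0 is the main (article) namespace and namespace 1 is Talk.

```python
from collections import Counter
from datetime import date

MAIN_NS = 0  # MediaWiki main (article) namespace
TALK_NS = 1  # MediaWiki Talk namespace


def weekly_edits_by_namespace(edits):
    """Count edits per (ISO year, ISO week, namespace).

    `edits` is an iterable of (edit_date, namespace_id) pairs
    (hypothetical input shape).
    """
    counts = Counter()
    for day, namespace in edits:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1], namespace)] += 1
    return counts
```

With counts keyed this way, the Talk-namespace trend before and after week 42 of 2021 can be read off directly and compared with the main-namespace trend.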

@Huji Thanks for your questions. I created two sub-tickets to track the follow-up work: T297655 and T297653. Some schemas are snapshotted monthly, so the metrics extracted from them lag by one month. Retention rate lags by two months because of how the metric is defined.

I agree with most of your summary, except the points on ORES.

Noticeable decrease in overall number of edits deemed as damaging by ORES (this is good).
Mild increase in number of edits by registered users deemed as damaging by ORES (this is bad, but given the above, still good).

As I noted on the weekly dashboard, we can't draw any conclusions about edit quality, because the ORES model assigns a high damage rate to newcomers and IP editors. We keep these metrics in the dashboard purely as an FYI, given that so many people are interested in them.

Huji renamed this task from Measure impact of requiring login to edit articles on Farsi Wikipedia to Measure impact of requiring login to edit articles on Persian Wikipedia. Dec 14 2021, 1:25 AM
Huji updated the task description.

@jwang I was looking at this dashboard again and noticed that retention numbers for early October are now shown. I think the issue with the retention rate graph is that its dots correspond to the account creation date, as opposed to the date of the retention calculation.

In other words, if the fawiki intervention negatively impacts user retention, we would see a decline starting in September (30 days before), which is earlier than the date of the intervention in mid-October (and in the latest update, there does seem to be a decline in retention for early October). Do I understand this correctly? If so, can we somehow modify the graph to better indicate where the "effective date" of a potential impact on retention falls in this visualization?

@Huji, the graph is correct by its definition. The retention rate is defined as follows: out of the non-bot users who registered in the week before the previous one and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days. The graph shows the retention rate of each user cohort. As of now (2022-01-06), we can only calculate the retention rate for users who created their accounts in week 44 or earlier. Figures for cohorts after week 44 underestimate the retention rate because those users are still within their 30+30 day editing window.
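
A minimal Python sketch of this definition may help show why recent cohorts are incomplete. It is an illustration, not the dashboard's implementation. It assumes the first 30 days run from registration up to (but not including) day 30 and the second 30 days from day 30 up to day 60; the function names and input shape are hypothetical.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)


def retention_rate(users):
    """Retention for one cohort of non-bot accounts.

    `users` is an iterable of (registration_date, [edit_dates]) pairs.
    Among users with at least one edit in their first 30 days, returns the
    share who also edited in their second 30 days (None if nobody qualified).
    """
    activated = 0
    retained = 0
    for registered, edits in users:
        second_start = registered + WINDOW
        second_end = registered + 2 * WINDOW
        if any(registered <= d < second_start for d in edits):
            activated += 1
            if any(second_start <= d < second_end for d in edits):
                retained += 1
    return retained / activated if activated else None


def cohort_complete_on(iso_year, iso_week):
    """First date on which the whole weekly cohort has passed its 60 days."""
    last_registration_day = date.fromisocalendar(iso_year, iso_week, 7)
    return last_registration_day + 2 * WINDOW
```

Under these assumptions, `cohort_complete_on(2021, 44)` gives 2022-01-06, which matches the statement that week 44 is the latest cohort that can be fully evaluated on that date. It also shows that each dot on the graph is plotted at the registration week, while the behaviour it measures happens up to 60 days later.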
