Measure impact of requiring login to edit articles on Persian Wikipedia
Closed, ResolvedPublic

Description

Motivation:
Persian Wikipedia (fawiki) is implementing a change that would disallow IP editors from editing the main namespace (implemented via T291018). We should capture metrics to assess the impact of this change on the project's health, similar to ptwiki metrics.

Relevant links:

Proposed metrics and measurements:
A series of metrics was proposed by the fawiki community. Some have been implemented as SQL queries by @Jeeputer; the queries and latest results can be found on this page. Note that some metric definitions may need to mature further, for example by becoming time-dependent ("within X days"), and some rely on terms (e.g. "newly registered users") that have to be carefully translated into SQL using a mixture of features such as registration date, number of prior edits up to a specific edit, etc.

  • Proportion of all IP edits that have been patrolled within X days
  • Number of edits by newly registered users
  • Total edits per namespace per day/week
    • The restriction only applies to the main namespace, so edits in the Talk namespace are expected to increase
    • If the restriction has a deterrent effect, total edits across all namespaces may decline
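
As a minimal sketch of how two of the proposed metrics could be computed, the Python below works on hypothetical in-memory edit records rather than the real MediaWiki tables. The `Edit` fields, the sample namespace numbers, and the choice of ISO-week bucketing are assumptions made for illustration; the actual queries by @Jeeputer are written in SQL and may define these metrics differently.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Edit:
    # Hypothetical record; field names are illustrative, not MediaWiki columns.
    timestamp: datetime
    namespace: int                            # 0 = main (article), 1 = Talk
    is_ip: bool                               # True for a logged-out (IP) edit
    patrolled_at: Optional[datetime] = None   # None if never patrolled


def edits_per_namespace_per_week(edits):
    """Count edits keyed by (ISO year, ISO week, namespace)."""
    counts = Counter()
    for e in edits:
        iso_year, iso_week, _ = e.timestamp.isocalendar()
        counts[(iso_year, iso_week, e.namespace)] += 1
    return counts


def ip_patrolled_share(edits, within_days):
    """Proportion of IP edits patrolled within `within_days` of being made."""
    ip_edits = [e for e in edits if e.is_ip]
    if not ip_edits:
        return None  # undefined when there are no IP edits
    limit = timedelta(days=within_days)
    patrolled = sum(
        1 for e in ip_edits
        if e.patrolled_at is not None and e.patrolled_at - e.timestamp <= limit
    )
    return patrolled / len(ip_edits)
```

Keeping the "within X days" threshold as a parameter makes it easy to compare several candidate values of X before settling on a definition.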

A series of metrics was used in the earlier ptwiki experience; these were all trended over time with a monthly bandwidth:

  • Number of new accounts
  • Number of active user editors
  • Retention rate
  • Number of edits
  • Number of reverts
  • Number of net non-reverted edits
  • Number of net non-reverted content edits
  • Number of blocks
  • Number of protected pages
  • Number of CheckUser requests

Qualitative measurements were done informally in the ptwiki experience. It may be worthwhile to solicit feedback from users more formally, both during and after the temporary restriction.

Done

Weekly dashboard

The dashboard of fawiki metrics is at https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard.html . It will be refreshed weekly (see T292781#7517279).

Summary by December 2021

The Persian (Farsi) Wikipedia community started blocking IP editing on content pages on October 20, 2021, which is the Wednesday of the 42nd week of 2021. For the week-over-week comparison below, to minimize the impact of seasonality, we chose the 49th week (2021-12-06 to 2021-12-12), which does not include any holidays. We do not compare the trend year over year, because too many factors affect user behavior. One of them is the pandemic, which has affected user behavior over the last two years and continues to do so.
Here are the observations 8 weeks after the change, i.e. the 49th week of 2021 compared to the 41st week. After turning off IP editing on article pages on fawiki, we saw:

  • No significant increase in active registered editors: 1735 in week 41 versus 1768 in week 49.
  • No significant increase in new accounts, from 1778 in week 41 to 1785 in week 49. Turning off IP editing on article pages did not lead to a significant increase in new accounts.
  • A 27% decrease in total edits, from 55753 in week 41 to 40505 in week 49.
  • A 45% decrease in reverts, from 5555 in week 41 to 3031 in week 49. The decrease comes mainly from reverts on article content pages.
  • An 18% decrease in net non-reverted edits excluding bot edits, from 35796 in week 41 to 29085 in week 49.
  • A 51% decrease in blocks, from 543 in week 41 to 268 in week 49.
  • No obvious change in retention rate; more time is needed to observe it.
  • No obvious trend in CheckUser checks.
  • A 72% decrease in protected pages, from 157 in week 39 (week 41 is an abnormal week) to 44 in week 49. The decrease comes mainly from protection of article content pages.

Please see the 2021 report for details.
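
The week numbering and the percentage changes in this summary can be re-derived with a few lines of Python. This is only a sketch for checking the arithmetic: it assumes ISO 8601 week numbering (weeks start on Monday), and the helper names `iso_week_bounds` and `pct_change` are illustrative. The counts are copied from the bullets above.

```python
from datetime import date, timedelta


def iso_week_bounds(year, week):
    """Return (Monday, Sunday) of the given ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Weekly counts quoted in the summary: (baseline week value, week 49 value).
counts = {
    "active registered editors": (1735, 1768),
    "new accounts": (1778, 1785),
    "total edits": (55753, 40505),
    "reverts": (5555, 3031),
    "net non-reverted edits (non-bot)": (35796, 29085),
    "blocks": (543, 268),
    "protected pages": (157, 44),  # baseline is week 39, not week 41
}

if __name__ == "__main__":
    # ISO (year, week) of the day the restriction started.
    print(date(2021, 10, 20).isocalendar()[:2])
    # First and last day of the comparison week.
    print(iso_week_bounds(2021, 49))
    for name, (before, after) in counts.items():
        print(f"{name}: {pct_change(before, after):+.1f}%")
```

The computed changes agree with the reported figures to within rounding; the net non-reverted edits figure comes out at about 18.7%.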

Provide measures on content page and talk page separately

Enabled in the weekly dashboard (T297653).

Report to community

https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation/IP_Editing_Restriction_Study/Farsi_Wikipedia

Event Timeline

Niharika triaged this task as Medium priority. Oct 7 2021, 7:44 PM
Niharika created this task.

@Niharika, just to be clear, the IP editing ban has not been implemented yet. We are hoping that you, as a member of the backport deploy team, help us implement this the right way (i.e. through proper MediaWiki means, not JavaScript hacks or abuse filters). For example, please see T291018#7363246 where a volunteer developer says frankly that he is "unwilling to volunteer" his "time to help".

@4nn1l2 thank you for clarifying that -- I was mistakenly under the impression that the ban had already been enacted. I am no longer an active member of the backport deploy team since I switched roles. However, I can reach out to others to see if we can get more eyes on that. I will keep you posted.

Huji renamed this task from Measure impact of requiring login to edit on Farsi Wikipedia to Measure impact of requiring login to edit articles on Farsi Wikipedia. Oct 11 2021, 3:12 PM
Huji updated the task description.
Huji subscribed.
Huji added a subscriber: Jeeputer.

@4nn1l2 I have pinged Tim Starling and he has volunteered to help with the backport and deploy. I see that @Huji has listed this for deploy tomorrow. If it doesn't get deployed then Tim can step up.

Huji updated the task description.
Huji updated the task description.

Okay, let's ping @tstarling.

Tim, could you please help us implement this?

The dashboard of fawiki metrics has now been set up. It will be refreshed every week, and the report is published at https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard.html .

So far the dashboard covers the following metrics:

  • Number of active editors
  • Number of edits
  • Number of reverts
  • Number of net non-reverted edits
  • Number of edits by non-bot vs bot registered users
  • Number of blocks
  • Number of accounts created
  • Retention rate
  • Number of Checkuser checks
  • Number of pages protected
  • Quality of edits with ORES

We plan to add some of the other suggested metrics after exploring their feasibility.

@jwang and @Niharika as I was looking at the dashboard (now at https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard_last.html) I noticed these patterns so far, approximately 2 months into the IP edit ban in the article namespace:

  • No noticeable increase in number of registered users.
  • Noticeable decrease in the total edit count
  • Noticeable decrease in the total number of reverts
    • Why is this not refreshed since last Oct?
    • The trend seems to start 1-2 weeks before the ban started, so not sure if causally related.
  • Non-reverted edit count metrics are also not refreshed throughout November/December.
  • The total edit count line for 2019 (green dotted line) looks very jumpy. I wonder if this is because it includes edits by a currently non-bot but formerly bot account. Can you check if those spikes in Sep-Dec 2019 are mainly from one account?
  • Noticeable decrease in number of blocks.
    • The number of blocks from 2019 is causing the Y axis to go to 12K. I am guessing we imported a lot of IP blocks in April 2019 and again in July 2019. This might even be my own bot (HujiBot). Can we exclude bot-issued blocks from this metric altogether? Both of our bot admins are still bots and admins (HujiBot and Dexbot).
    • This metric is updated until early November but then not refreshed. Please check.
  • Retention rate metric is not updated for Sep (given that we are in Dec and this becomes available after 2 months, I would expect to see numbers for Oct even).
  • No noticeable change in CU volume.
  • For the number of pages protected metric, it is the 2021 pre-IP-ban data that causes a problem with the Y axis. Again, I am guessing this was a bot-related thing (without having checked, I would guess some admin bot went ahead and protected a lot of highly used templates). Can we exclude bot admins from this data?
  • Noticeable decrease in overall number of edits deemed as damaging by ORES (this is good).
  • Mild increase in number of edits by registered users deemed as damaging by ORES (this is bad, but given the above, still good).

Finally, can I ask these metrics to also be calculated and presented only for the main namespace? After all, the fawiki temporary IP ban is only in this namespace.

And can I ask for a new metric: number of edits in the "Talk" namespace? As IP editors cannot edit the articles themselves but can edit their talk pages, and since our MediaWiki:Namespaceprotected has been edited to specifically encourage IP editors to edit in article talk pages instead, we expect an increase in this metric.

@Huji Thanks for your questions. I created two sub-tickets to track the follow-up work: T297655 and T297653. Some schemas are snapshotted on a monthly basis, so the metrics extracted from those schemas have a one-month lag. Retention rate has a two-month lag due to the nature of the metric definition.

I agree with most of your summary, except the points on ORES.

Noticeable decrease in overall number of edits deemed as damaging by ORES (this is good).
Mild increase in number of edits by registered users deemed as damaging by ORES (this is bad, but given the above, still good).

As I noted on the weekly dashboard, we can't draw any conclusion about edit quality, because the ORES model gives a high damage rate for newcomers and IP editors. We keep these metrics in the dashboard purely as an FYI, given that so many people are interested in them.

Huji renamed this task from Measure impact of requiring login to edit articles on Farsi Wikipedia to Measure impact of requiring login to edit articles on Persian Wikipedia. Dec 14 2021, 1:25 AM
Huji updated the task description.

@jwang I was looking at this dashboard again and noticed that retention numbers for early October are now shown. I think the issue with the retention rate graph is that its dots correspond to the account creation date, as opposed to the date of retention calculation.

In other words, if the fawiki intervention negatively impacts user retention, we would see a decline as of September (30 days before), which is earlier than the date of intervention in mid-October (and in the latest update, there does indeed seem to be a decline in retention for early October). Do I understand this correctly? If yes, can we somehow modify the graph to better indicate when the "effective date" of potential impact on retention would be in this visualization?

@Huji, the graph is correct by its definition. The retention rate is defined as follows: out of the non-bot users who registered in the week before the previous and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days. The graph shows the retention rate of each user cohort. As of now (2022-01-06), we are only able to calculate the retention rate for users who created their accounts in week 44 or earlier. Numbers beyond week 44 underestimate the retention rate, because the more recent cohorts are still within their 30+30 day editing window.
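
A minimal sketch of this cohort definition, under stated assumptions: the `User` record and both function names are hypothetical and do not reflect the dashboard's actual schema or code, and the windows are taken as days 0-29 and days 30-59 after registration. The second helper shows why a weekly cohort only becomes fully measurable about 60 days after its last registration day, which for week 44 of 2021 falls in early January 2022.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta
from typing import List


@dataclass
class User:
    # Hypothetical record; not the schema used by the dashboard.
    registered: datetime
    is_bot: bool = False
    edit_times: List[datetime] = field(default_factory=list)


def retention_rate(cohort):
    """Share of first-30-day editors who also edited in their second 30 days."""
    window = timedelta(days=30)
    activated = retained = 0
    for u in cohort:
        if u.is_bot:
            continue  # bots are excluded by definition
        offsets = [t - u.registered for t in u.edit_times]
        if any(timedelta(0) <= d < window for d in offsets):
            activated += 1
            if any(window <= d < 2 * window for d in offsets):
                retained += 1
    return retained / activated if activated else None


def cohort_complete_on(year, week):
    """Date 60 days after the last registration day (Sunday) of an ISO week."""
    last_registration_day = date.fromisocalendar(year, week, 7)
    return last_registration_day + timedelta(days=60)
```

For any cohort whose `cohort_complete_on` date is still in the future, `retention_rate` can only undercount retained users, which matches the underestimation described above.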

@jwang and @Niharika are the graphs on https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard_last.html fully up to date? I see many whose data ends in late October; given that we are halfway through March, I find that a bit too old, no?

Since the 6-month period will end in about a month, I want to send another ping here to @jwang and @Niharika and ask for the data in those graphs to be updated.

The report is now available at https://analytics.wikimedia.org/published/notebooks/AHT/fawiki_dashboard.html . The automated report was broken (T297734), so it needs to be refreshed manually.

We also updated the baselines as we moved into 2022. We kept 2021 and 2019 as the baselines. We kept 2019 instead of 2020 because 2019 is the baseline from before the pandemic had any impact, which makes it an important reference.

The six-month limit started on October 20, 2021 and must end on April 21, 2022. Continuation of this restriction requires the consensus of Persian Wikipedia users; please remove the restriction on the due date.

Also, the effects of this restriction are not only statistically measurable; this change redefines the philosophy of Wikipedia. Wikipedia is a free encyclopedia.

In general, this restriction may only reduce the amount of vandalism, but in the long run it will have a devastating effect on Wikipedia. You should seek feedback on this restriction from both the Wikipedia community and society at large.

Today is April 21, 2022, and this restriction should have been lifted, but it still exists!

Change 784718 had a related patch set uploaded (by Huji; author: Huji):

[operations/mediawiki-config@master] Re-enable article editing by anonymous users on fawiki

https://gerrit.wikimedia.org/r/784718

I created the patch that would revert the config change. I have no availability today personally, but I am hoping that @tstarling or @Urbanecm would be able to deploy this without me being online. It is a revert change after all.

Change 784718 merged by jenkins-bot:

[operations/mediawiki-config@master] Re-enable article editing by anonymous users on fawiki

https://gerrit.wikimedia.org/r/784718

Mentioned in SAL (#wikimedia-operations) [2022-04-21T14:36:59Z] <ladsgroup@deploy1002> Synchronized wmf-config: Config: [[gerrit:784718|Re-enable article editing by anonymous users on fawiki (T292781)]] (duration: 00m 51s)

The lack of an uptick in new weekly editors on fawiki during the IP block indicates to me a big difference from ptwiki's implementation. My lack of Farsi prevents a deeper investigation, but on ptwiki the interface was hacked to redirect logged-out users at the point of edit: the edit button was changed to point directly to the log-in page, and a banner was placed there explaining the need to create an account and linking to the reasoning.

We did something similar at fawiki. If you go to MediaWiki:Namespaceprotected (where you have to view source, because it uses an {{#if:... command) you will see that we displayed a message if the logged-out user was hitting the namespace protection for namespace 0. Recall that one of the key differences between the ptwiki and fawiki initiatives was that fawiki only restricted the main (article) namespace.

That could also be one of the reasons for not seeing an uptick in new users; the logged-out editors could go to the talk page of the article and ask for a change. In fact, that is one of the recommendations that we offered them in MediaWiki:Namespaceprotected. But I say "could" because the trends don't show a noticeable increase in logged-out edits in the Talk namespace. It looks more like logged-out editors just gave up on editing altogether; they neither went to the talk page nor created an account so that they could edit the article.

In wiki.pt, IPs can also go to the talk page and edit it; that was never restricted there. What was restricted was basically the main and Wikipedia namespaces, and not all of the Wikipedia namespace, I seem to recall: they still have access to the problem report and help request pages. However, with talk pages being one of the most non-functional features of MediaWiki, and with recent surges of IP vandalism there, I'm not so sure they would still be allowed to edit there nowadays. Much pain and no gain.