
WE4.2.14a: Analyze IP reputation data and how it maps to on-wiki editing and account creation activity
Open, Medium, Public

Description

In T354597: Record IP reputation data for account creations and edits, we created a schema and server-side instrumentation that records IP reputation data obtained via Spur.us / iPoid-Service, when it exists, for the IP behind each edit and account creation.

In this task, we want to answer the following questions:

  1. Is there a correlation between specific attributes of IP reputation data and reverted edits?
    1. When productive edits occur from IPs with poor reputation, which IP signals are present or absent?
  2. Is there a correlation between specific attributes of IP reputation data and blocked accounts?
  3. Is there a correlation between negative IP reputation data and anonymous editing compared to logged-in editing?
  4. What percentage of reverted edits are associated with IP reputation data obtained from Spur / iPoid-Service?
  5. What percentage of blocked accounts (whether blocked one-off or permanently) are associated with IP reputation data from Spur / iPoid-Service?
  6. In general, can we explain the mapping between IP signals and on-wiki editing by a reduced set of attributes?
  7. Which infrastructure, client, risk, platform, protocol, client behavior, service category, and targeting type enums and tags are most problematic, in terms of being associated with reverted/deleted edits on wiki and permanently blocked accounts? Which are least problematic? https://docs.spur.us/data-types?id=data-types
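A minimal sketch of how questions 4 and 7 could be computed, assuming edits have already been joined with their IP reputation records. The `Edit` record, its fields, and the helper names are hypothetical and do not reflect the actual schema from T354597; "lift" here is simply the per-value revert rate divided by the overall revert rate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    # Hypothetical flattened record: one row per content edit.
    reverted: bool                  # reverted within the 48-hour cutoff
    has_ip_data: bool               # Spur / iPoid-Service returned a record for the IP
    risks: frozenset = frozenset()  # hypothetical: risk values attached to the IP


def share_reverted_with_ip_data(edits):
    """Question 4: fraction of reverted edits whose IP had reputation data."""
    reverted = [e for e in edits if e.reverted]
    if not reverted:
        return 0.0
    return sum(e.has_ip_data for e in reverted) / len(reverted)


def revert_rate_by_risk(edits):
    """Question 7: revert rate and lift over the overall revert rate, per risk value."""
    if not edits:
        return {}
    baseline = sum(e.reverted for e in edits) / len(edits)
    total, reverted = Counter(), Counter()
    for e in edits:
        for risk in e.risks:
            total[risk] += 1
            reverted[risk] += int(e.reverted)
    result = {}
    for risk, n in total.items():
        rate = reverted[risk] / n
        # Lift > 1 means the value is over-represented among reverted edits.
        lift = rate / baseline if baseline > 0 else None
        result[risk] = (rate, lift)
    return result
```

The same per-value breakdown could be repeated for each enum family (infrastructure, client types, behaviors, and so on) and for blocked accounts instead of reverted edits. Values seen on only a handful of edits will produce noisy rates and should be read with caution.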

Notes

Reverted Edits

  • We are interested in content edits on English Wikipedia.
  • Use 48 hours as the cutoff for revert status: an edit counts as reverted only if it was reverted within 48 hours.
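The revert notes above could be expressed as a small labeling step. This is only a sketch: it assumes each edit carries its own timestamp and the timestamp of its first revert (if any), and it assumes "content edits" means main-namespace (namespace 0) edits on `enwiki`. The helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Cutoff from the task notes: an edit counts as reverted only if the
# revert happened no later than 48 hours after the edit was saved.
REVERT_CUTOFF = timedelta(hours=48)


def is_content_edit(wiki: str, namespace: int) -> bool:
    """Assumption: content edits are main-namespace (0) edits on English Wikipedia."""
    return wiki == "enwiki" and namespace == 0


def is_reverted_within_cutoff(edit_time: datetime,
                              revert_time: Optional[datetime]) -> bool:
    """True if the edit was reverted within REVERT_CUTOFF of being saved."""
    if revert_time is None:
        return False  # never reverted
    return timedelta(0) <= revert_time - edit_time <= REVERT_CUTOFF
```

Fixing the window matters because revert status keeps changing over time; without a cutoff, older edits would look more "reverted" simply because they have been exposed to patrollers for longer.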

Blocked Account

  • We are interested in accounts created in English Wikipedia, excluding auto account creations.
  • Blocks include both global blocks and local blocks in English Wikipedia.
  • Blocks are not broken down by duration (permanent vs. temporary), since most blocked registered accounts in the sample (97%) received a permanent block.
  • We are primarily interested in accounts less than 90 days old, on the assumption that older accounts with persistent bad-faith activity will already have been blocked. In addition, our event data is only available for 90 days.
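A sketch of the account cohort and block label described above, under the assumption that each account has already been flattened into a single record with local and global block flags. The `Account` fields and helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Accounts older than this are out of scope; event data also covers only 90 days.
MAX_ACCOUNT_AGE = timedelta(days=90)


@dataclass
class Account:
    # Hypothetical flattened record for one English Wikipedia account.
    created: datetime
    autocreated: bool       # auto account creation, excluded from the analysis
    locally_blocked: bool   # block on English Wikipedia
    globally_blocked: bool  # global block


def in_cohort(acct: Account, as_of: datetime) -> bool:
    """Keep non-autocreated accounts younger than MAX_ACCOUNT_AGE at `as_of`."""
    return not acct.autocreated and as_of - acct.created < MAX_ACCOUNT_AGE


def is_blocked(acct: Account) -> bool:
    # Duration is deliberately ignored: local and global blocks both count.
    return acct.locally_blocked or acct.globally_blocked


def blocked_share(accounts, as_of: datetime) -> float:
    """Fraction of in-cohort accounts that are blocked."""
    cohort = [a for a in accounts if in_cohort(a, as_of)]
    if not cohort:
        return 0.0
    return sum(is_blocked(a) for a in cohort) / len(cohort)
```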

Event Timeline

kostajh changed the task status from Open to Stalled. Mar 15 2024, 2:30 PM

Stalled pending the creation of the schema and MW implementation in T354597: Record IP reputation data for account creations and edits.

XiaoXiao-WMF renamed this task from "Analyze IP reputation data as correlated with editing and account creations" to "Analyze IP data and their relation to editing and account creations". Apr 11 2024, 6:27 PM
XiaoXiao-WMF updated the task description.
XiaoXiao-WMF added subscribers: Pablo, fkaelin.
kostajh changed the task status from Stalled to Open. Jun 25 2024, 10:43 AM

Unstalled: we are collecting data and can start the analysis whenever we are ready.

kostajh renamed this task from "Analyze IP data and their relation to editing and account creations" to "Analyze IP reputation data and how it maps to on-wiki editing and account creation activity". Jul 12 2024, 7:51 AM
kostajh updated the task description.

Legal, Safety & Security Service Center approval is tracked internally in https://app.asana.com/0/1205532102906413/1207906512522795

mpopov triaged this task as Medium priority.
mpopov moved this task from Triage to Current Quarter on the Product-Analytics board.
mpopov subscribed.

Connie will take on questions 1 and 2 towards the end of January, with the aim of validating the Spur.us dataset before tackling the remaining questions.

Update: This project is up next for Connie after she finishes T371141: Analyze impact of Magru data center on unique devices in South America

She's also working on Incident Reporting System reporting/dashboarding, to be captured soon as a WE 4.1 hypothesis.

kostajh renamed this task from "Analyze IP reputation data and how it maps to on-wiki editing and account creation activity" to "WE4.2.14: Analyze IP reputation data and how it maps to on-wiki editing and account creation activity". May 8 2025, 11:37 AM
kostajh added a project: WE4.2 Anti-abuse.
kostajh renamed this task from "WE4.2.14: Analyze IP reputation data and how it maps to on-wiki editing and account creation activity" to "WE4.2.14a: Analyze IP reputation data and how it maps to on-wiki editing and account creation activity". May 13 2025, 8:03 AM

@kostajh, please find the IP reputation data analysis reports here: the report on blocked accounts at https://analytics.wikimedia.org/published/reports/ip_reputation_data/report_blocked_accounts.html and the report on reverted edits at https://analytics.wikimedia.org/published/reports/ip_reputation_data/report_reverted_edits.html.

Please review and let me know if you have any questions. Thanks!

Thank you, I will review and get back to you!

It sounds as though the reports found some variables in common that are correlated with blocked accounts or reverted edits. A couple of thoughts, with T354599 in mind:

  • Should we look into how accurately we could classify accounts/edits based on these variables in combination?
  • (Depending on the above) Is there a way for us to advise abuse filter creators to treat these variables as probabilistic information, e.g. to use with consequences related to flagging as opposed to denying actions?
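One way to approach the first question is to score a simple combined rule, such as "flag if at least k of a chosen set of signals are present", by precision and recall. This is a hypothetical sketch, not the method used in the reports; the record format (the set of signals present for the IP plus a bad/not-bad label, e.g. reverted edit or blocked account) and the function name are assumptions.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical record: (signals present for the IP, whether the edit/account was bad).
Record = Tuple[FrozenSet[str], bool]


def evaluate_rule(records: Iterable[Record], signals: FrozenSet[str],
                  k: int) -> Tuple[float, float]:
    """Precision and recall of the rule 'flag if at least k of `signals` are present'."""
    tp = fp = fn = 0
    for present, bad in records:
        flagged = len(present & signals) >= k
        if flagged and bad:
            tp += 1      # correctly flagged
        elif flagged:
            fp += 1      # flagged, but the edit/account was fine
        elif bad:
            fn += 1      # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising k typically trades recall for precision. A rule that catches many bad actions but has only modest precision would fit the second suggestion: it is safer paired with a flagging consequence than with denying the action outright.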