In T354597: Record IP reputation data for account creations and edits, we created a schema and server-side instrumentation that record IP reputation data from Spur.us / iPoid-Service, when it exists for a given IP, whenever an edit or account creation occurs.
In this task, we want to answer the following questions:
- Is there a correlation between specific attributes of IP reputation data and reverted edits?
- When productive edits occur from IPs with poor reputation, which IP signals are present or absent?
- Is there a correlation between specific attributes of IP reputation data and blocked accounts?
- Is there a correlation between negative IP reputation data and anonymous editing compared to logged-in editing?
- What percentage of reverted edits are associated with IP reputation data obtained from Spur / iPoid-Service?
- What percentage of blocked accounts (one-off or permanent blocks) are associated with IP reputation data from Spur / iPoid-Service?
- In general, can we explain the mapping between IP signals and on-wiki editing by a reduced set of attributes?
- Which infrastructure, client, risk, platform, protocol, client behavior, service category, and targeting type enums and tags are most problematic, in the sense of being associated with reverted/deleted edits on wiki and permanently blocked accounts? Which are least problematic? See https://docs.spur.us/data-types?id=data-types
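As a rough illustration of the per-attribute questions above, the following sketch computes the share of reverted edits for each IP reputation tag. The input shape (a list of `(tags, reverted)` pairs) and the tag names are hypothetical placeholders, not the actual schema fields or Spur enum values.

```python
from collections import defaultdict

def revert_rate_by_tag(edits):
    """Return {tag: share of edits carrying that tag that were reverted}.

    `edits` is an iterable of (tags, reverted) pairs, where `tags` is a
    list of IP reputation tags and `reverted` is a bool. This input
    shape is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    reverted_counts = defaultdict(int)
    for tags, was_reverted in edits:
        # Count each tag at most once per edit.
        for tag in set(tags):
            totals[tag] += 1
            if was_reverted:
                reverted_counts[tag] += 1
    return {tag: reverted_counts[tag] / totals[tag] for tag in totals}

# Placeholder tag names, not real Spur values.
sample_edits = [
    (["EXAMPLE_TAG_A"], True),
    (["EXAMPLE_TAG_A", "EXAMPLE_TAG_B"], False),
    (["EXAMPLE_TAG_B"], True),
    (["EXAMPLE_TAG_B"], True),
]
rates = revert_rate_by_tag(sample_edits)
```

Tags with few edits will produce noisy rates, so a real analysis would likely also report the counts behind each rate.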
Notes
Reverted Edits
- We are interested in content edits on English Wikipedia.
- Use 48 hours as the cutoff for revert status.
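The 48-hour cutoff could be applied along these lines. This is a minimal sketch: the function name and timestamp arguments are hypothetical, and treating a revert at exactly 48 hours as inside the window is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

REVERT_CUTOFF = timedelta(hours=48)

def reverted_within_cutoff(edit_ts: datetime, revert_ts: Optional[datetime]) -> bool:
    """True if the edit was reverted no later than 48 hours after it was made."""
    if revert_ts is None:
        # The edit was never reverted.
        return False
    elapsed = revert_ts - edit_ts
    # Assumption: the boundary (exactly 48 hours) counts as reverted.
    return timedelta(0) <= elapsed <= REVERT_CUTOFF
```

An edit reverted after the cutoff is treated the same as one that was never reverted.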
Blocked Account
- We are interested in accounts created in English Wikipedia, excluding auto account creations.
- Blocks include both global blocks and local blocks in English Wikipedia.
- Blocks are not broken down by duration (permanent vs. temporary), since most blocked registered accounts in the sample (97%) received a permanent block.
- We are primarily interested in accounts less than 90 days old, on the assumption that older accounts with persistent bad-faith activity will already have been blocked. In addition, our event data is only available for 90 days.
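The account scoping rules above might be expressed as a simple filter. This is a sketch under assumptions: the dictionary keys `created_at` and `is_autocreated` are hypothetical, and measuring account age against an `as_of` reference time is one possible reading of the 90-day rule.

```python
from datetime import datetime, timedelta

MAX_ACCOUNT_AGE = timedelta(days=90)

def account_in_scope(account: dict, as_of: datetime) -> bool:
    """Keep non-autocreated accounts that are under 90 days old at `as_of`."""
    if account["is_autocreated"]:
        # Auto account creations are excluded from the analysis.
        return False
    age = as_of - account["created_at"]
    return timedelta(0) <= age < MAX_ACCOUNT_AGE
```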