In the parent task, we are discussing what rate limit would be appropriate for the number of temporary account creations allowed per IP per hour.
Temporary accounts are created when a logged-out user makes their first edit on the wiki. The account persists on that user's device as long as the user does not clear their browser cookies.
Currently, we allow:
- 6 regular account creations per day per IP via $wgAccountCreationThrottle
- 8 edits per minute per IP (= 480 per hour, or 11,520 per day) via $wgRateLimits
There is of course a big difference between allowing 6 temporary account creations and 11,520 temporary account creations per day. And as noted in the parent task:
Some IPs are shared by a large number of people, e.g. covering a large geographical area. Rate limiting could significantly harm the ability of people using these IPs to edit
It would therefore be very useful to analyze current IP editing data and work out how many unique user agents appear for the same IP address. That could give us a clearer picture of what a reasonable rate limit would be for temporary account creations. The user agent is an imperfect proxy for the number of distinct people editing from an IP, especially since T242825: Deal with Google Chrome User-Agent deprecation, so we likely also need to make use of client hints data.
To summarize, we want to know the number of distinct user agents that appear for a given IP address per day.
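As a rough illustration of the metric, here is a minimal Python sketch that counts distinct user agents per IP per day. It assumes the edit rows have already been exported as (ip, timestamp, user_agent) tuples with MediaWiki-style `YYYYMMDDHHMMSS` timestamps; the tuple layout, the function name, and the sample data are all hypothetical, and the real analysis would more likely be a query against the data lake.

```python
from collections import defaultdict


def distinct_user_agents_per_ip_day(rows):
    """Count distinct user agents for each (ip, day) pair.

    `rows` is an iterable of (ip, timestamp, user_agent) tuples, where
    timestamp is a 14-character YYYYMMDDHHMMSS string (an assumption
    about the export format, not a confirmed schema).
    """
    seen = defaultdict(set)
    for ip, timestamp, user_agent in rows:
        day = timestamp[:8]  # YYYYMMDD prefix identifies the day
        seen[(ip, day)].add(user_agent)
    return {key: len(agents) for key, agents in seen.items()}


# Made-up sample rows using documentation-only IP ranges.
sample_rows = [
    ("192.0.2.1", "20240101101500", "UA-A"),
    ("192.0.2.1", "20240101113000", "UA-B"),
    ("192.0.2.1", "20240101120000", "UA-A"),
    ("192.0.2.1", "20240102090000", "UA-A"),
    ("198.51.100.7", "20240101080000", "UA-C"),
]

counts = distinct_user_agents_per_ip_day(sample_rows)
for (ip, day), n in sorted(counts.items()):
    print(ip, day, n)
```

Client hints data could be folded in by replacing the `user_agent` element with a combined fingerprint, but how to combine them is left open here.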
After we do that, perhaps we could add the following variations:
- only include unreverted edits
- only include reverted edits
- exclude from analysis any IPs known to iPoid-Service
- only include IPs known to iPoid-Service
- break down information by country, since countries with fewer IP addresses will have more people editing from a smaller pool of IPs
- exclude/include obvious bots by looking at user agent data
- look at edit attempts (clicking the edit button), not just edits
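Several of the variations above amount to filtering the edit rows before counting. A minimal sketch, assuming each edit is a dict with hypothetical `is_reverted` and `user_agent` keys; the bot check is a deliberately naive substring heuristic chosen for illustration, not an established detection method.

```python
import re

# Naive heuristic (an assumption): treat user agents containing these
# words as obvious bots. Real bot detection would need more care.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def looks_like_bot(user_agent):
    """Return True if the user agent string matches the naive bot pattern."""
    return bool(BOT_PATTERN.search(user_agent or ""))


def filter_edits(edits, reverted=None, exclude_bots=False):
    """Filter edit dicts before counting.

    reverted=None keeps all edits, True keeps only reverted edits,
    False keeps only unreverted edits.
    """
    kept = []
    for edit in edits:
        if reverted is not None and edit["is_reverted"] != reverted:
            continue
        if exclude_bots and looks_like_bot(edit["user_agent"]):
            continue
        kept.append(edit)
    return kept


# Made-up sample edits.
sample_edits = [
    {"ip": "192.0.2.1", "user_agent": "Mozilla/5.0", "is_reverted": False},
    {"ip": "192.0.2.1", "user_agent": "ExampleBot/1.0", "is_reverted": False},
    {"ip": "192.0.2.1", "user_agent": "Mozilla/5.0", "is_reverted": True},
]

unreverted_humans = filter_edits(sample_edits, reverted=False, exclude_bots=True)
reverted_only = filter_edits(sample_edits, reverted=True)
```

The iPoid-Service variations would follow the same pattern with a lookup of known IPs, which is omitted here because the shape of that data is not described in this task.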
It would be especially interesting to look at outliers for countries with fewer IP addresses, to make sure that we don't inadvertently shut out anonymous editing for users in those countries by having too restrictive a rate limit in place.
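One way to surface such outliers is to summarize the per-IP daily counts by country and compare a high percentile and the maximum against a candidate rate limit. The sketch below assumes a mapping from (ip, day) to distinct user agent count and a hypothetical IP-to-country lookup; the country codes, the numbers, and the choice of the 95th percentile are illustrative only.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_by_country(counts, ip_country, pct=95):
    """Group per-(ip, day) counts by country and report outlier stats.

    `ip_country` is a hypothetical dict mapping IP to country code;
    IPs missing from it are grouped under "unknown".
    """
    by_country = defaultdict(list)
    for (ip, _day), n in counts.items():
        by_country[ip_country.get(ip, "unknown")].append(n)
    return {
        country: {
            "ip_days": len(values),
            "percentile": nearest_rank_percentile(values, pct),
            "max": max(values),
        }
        for country, values in by_country.items()
    }


# Made-up counts of distinct user agents per (ip, day).
sample_counts = {
    ("192.0.2.1", "20240101"): 2,
    ("192.0.2.1", "20240102"): 1,
    ("203.0.113.5", "20240101"): 3,
    ("198.51.100.7", "20240101"): 40,
}
sample_ip_country = {
    "192.0.2.1": "AA",
    "203.0.113.5": "AA",
    "198.51.100.7": "BB",
}

summary = summarize_by_country(sample_counts, sample_ip_country)
```

In this made-up data, country "BB" has a single IP with 40 distinct user agents in a day, which is the kind of case a low per-IP limit would affect most.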
We should be able to use wmf_raw.mediawiki_private_cu_changes for this analysis, along with client hints data.