In the parent task, we are discussing what rate limit would be appropriate for the number of temporary account creations to allow per IP per hour.
Temporary accounts are created when a logged-out user makes their first edit on the wiki. The account will persist for that user's device as long as the user does not clear their browser cookies.
Currently, we allow:
- 6 regular account creations per hour per IP (= 144 per day), via `$wgAccountCreationThrottle`
- 8 edits per minute per IP (= 480 per hour, or 11,520 per day) via `$wgRateLimits`
There is of course a big difference between allowing 144 temporary account creations and 11,520 temporary account creations per day. And as noted in the parent task:
> Some IPs are shared by a large number of people, e.g. covering a large geographical area. Rate limiting could significantly harm the ability of people using these IPs to edit
It would therefore be very useful if we could analyze current IP editing data, and try to work out how many unique user agents appear for the same IP address. That could help give us a clearer picture of what a reasonable rate limit would be for temp account creations. The user agent is an imperfect proxy for this information, especially since {T242825}, so we likely also need to make use of [client hints](https://www.mediawiki.org/wiki/Extension:CheckUser/Client_Hints) data.
To summarize, we want to know: p75 and p99 values for the number of distinct user agents that appear for a given IP address per day, with the following variations:
- only include unreverted edits
- only include reverted edits
- exclude from analysis any IPs known to #ipoid-service
- only include IPs known to #ipoid-service
- break down information by country: countries with fewer IP addresses will have more people editing from a smaller pool of IPs
- exclude/include obvious bots by looking at user agent data
- look at edit attempts (clicking the edit button), not just edits
It would be especially interesting to look at outliers for countries with fewer IP addresses, to make sure that we don't inadvertently shut out anonymous editing for users in those countries by having too restrictive a rate limit in place.
We should be able to use `wmf_raw.mediawiki_private_cu_changes` for this analysis, along with [client hints data](https://www.mediawiki.org/wiki/Extension:CheckUser/Client_Hints).
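As a rough illustration of the metric requested above, here is a minimal Python sketch that counts distinct user agents per IP per day and takes nearest-rank percentiles of those counts. It assumes the data has already been extracted into plain `(ip, day, user_agent)` tuples; that tuple shape, the function names, and the choice of the nearest-rank percentile method are all assumptions made for illustration, not the actual schema of `wmf_raw.mediawiki_private_cu_changes` or an agreed methodology. The variations listed above (reverted vs. unreverted edits, #ipoid-service membership, country, bots) would be applied as filters on the rows before this step.

```python
import math
from collections import defaultdict


def distinct_agents_per_ip_day(rows):
    """Count distinct user agents seen for each (ip, day) pair.

    `rows` is an iterable of hypothetical (ip, day, user_agent) tuples.
    """
    agents = defaultdict(set)
    for ip, day, user_agent in rows:
        agents[(ip, day)].add(user_agent)
    return {key: len(uas) for key, uas in agents.items()}


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample rows, using documentation-reserved IP addresses.
sample_rows = [
    ("192.0.2.1", "2024-01-01", "ua-a"),
    ("192.0.2.1", "2024-01-01", "ua-b"),
    ("192.0.2.1", "2024-01-01", "ua-a"),  # repeat agent, not counted twice
    ("192.0.2.2", "2024-01-01", "ua-a"),
    ("192.0.2.1", "2024-01-02", "ua-c"),
    ("192.0.2.3", "2024-01-01", "ua-a"),
    ("192.0.2.3", "2024-01-01", "ua-b"),
    ("192.0.2.3", "2024-01-01", "ua-c"),
    ("192.0.2.3", "2024-01-01", "ua-d"),
]

counts = distinct_agents_per_ip_day(sample_rows)
p75 = percentile(counts.values(), 75)
p99 = percentile(counts.values(), 99)
print(f"p75={p75} p99={p99}")
```

On real data the same grouping would more likely be done in a SQL or Spark query against the table; the sketch is only meant to pin down what "distinct user agents per IP per day" and its percentiles mean.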