There are three main requirements for the buckets:
* When these buckets are combined with our project groups, the resulting bins must be large enough to minimize re-identification risk (we don't plan to release raw answers, but this is an additional safeguard).
* According to @JAnstee_WMF, the number of users per bin should follow a roughly normal distribution.
* There should be bin boundaries at 30 and 600 edits to preserve comparability with last year's data.
There are currently two bucket proposals. One (`e_binned_edits`) creates relatively evenly sized bins, which prioritizes the first criterion. The other (`n_binned_edits`) creates bins whose sizes are closer to a normal distribution, which prioritizes the second.
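As a rough illustration of how an edit count maps to a bin under either proposal, here is a minimal sketch in plain Python. The edge lists are copied from the two tables below; `bin_label` and the constant names are hypothetical and not taken from the notebook, which may well do this differently. It assumes every bin is closed on the left and open on the right, as the `[a, b)` notation suggests.

```python
from bisect import bisect_right

# Bin edges copied from the two proposals (assumed left-closed, right-open).
E_EDGES = [10, 30, 150, 600, 1350, 3800, 1100000]
N_EDGES = [10, 30, 100, 600, 6000, 12000, 1100000]


def bin_label(edits, edges):
    """Return the "[lo, hi)" label for an edit count, or None if out of range."""
    if edits < edges[0] or edits >= edges[-1]:
        return None
    # Index of the last edge that is <= edits, i.e. the bin's lower bound.
    i = bisect_right(edges, edits) - 1
    return f"[{edges[i]}, {edges[i + 1]})"


if __name__ == "__main__":
    for count in (29, 30, 599, 600):
        print(count, bin_label(count, E_EDGES), bin_label(count, N_EDGES))
```

Because both edge lists contain 30 and 600, a user with 29 edits and a user with 30 edits land in different bins under either proposal, which is what the third requirement asks for.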
== E bins ==
```
BIN               EDITORS
[10, 30)             2792
[30, 150)           14299
[150, 600)          14578
[600, 1350)          6953
[1350, 3800)         6873
[3800, 1100000)      6734
```
== N bins ==
```
BIN               EDITORS
[10, 30)             2792
[30, 100)            9971
[100, 600)          18906
[600, 6000)         16096
[6000, 12000)        2374
[12000, 1100000)     2090
```
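To compare the two proposals against the first requirement, the counts in the two tables above can be summarized as in the sketch below. `summarize` and the dictionary names are hypothetical. The counts are the overall numbers before crossing with project groups, so the bins actually used would be smaller than these.

```python
# Editor counts per bin, copied from the two tables above.
E_BINS = {
    "[10, 30)": 2792,
    "[30, 150)": 14299,
    "[150, 600)": 14578,
    "[600, 1350)": 6953,
    "[1350, 3800)": 6873,
    "[3800, 1100000)": 6734,
}
N_BINS = {
    "[10, 30)": 2792,
    "[30, 100)": 9971,
    "[100, 600)": 18906,
    "[600, 6000)": 16096,
    "[6000, 12000)": 2374,
    "[12000, 1100000)": 2090,
}


def summarize(bins):
    """Return the total, the smallest bin, and the largest-to-smallest ratio."""
    smallest = min(bins, key=bins.get)
    return {
        "total": sum(bins.values()),
        "smallest_bin": smallest,
        "smallest_count": bins[smallest],
        "max_min_ratio": max(bins.values()) / bins[smallest],
    }


if __name__ == "__main__":
    for name, bins in (("E", E_BINS), ("N", N_BINS)):
        print(name, summarize(bins))
```

With these counts, both proposals cover the same 52,229 editors. The smallest bin holds 2,792 editors under the E proposal and 2,090 under the N proposal, and the largest bin is about 5.2 times the smallest for E versus about 9.0 times for N.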
Further comparison information is in [this notebook](https://github.com/wikimedia-research/Community-Engagement-Insights-sampling/blob/1325964bb60837415ffa1b3af03804ae7ad43752/population-analysis.ipynb).