Change Details

There are three main requirements for the buckets:
* When these buckets are combined with our project groups, the resulting bins must be large enough to minimize re-identification risk (we don't plan to release raw answers, but this is an additional safeguard).
* According to @JAnstee_WMF, the number of users per bin should follow a roughly normal distribution.
* There should be bin boundaries at 30 and 600 edits to preserve comparability with last year's data.

There are two bucket proposals right now. One creates bins of roughly equal size (`e_binned_edits`), which prioritizes the first criterion. The other creates bins whose sizes are roughly normally distributed (`n_binned_edits`), which prioritizes the second.

== E bins ==
```
BIN                 EDITORS
[10, 30)               2792
[30, 150)             14299
[150, 600)            14578
[600, 1350)            6953
[1350, 3800)           6873
[3800, 1100000)        6734
```

== N bins ==
```
BIN                 EDITORS
[10, 30)               2792
[30, 100)              9971
[100, 600)            18906
[600, 6000)           16096
[6000, 12000)          2374
[12000, 1100000)       2090
```

Further comparison information is in [this notebook](https://github.com/wikimedia-research/Community-Engagement-Insights-sampling/blob/1325964bb60837415ffa1b3af03804ae7ad43752/population-analysis.ipynb).
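
As a rough illustration of how either proposal assigns an editor to a bin, here is a minimal Python sketch. The bin edges are copied from the two listings above and treated as half-open intervals. The names `E_EDGES`, `N_EDGES`, `bin_label`, and `bin_counts`, as well as the sample edit counts, are made up for this example and are not taken from the notebook, which may compute the bins differently.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges copied from the two proposals; each bin is half-open: [lower, upper).
E_EDGES = [10, 30, 150, 600, 1350, 3800, 1100000]
N_EDGES = [10, 30, 100, 600, 6000, 12000, 1100000]


def bin_label(edit_count, edges):
    """Return the '[lower, upper)' label for edit_count, or None if out of range."""
    if edit_count < edges[0] or edit_count >= edges[-1]:
        return None
    # bisect_right finds the first edge strictly greater than edit_count,
    # so the edge just before it is the lower bound of the matching bin.
    i = bisect_right(edges, edit_count) - 1
    return f"[{edges[i]}, {edges[i + 1]})"


def bin_counts(edit_counts, edges):
    """Count editors per bin, skipping edit counts outside the covered range."""
    labels = (bin_label(c, edges) for c in edit_counts)
    return Counter(label for label in labels if label is not None)


# Hypothetical edit counts, for illustration only (not real survey data).
sample = [12, 30, 149, 150, 599, 600, 5000, 20000]
print(bin_counts(sample, E_EDGES))
print(bin_counts(sample, N_EDGES))
```

Because the intervals are half-open, an editor with exactly 30 or 600 edits falls into the bin that starts at that value under both proposals, which is what keeps the shared 30 and 600 boundaries comparable.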