The output here is an updated version of our table of strata definitions.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | nshahquinn-wmf | T178949 Support sampling and message delivery for the 2017–18 CE Insights survey |
| Resolved | | nshahquinn-wmf | T185653 Make final decisions on sampling strategy |
| Resolved | | nshahquinn-wmf | T188999 Determine appropriate buckets for annual edit count |
Event Timeline
This is mostly done, but two questions are still open: which buckets to use for total annual edits (I will do some analysis to help answer that) and whether to use "active on English Wikipedia" as a dimension in our stratification.
Oh, I also forgot that we need to decide how many users we want to sample from each of our sampling strata.
@egalvezwmf, @JAnstee_WMF, here are the counts for some of our dimensions for the November–January sampling frame (I can easily update it to the December–February frame once the Analytics Engineering team has finished updating the Data Lake for the new month).
Users in each of our 19 wiki groups:
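| Wiki group | Users |
|---|---|
| arwiki | 415 |
| asia_wps | 1851 |
| cee_wps | 3493 |
| commons | 2641 |
| dewiki | 3815 |
| enwiki | 18126 |
| eswiki | 2456 |
| frwiki | 3097 |
| itwiki | 1730 |
| jawiki | 3198 |
| mena_wps | 1621 |
| nlwiki | 734 |
| other | 1959 |
| ptwiki | 964 |
| ruwiki | 2340 |
| ssa_wps | 57 |
| weur_wps | 1784 |
| wikidata | 336 |
| zhwiki | 1879 |
| TOTAL | 52496 |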
Users in each of our 6 edit bins (the min is 10 because of our sampling frame definition and the max is 1 195 264—wow!):
| Annual edits | Users |
|---|---|
| [10, 30) | 2867 |
| [30, 100) | 10068 |
| [100, 600) | 18953 |
| [600, 1000) | 4529 |
| [1000, 10000) | 13511 |
| [10000, 1200000) | 2568 |
| TOTAL | 52496 |
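To illustrate how an annual edit count maps onto the six half-open bins above, here is a minimal Python sketch. The names `BIN_EDGES` and `edit_bin` are hypothetical and chosen for illustration; this is not the code used to produce the counts.

```python
from bisect import bisect_right

# Bin edges taken from the six half-open intervals listed above.
# Each bin includes its lower edge and excludes its upper edge.
BIN_EDGES = [10, 30, 100, 600, 1000, 10000, 1200000]


def edit_bin(annual_edits):
    """Return the bin label (e.g. '[10, 30)') for an annual edit count.

    Hypothetical helper: raises ValueError for counts outside the
    sampling frame's range of [10, 1200000).
    """
    if not BIN_EDGES[0] <= annual_edits < BIN_EDGES[-1]:
        raise ValueError(f"{annual_edits} is outside the binned range")
    # bisect_right counts the edges <= annual_edits, so subtracting 1
    # gives the index of the bin's lower edge.
    i = bisect_right(BIN_EDGES, annual_edits) - 1
    return f"[{BIN_EDGES[i]}, {BIN_EDGES[i + 1]})"
```

A count that falls exactly on a boundary goes into the higher bin, so a user with exactly 30 edits lands in `[30, 100)` rather than `[10, 30)`.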
I'm planning to make a graph showing the distribution of edits more intuitively (T188999).