
Make final decisions on sampling strategy
Closed, ResolvedPublic

Description

The output here is an updated version of our table of strata definitions.

Event Timeline

nshahquinn-wmf renamed this task from Create table of options for sampling frame to Make final decision on sampling strategy. Mar 6 2018, 1:20 PM
nshahquinn-wmf renamed this task from Make final decision on sampling strategy to Make final decisions on sampling strategy.

This is mostly done, but two questions remain: how to define our buckets for total annual edits (I will do some analysis to help answer this) and whether to use "active on English Wikipedia" as a dimension in our stratification.

Oh, I also forgot that we need to decide how many users we want to sample from each of our sampling strata.

@egalvezwmf, @JAnstee_WMF, here are the counts for some of our dimensions in the November–January sampling frame (I can easily update them for the December–February frame once the Analytics Engineering team has finished updating the Data Lake for the new month).

Users in each of our 19 wiki groups:

arwiki        415
asia_wps     1851
cee_wps      3493
commons      2641
dewiki       3815
enwiki      18126
eswiki       2456
frwiki       3097
itwiki       1730
jawiki       3198
mena_wps     1621
nlwiki        734
other        1959
ptwiki        964
ruwiki       2340
ssa_wps        57
weur_wps     1784
wikidata      336
zhwiki       1879
TOTAL       52496
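
For anyone who wants to double-check the table above or see how uneven the groups are, here is a minimal sketch in Python. The counts are copied from the table; the variable names are just illustrative and not taken from any existing analysis code.

```python
# User counts per wiki group, copied from the table above
# (November-January sampling frame).
wiki_group_counts = {
    "arwiki": 415, "asia_wps": 1851, "cee_wps": 3493, "commons": 2641,
    "dewiki": 3815, "enwiki": 18126, "eswiki": 2456, "frwiki": 3097,
    "itwiki": 1730, "jawiki": 3198, "mena_wps": 1621, "nlwiki": 734,
    "other": 1959, "ptwiki": 964, "ruwiki": 2340, "ssa_wps": 57,
    "weur_wps": 1784, "wikidata": 336, "zhwiki": 1879,
}

# The sum should match the TOTAL row of the table.
total = sum(wiki_group_counts.values())

# Share of the sampling frame that falls in each group.
shares = {group: n / total for group, n in wiki_group_counts.items()}

# Largest and smallest groups by user count.
largest = max(wiki_group_counts, key=wiki_group_counts.get)
smallest = min(wiki_group_counts, key=wiki_group_counts.get)

for group, n in sorted(wiki_group_counts.items(), key=lambda kv: -kv[1]):
    print(f"{group:<10}{n:>7}{shares[group]:>8.1%}")
print(f"{'TOTAL':<10}{total:>7}")
```

The spread between the largest and smallest groups is worth keeping in mind when deciding how many users to sample per stratum.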

Users in each of our 6 edit bins (the minimum is 10 because of our sampling frame definition, and the maximum is 1,195,264. Wow!):

[10, 30)             2867
[30, 100)           10068
[100, 600)          18953
[600, 1000)          4529
[1000, 10000)       13511
[10000, 1200000)     2568
TOTAL               52496
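
To make the bin boundaries unambiguous, here is a rough sketch of how a user's annual edit count could be mapped to one of the bins above. It assumes the half-open `[low, high)` reading shown in the table; `BIN_EDGES` and `edit_bin` are hypothetical names for illustration, not part of our actual pipeline.

```python
from bisect import bisect_right

# Boundaries of the six half-open bins [low, high) from the table above.
BIN_EDGES = [10, 30, 100, 600, 1000, 10000, 1200000]


def edit_bin(annual_edits):
    """Return the label of the bin containing annual_edits, e.g. '[30, 100)'."""
    if not BIN_EDGES[0] <= annual_edits < BIN_EDGES[-1]:
        raise ValueError(f"{annual_edits} is outside the sampling frame bins")
    # bisect_right counts the edges <= annual_edits, so subtracting 1
    # gives the index of the bin's lower edge.
    i = bisect_right(BIN_EDGES, annual_edits) - 1
    return f"[{BIN_EDGES[i]}, {BIN_EDGES[i + 1]})"


print(edit_bin(10))       # lower edge is inclusive
print(edit_bin(30))       # upper edge is exclusive, so 30 starts the next bin
print(edit_bin(1195264))  # the current maximum lands in the last bin
```

Using `bisect_right` rather than `bisect_left` is what makes the lower edge inclusive and the upper edge exclusive.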

I'm planning to make a graph showing the distribution of edits more intuitively (T188999).

nshahquinn-wmf raised the priority of this task from High to Needs Triage. Mar 29 2018, 9:06 AM
nshahquinn-wmf moved this task from Blocked to Done on the Contributors-Analysis board.