One of the major questions in interpreting the first round of demographics surveys has been: which readers are the results reflective of? That is, do the results best represent the population that reads daily? The population that reads weekly? Monthly?
Why might a longer survey give different results?
Reasons why a longer survey (one month) might change the results, especially with respect to gender:
- External surveys have shown that the gender distribution of Wikipedia readers can depend on whether you ask something like "Do you read daily?" or "Do you read weekly?"
- In several languages in the first round of surveys, the proportion of men and women among respondents became more balanced the further into the week the survey ran.
- The debiasing analyses showed that women were consistently less likely than men to respond to the surveys.
- It is known that, especially on desktop, the survey widget is easy to miss. For readers who visit Wikipedia once a week or less, a short survey window makes it very unlikely that they ever see the widget at all.
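The last point can be quantified with a toy model. The sketch below (the model and all numbers are my illustrative assumptions, not from the source) treats a reader's visits as a Poisson process and asks how likely readers of different frequencies are to visit at all while the survey is running, which is a precondition for being sampled:

```python
# Illustrative sketch (assumptions mine): how the pool of readers who even
# have a chance to see the survey shifts with survey length. Visits are
# modeled as Poisson with the reader's average daily rate; a reader can only
# be sampled if they visit at least once while the survey is live.
import math

def p_visits_at_least_once(visits_per_day, survey_days):
    """Probability of at least one visit during the survey window."""
    return 1 - math.exp(-visits_per_day * survey_days)

for label, rate in [("daily", 1.0), ("weekly", 1 / 7), ("monthly", 1 / 30)]:
    short = p_visits_at_least_once(rate, survey_days=3)
    long_ = p_visits_at_least_once(rate, survey_days=28)
    print(f"{label:7s} reader: 3-day survey {short:5.1%}, 28-day survey {long_:5.1%}")
```

Under these assumptions a monthly reader has roughly a 1-in-10 chance of even being present during a 3-day survey but better than even odds during a 28-day one, so a longer window should shift the respondent pool toward less frequent readers.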
We are always hesitant to survey readers unnecessarily: however small the survey widget, it still has the potential to disrupt reading. One benefit of running a survey for longer is that we can drastically lower the sampling rate (e.g., to around 1:100). Only a small proportion of readers will then see the survey, so even readers who use many browsers or clear their cookies are unlikely to be resampled. Repeatedly resampling the same user is problematic mainly from the perspective of respecting readers rather than of statistics or results.
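To illustrate the resampling point, here is a rough binomial sketch (the model and the choice of three identities are my assumptions, not from the source) of how often one person would be shown the survey more than once when they appear as several independent identities, at a high versus a low sampling rate:

```python
# Rough sketch (assumptions mine): chance that one person is shown the survey
# at least twice because they appear as several independent identities
# (multiple browsers, cleared cookies), each sampled independently.
from math import comb

def p_sampled_at_least_twice(n_identities, sampling_rate):
    """Binomial model: 1 minus P(zero or one identity sampled)."""
    p_le_one = sum(
        comb(n_identities, k)
        * sampling_rate**k
        * (1 - sampling_rate) ** (n_identities - k)
        for k in (0, 1)
    )
    return 1 - p_le_one

for rate, label in [(1 / 20, "1:20"), (1 / 100, "1:100")]:
    print(f"{label}: 3 identities -> {p_sampled_at_least_twice(3, rate):.4%}")
```

Dropping the rate from 1:20 to 1:100 cuts the double-exposure probability by far more than 5x in this model, since the probability of two hits scales roughly with the square of the rate.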
We also will not run these surveys in all 13 languages, but will limit them to new languages and a few languages whose sampling rate in the first round was lower than 1 in 20 (to minimize overlap between surveys). This way we do not overburden every language community, while still getting some sense of whether the patterns we see are language-specific. We hope the results will indicate how much the results for the other languages might shift with a longer survey period.
The exact same survey will be used to maximize comparability between the June round and this round, though seasonality will likely have a large enough impact to make direct comparison impossible.
- English was selected for its wide coverage and the availability of external survey data in America.
- Polish was not part of the first round, so it is being included this round.
- Russian is included because it had a low sampling rate in the first round (so readers are less likely to have already seen the survey) and displayed the narrowing gender gap over the course of the week.
Sampling rates are still more an art than a science. In this case, the survey is expected to run 4x longer than the June survey, and with more time to see and respond to it, we expect the response rate to increase; this argues for at least a 6-8x reduction in sampling rate. For both Russian and English, we also do not need as many responses as were gathered in the first round (ideally more like 1,000-2,000), so we can reduce further, to 12-16x.
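The arithmetic above can be made explicit. In this sketch the 4x duration and the halved response target are from the text, while the 1.5-2x response-rate factor is my reading of "at least a 6-8x reduction"; the helper function itself is mine:

```python
# Back-of-the-envelope check of the sampling-rate reasoning above.

def reduction_factor(duration_x, response_rate_x, target_responses_x):
    """How much the sampling rate can be divided down: the survey runs
    duration_x times longer, responses per impression rise by
    response_rate_x, and we need target_responses_x as many answers."""
    return duration_x * response_rate_x / target_responses_x

# 4x duration, assumed ~1.5-2x response rate, ~half as many responses needed:
low = reduction_factor(4, 1.5, 0.5)   # 12x
high = reduction_factor(4, 2.0, 0.5)  # 16x
print(low, high)

# The rates in the table are consistent with roughly the low end:
print(round(1200 / 98, 1), round(720 / 59, 1))  # en and ru, both ~12x
```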
| Language | | | Task | Start date | June sampling rate | June responses* | New sampling rate |
|---|---|---|---|---|---|---|---|
| en -- English (world) | Yes | Yes | T232525#5507366 | 23-Sept-19 | 1 of 98 | 6181 | 1 of 1200 |
| pl -- Polish | Yes | Yes | T232525#5516283 | 23-Sept-19 | N/A | N/A | 1 of 200 |
| ru -- Russian | Yes | Yes | T232525#5512050 | 23-Sept-19 | 1 of 59 | 4565 | 1 of 720 |
* This is the total number of responses after removing respondents under 18 and responses that could not be linked to EventLogging, so it is lower than other reported totals. It is, however, the number that matters when analyzing the gender of the reading population.
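The filtering described in the footnote might be sketched as below; the record structure and field names (`age`, `event_id`) are placeholders of mine, not the actual survey schema:

```python
# Hypothetical sketch of the response filtering described above: keep only
# adult respondents whose answers can be joined to an EventLogging record.

def usable_responses(responses, eventlogging_ids):
    """Filter to responses that count for the demographic analysis."""
    return [
        r for r in responses
        if r["age"] >= 18 and r["event_id"] in eventlogging_ids
    ]

survey = [
    {"age": 25, "event_id": "a"},
    {"age": 17, "event_id": "b"},    # dropped: under 18
    {"age": 40, "event_id": "zzz"},  # dropped: no EventLogging match
]
print(len(usable_responses(survey, {"a", "b"})))  # 1
```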