This task encompasses the configuration for each demographics survey and the creation of the associated interface pages. Links to each configuration and interface page will be added as comments below as they are finalized. See T212444 for tracking of links to the configs for each language.
Based on 2017 survey sampling rates:
|Wikipedia Language||June 2017 PVs (in millions)||2017 Sampling Rate (1:X)||2017 Response count||May 2019 PVs (in millions)||delta May->June (2018)||2019 Desired # Responses||2019 Sampling Rate (1:X)||Actual # Responses|
|ar -- Arabic||147||10||2158||178||0.93||5000||4||10595|
|bn -- Bengali||8.1||1||1198|| || || || || |
|de -- German||901||5||28000||974||0.96||5000||29||5118|
|en -- English (world)||7125||40||24140||7778||0.93||10000||98||~8000|
|en -- English (Africa)|| || || ||165||0.94||5000||2||~10500|
|es -- Spanish||1071||5||39021||1164||0.86||10000||15||17733|
|fa -- Persian|| || || ||147||0.97||1500||9||9214|
|fr -- French (world)|| || || ||734||0.89||5000||12||~5900|
|fr -- French (Africa)|| || || ||61||0.91||2000||2||~4200|
|he -- Hebrew||46.6||3||8848||62||0.92||1500||21||992|
|hi -- Hindi||30.5||2||3064|| || || || || |
|hu -- Hungarian||38.2||2.5||2455||55||0.93||1500||5||1725|
|ja -- Japanese||1056||5||19996|| || || || || |
|nl -- Dutch||145||8||3277|| || || || || |
|no -- Norwegian|| || || ||36||0.94||1500||2||998|
|pt -- Portuguese|| || || ||361||1.01||5000||6|| |
|ro -- Romanian||26.7||2||3829||44||0.83||1500||5||1873|
|ru -- Russian||843||5||67621||838||0.89||5000||59||6204|
|uk -- Ukrainian||37.8||2.5||8041||67||0.71||1500||12||1579|
|zh -- Chinese||367||20||5957||410||0.99||5000||26||3259|
- To correct for seasonal changes in page views between May (for which page view numbers are available) and June (when the survey will run), the ratio of page views from May 2018 to June 2018 is included as the delta May->June (2018) column. See T226273#5276481 for more details.
- Additionally, the sampling for Ukrainian, Romanian, and Spanish is further rounded up to account for especially low July numbers, which are likely to affect these surveys because they run at the end of June.
- Coverage rates for surveys that were not run in 2017 are based on Japanese, which is a very conservative case in terms of the number of expected responses per page view: its response rate is an order of magnitude lower than that of languages like Hebrew and Ukrainian, though only about half that of most languages (English, German, Spanish).
- The English total across both surveys was 18488 responses. This cannot be split exactly between the worldwide and Africa surveys because it is unclear which survey readers under 18 took, but of the over-18 responses, ~57% (~10500) came from the Africa survey and ~43% (~8000) from the worldwide survey.
- The same applies to French: of 10161 total responses, ~58% (~5900) came from the worldwide survey and ~42% (~4200) from the Africa survey.
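The arithmetic behind the 2019 sampling rates can be sketched as follows. This is a reconstruction, not a formula stated in this task: it assumes the 1:X rate is the expected June page views multiplied by the 2017 response rate per sampled page view, divided by the desired number of responses, and rounded down. The function and variable names are hypothetical. Under those assumptions it reproduces the table values for German, English (world), and Chinese, and for French (world) when Japanese is used as the baseline. It does not reproduce Ukrainian, Romanian, or Spanish, which were adjusted further as noted above.

```python
import math

# 2017 survey data copied from the table:
# (June 2017 page views in millions, sampling rate 1:X, response count)
SURVEY_2017 = {
    "de": (901, 5, 28000),
    "en": (7125, 40, 24140),
    "he": (46.6, 3, 8848),
    "ja": (1056, 5, 19996),
    "zh": (367, 20, 5957),
}


def responses_per_million_sampled(lang):
    """2017 responses per million *sampled* page views for a language."""
    pvs_millions, rate, responses = SURVEY_2017[lang]
    sampled_pvs_millions = pvs_millions / rate
    return responses / sampled_pvs_millions


def sampling_rate_2019(may_pvs_millions, delta, desired, baseline_lang):
    """Estimate the 1:X sampling rate (assumed formula, rounded down).

    baseline_lang is the language whose 2017 response rate is used;
    for surveys not run in 2017 the task uses Japanese ("ja").
    """
    expected_june_pvs = may_pvs_millions * delta
    # Responses expected if every page view were sampled (X = 1).
    responses_at_full_sampling = (
        expected_june_pvs * responses_per_million_sampled(baseline_lang)
    )
    return math.floor(responses_at_full_sampling / desired)


if __name__ == "__main__":
    print("de:", sampling_rate_2019(974, 0.96, 5000, "de"))
    print("en (world):", sampling_rate_2019(7778, 0.93, 10000, "en"))
    print("fr (world), ja baseline:", sampling_rate_2019(734, 0.89, 5000, "ja"))
```

Rounding down errs on the side of sampling slightly more readers than strictly needed, which seems consistent with the stated preference for conservative estimates.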
The desired numbers of responses fall into a few categories:
- at a minimum, we aim for 1500 responses in order to provide robust demographic results and some stratifications (e.g., age vs. gender). With 1500 responses, we will be able to debias but will likely not have enough responses to do subgroup discovery or other, more nuanced analyses. We do this for languages that are largely concentrated in a single country and are not high page-view languages (fa, he, hu, no, ro, uk).
- for languages with high page views and many countries (en, fr, es), we aim for 10,000 responses.
- for the surveys specific to Africa, we aim to sample as many page views as we can without sampling everyone.
- for all other languages, we aim for 2000-5000 responses as a balance between having enough responses to do country-specific stratifications and not oversampling.