There are two main components for the analysis of the reader surveys:
- Cleaning and debiasing the survey results -- i.e., converting responses to standardized features and re-weighting each response according to how representative it is of the broader reader population. This requires extensive work to clean the data and build features for each survey response, followed by a relatively simple propensity modeling step.
- Analyzing the survey results -- cross-tabulation of results, analyzing the relationship between demographics and reader behavior, etc.
These two components correspond to stages 2 and 3 of the survey process: https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Code
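
As a rough illustration of how the two components fit together, here is a minimal sketch of inverse-propensity re-weighting followed by a weighted cross-tabulation. It is not the project's actual code: the toy data, the feature cells (`mobile`, `desktop`), the response labels, and all function names are hypothetical. It also estimates propensities as simple per-cell response rates, which stands in for whatever propensity model the real pipeline fits.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one (feature cell, responded?) pair per sampled reader.
# The cell stands in for the standardized features built during cleaning.
population = [
    ("mobile", True), ("mobile", False), ("mobile", False), ("mobile", False),
    ("desktop", True), ("desktop", True), ("desktop", False), ("desktop", False),
]

# Hypothetical survey answers: one (feature cell, answer) pair per respondent.
responses = [
    ("mobile", "work/school"),
    ("desktop", "work/school"),
    ("desktop", "bored/random"),
]


def cell_propensities(records):
    """Estimate P(respond | cell) as the response rate within each cell."""
    total = Counter(cell for cell, _ in records)
    responded = Counter(cell for cell, did_respond in records if did_respond)
    return {cell: responded[cell] / total[cell] for cell in total}


def inverse_propensity_weights(cells, propensities):
    """Weight each respondent by 1 / P(respond | cell)."""
    return [1.0 / propensities[cell] for cell in cells]


def weighted_crosstab(rows, cols, weights):
    """Sum the weights falling into each (row, column) combination."""
    table = defaultdict(float)
    for row, col, weight in zip(rows, cols, weights):
        table[(row, col)] += weight
    return dict(table)


propensities = cell_propensities(population)
weights = inverse_propensity_weights([cell for cell, _ in responses], propensities)
table = weighted_crosstab(
    [cell for cell, _ in responses],
    [answer for _, answer in responses],
    weights,
)

for (cell, answer), weight in sorted(table.items()):
    print(f"{cell:8s} {answer:14s} {weight:.1f}")
```

In this toy setup, mobile readers respond less often than desktop readers, so the single mobile response is up-weighted and the weights sum to the size of the sampled population. A real pipeline would likely model propensity on many features at once rather than on a single cell.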