Welcome survey: store aggregates
Closed, Resolved, Public

Assigned To

Authored By

	MMiller_WMF
	Oct 15 2019, 6:09 PM

Description

As described in the parent task, we can only keep welcome survey responses for one year. We want to store aggregates to help us study how newcomer intentions may change over long periods of time. It is acceptable if this task is not completed until a few weeks after data deletion has started.

Perhaps we will want to store them at the monthly level per wiki.

Details

Due Date: Nov 29 2019, 8:00 AM

Related Objects

Status	Assigned	Task
Resolved	MMiller_WMF	T206365 [EPIC] Growth: Welcome survey
Resolved	Tgr	T208369 Welcome survey: anonymize data after one year
Resolved	nettrom_WMF	T235548 Welcome survey: store aggregates

Event Timeline

@nettrom_WMF -- I did not specify in here the exact data we will want to store, the timeframe, or the format. I think this can be up to you, but I'm happy to weigh in if needed.

MMiller_WMF mentioned this in T208369: Welcome survey: anonymize data after one year. Oct 15 2019, 6:13 PM

kzimmerman triaged this task as High priority. Oct 21 2019, 5:14 PM

kzimmerman edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

MMiller_WMF set Due Date to Nov 29 2019, 8:00 AM. Oct 29 2019, 3:19 PM

MMiller_WMF updated the task description.

nettrom_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board. Dec 2 2019, 5:36 PM

nettrom_WMF moved this task from Ready for Development to In Progress on the Growth-Team (Sprint 0 (Growth Team)) board. Dec 4 2019, 8:17 PM

nettrom_WMF moved this task from Doing to Next 2 weeks on the Product-Analytics (Kanban) board. Feb 19 2020, 8:32 PM

nettrom_WMF mentioned this in T246822: Analyze Welcome Survey responses on why editors join. Mar 3 2020, 8:17 PM

nettrom_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board. Mar 10 2020, 5:53 PM

Dzahn mentioned this in T249807: Weekly phabricator-reports mail: List open tasks with a Due Date in the past. Apr 10 2020, 1:08 PM

@MMiller_WMF : we have a working prototype for storing aggregates. I think the next step should be that we close this task and open a subtask for productionizing it in collaboration with Analytics Engineering.

nettrom_WMF moved this task from Doing to Needs Review on the Product-Analytics (Kanban) board. Apr 13 2020, 5:40 PM

@nettrom_WMF -- did we also include storing the aggregates on the topics? Also, is it important to productionize it, or should we stick with the prototype going forward?

@MMiller_WMF : I updated the topic aggregation code when I was working on T246822, so that we have counts for every checkbox and autocomplete topic, with any user-entered topic counted as "other". Some of the data had already been deleted when I ran the updated code, so for the topics we have data from April to September of last year (for all other questions we have data from December 2018 onwards).
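The topic counting described above can be sketched roughly as follows. This is only an illustration, not the actual aggregation code: the topic names, the `KNOWN_TOPICS` set, and the shape of the input are assumptions made for the example.

```python
from collections import Counter

# Hypothetical set of predefined topics (checkbox and autocomplete options).
# The real list comes from the survey configuration and is not shown here.
KNOWN_TOPICS = {"arts", "science", "history"}


def count_topics(responses, known_topics=KNOWN_TOPICS):
    """Count selections per predefined topic.

    Every topic that is not in `known_topics` is treated as user-entered
    free text and counted under "other".
    """
    counts = Counter()
    for topics in responses:
        for topic in topics:
            key = topic if topic in known_topics else "other"
            counts[key] += 1
    return counts


# Each inner list holds the topics selected by one (made-up) user.
responses = [
    ["arts", "science"],
    ["history", "my hometown"],
    ["science", "knitting", "trains"],
]
topic_counts = count_topics(responses)
```

Folding free-text topics into a single "other" bucket keeps the stored aggregates free of user-entered text, which fits the goal of retaining only counts once the raw responses are deleted.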

I'm okay with not productionizing it for now, and we can keep monitoring how much we need updated data on this, particularly as we expand to additional wikis.

@nettrom_WMF -- okay sounds good. Could you please leave a note on this task saying where the aggregates are stored so that we (probably I) can remember where to find them in the future? And then I think it can be resolved. Thank you!

nettrom_WMF moved this task from Needs Review to Doing on the Product-Analytics (Kanban) board. Apr 20 2020, 4:33 PM

The aggregates are stored in the growth_welcomesurvey database in the Data Lake.

There are five tables in that database, and all tables are split by month, wiki, and the platform the user registered on (desktop/mobile). The names of the questions are taken from the saved JSON data, as are the available options for questions 1, 2, and 4. The options can be mapped to their actual text by inspecting the HTML of the form.

monthly_overview: Overview of user groups (e.g. control/survey for wikis where those were used), type of survey response (save/skip/abandon) if in a survey group, and number of users.
q1_responses: Responses to the first question on the survey (currently "Why did you create your account today?")
q2_responses: Responses to the second question on the survey (currently "Have you ever edited Wikipedia?")
q3_responses: Number of users who selected interest in a given topic, for all topics available as checkboxes and through autocomplete. Topic "other" counts the number of topics added through free text. This question is currently not in the survey; it is labelled "q3" for historical reasons.
q4_responses: Responses to the last question on the survey. Currently this is a checkbox labelled "Yes, I’m interested", shown with the following description and question: "We are considering starting a program for more experienced editors to help newer users with editing. Are you interested in being contacted to get help with editing?" If users do not check the box (which is the default), the value is "False"; otherwise it is "True".
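As a rough illustration of how the per-question tables are split, the sketch below counts responses by month, wiki, platform, and response value. The field names (`registered`, `wiki`, `platform`, `response`) and the response values are assumptions made for the example; the actual table schemas are not given in this task.

```python
from collections import Counter
from datetime import date


def aggregate_responses(records):
    """Count responses per (month, wiki, platform, response) combination.

    Each record is a dict with hypothetical keys: "registered" (a date),
    "wiki", "platform" ("desktop" or "mobile"), and "response".
    """
    counts = Counter()
    for rec in records:
        month = rec["registered"].strftime("%Y-%m")
        counts[(month, rec["wiki"], rec["platform"], rec["response"])] += 1
    return counts


# Made-up records for illustration only.
records = [
    {"registered": date(2019, 12, 3), "wiki": "cswiki", "platform": "desktop", "response": "add-image"},
    {"registered": date(2019, 12, 20), "wiki": "cswiki", "platform": "desktop", "response": "add-image"},
    {"registered": date(2019, 12, 21), "wiki": "cswiki", "platform": "mobile", "response": "new-page"},
    {"registered": date(2020, 1, 5), "wiki": "kowiki", "platform": "desktop", "response": "add-image"},
]
monthly_counts = aggregate_responses(records)
```

Only the grouped counts would need to be kept after the raw responses are deleted, so the monthly split per wiki and platform remains available for long-term comparison.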

Closing this task as resolved, but feel free to reopen it if necessary.

nettrom_WMF closed this task as Resolved. Apr 21 2020, 8:15 PM

@nettrom_WMF -- thanks. What will happen if we add, remove, or change questions on the survey in the future? For example, if we were to swap questions or change their ordering, or replace the "have you ever edited" question with one asking about languages? Would new columns appear in the data lake, or would new data be put in old columns? Or something else?

Urbanecm edited subscribers, added: Urbanecm_WMF; removed: Urbanecm. Aug 26 2020, 2:10 PM

MMiller_WMF mentioned this in T275172: Growth: update welcome survey aggregation schedule. Feb 18 2021, 11:23 PM
