
Personalized first day: experiments (Variation A)
Closed, Resolved · Public

Description

While it may not be a priority to experiment on the content or presentation of the new survey form, we will likely want to experiment on the presence of the form, and whether it depresses or increases activation rates.

The first step here is to decide whether and what we'll be experimenting on with this feature. Then we can create additional tasks for designing and implementing the experiments.

This task has become about experiments relating to Variation A.

A separate task has been created for the experiments relating to Variation C: T210868

Event Timeline

@nettrom_WMF -- in discussions with the team, it sounds like the main thing we want to know in the near term is simply whether the additional form has an effect on activation rate (or other important new account holder behavior). Learning about differences in question wording, ordering, quantity, or UI is not a priority right now.

Could you spend a few minutes thinking this through from your perspective? Basically, I think we don't want to deploy this to all new editors, and then suspect it might be depressing activation, but not really know for sure. In that vein, if we randomize who receives the form, we could see which of them make edits -- a path that requires a month-long (or longer) experiment, according to your calculations.

Perhaps another way to look at it is whether the form causes users to just bounce from the site. In that case, we would see via "Understanding first day" that they do not have any more pageviews.

Anyway, what do you think? Do you see a simple/straightforward way to get at what we want?

mpopov triaged this task as Medium priority. Oct 26 2018, 6:38 PM
mpopov moved this task from Triage to Doing on the Product-Analytics board.

The main question being asked here is whether the survey we are adding has a detrimental effect on user activity. I've discussed this with @MMiller_WMF and with the Product-Analytics team. Out of those meetings come the following recommendations and questions:

  1. What are our leading indicators?
  2. Write down a set of potential outcomes, with a plan of action for each.
  3. How much are we willing to sacrifice?
  4. What do we know so far about the effects of the survey?

Regarding the first part: activation rate might not be the leading indicator we are looking for. Instead, we should consider measuring the proportion of new accounts that skip the survey, and perhaps also the proportion of new accounts that abandon the site upon encountering the survey. These should give us faster signals than activation rate can.
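
As a purely illustrative sketch of how those two leading indicators could be computed (the column names and data below are hypothetical, not the actual schema or instrumentation):

```python
import pandas as pd

# Hypothetical per-account records; the real instrumentation will differ.
accounts = pd.DataFrame({
    "user_id":    [1, 2, 3, 4, 5],
    "saw_survey": [True, True, True, True, False],
    "skipped":    [True, False, False, True, False],
    "abandoned":  [False, False, True, True, False],
})

# Restrict to accounts that were actually shown the survey.
shown = accounts[accounts["saw_survey"]]

skip_rate = shown["skipped"].mean()
abandonment_rate = shown["abandoned"].mean()

print(f"Skip rate:        {skip_rate:.1%}")
print(f"Abandonment rate: {abandonment_rate:.1%}")
```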

That being said, we should map out the potential outcomes of deploying the survey and make a plan of action for each. This goes together with question 3: we should know what our limit is, if there is one. Say we deploy it and find that 10% of users abandon the process (meaning they leave the site) after encountering the survey. Are we willing to run with that for a month and then decide whether to take any action? If not, we should know that before deploying it; in other words, the plan of action for "strong indications of 10% abandonment" would be "we stop the survey".

Lastly, what do we know about the effects? The user testing that @RHo did suggests that users consider this a low-cost survey. To me, that translates to a low risk of abandonment and a low probability of a user skipping the survey. We can use that information to build a more informed prior for any statistical analysis we might do to inform our decisions (though such an analysis requires an investment in analytics resources).
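
To make the informed-prior idea concrete, here is a minimal Beta-Binomial sketch for the abandonment probability; the prior parameters, counts, and 10% threshold are illustrative assumptions, not estimates taken from the user testing.

```python
from scipy import stats

# Illustrative prior reflecting "low risk of abandonment": Beta(2, 18)
# has mean 0.10 and puts most of its mass below roughly 25%.
prior_alpha, prior_beta = 2, 18

# Hypothetical counts observed after a few days of deployment.
n_users = 200
n_abandoned = 30

# Conjugate update: Beta(alpha + abandoned, beta + not abandoned).
posterior = stats.beta(prior_alpha + n_abandoned,
                       prior_beta + (n_users - n_abandoned))

print(f"Posterior mean abandonment rate: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.ppf([0.025, 0.975]).round(3)}")

# Probability that abandonment exceeds a hypothetical 10% action threshold.
print(f"P(abandonment > 10%): {1 - posterior.cdf(0.10):.2f}")
```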

Another related topic is what the %-split between treatment (survey) and control should be in deployment. If the goal is to learn whether the survey affects user activity, we might consider a 50/50 split. But if the goal is to learn from new users, we would want to deploy it to a higher proportion, maybe 80/20.

To sum up, the following action items come out of this:

  1. Decide on leading indicators.
  2. Sketch a list of scenarios and what action we will take for each of them.
  3. Decide how long we are willing to run this even though we might see indications of negative impact.
  4. Decide on a %-split of survey/control users.

Hi @nettrom_WMF - I agree that looking at how many people complete vs skip the survey and then abandon the site is a good indicator. Other ideas that might be useful to consider:

  • How many people who skip the survey go directly to trying to edit a page? (It does not matter whether they successfully complete the edit.) This may be a better indicator that people don't want to complete the survey than outright abandonment of the site.
  • How many people click on "Getting started with editing" links (Tutorial and Help Desk) from within the survey (either on the RHS panel or on the post-submission page) regardless of survey completion?
    • What are the activation rates for those users?
  • Splits on activation rates or edit attempts for those who completed the survey, based on answers to Q1 & Q2... Would it be interesting to see whether those who originally created an account just to read, or didn't know Wikipedia was editable, end up trying to edit after exposure to this survey?

We wrote up our experiment plan and put it under our team pages on mw.org.

In summary, the primary goal is to measure how the survey affects editor activation rate, to determine whether it has a negative impact. We will accomplish that by deploying it for a month as an A/B test in which 50% of new users see the survey and the other 50% do not. Assignment to the survey/control groups is done randomly (ref T206371).
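
For illustration only, random assignment with a 50/50 split could look like the sketch below, which hashes a per-user token into a stable bucket; this is an assumed approach, not the actual implementation referenced in T206371.

```python
import hashlib

def assign_bucket(user_token: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user token to 'survey' or 'control'.

    Hashing gives a stable, roughly uniform value in [0, 1], so the same
    user always lands in the same bucket and the split approximates the
    requested treatment share.
    """
    digest = hashlib.sha256(user_token.encode("utf-8")).hexdigest()
    u = int(digest[:8], 16) / 0xFFFFFFFF
    return "survey" if u < treatment_share else "control"

# Sanity check: roughly half of a sample of tokens fall into each group.
sample = [assign_bucket(f"user-{i}") for i in range(10_000)]
print(sample.count("survey") / len(sample))
```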

We also proposed several leading indicators of negative effects in the experiment plan, together with specific courses of action.

Now that Variation C is almost ready, we should modify the written experiment plan to indicate what share of users should receive Variation A, Variation C, and no survey. That can be up to @nettrom_WMF and @SBisson.

We decided to create a separate task for experiments with Variation C: T210868

MMiller_WMF renamed this task from "Personalized first day: experiments" to "Personalized first day: experiments (Variation A)". Nov 30 2018, 6:26 PM
MMiller_WMF updated the task description.

We've completed our initial experiment and found no obvious detrimental effect from the survey. We've also run a second experiment against Variation C, and found that Variation A is preferable. Currently, we are running an experiment on Vietnamese Wikipedia with Variation A and a control group, to learn more about the abandonment rate on that wiki (ref T216668 and T216669).

@nettrom_WMF -- do you think enough time has gone by that we can look at the abandonment rates? Even if we don't yet have statistical significance on the activation rate? I would like us to find out as soon as we can whether the survey seems to be causing the stark abandonment rate that the Var A vs. Var C experiment suggested there might be.

I started this analysis on 2019-03-18, at which point we had 3,624 non-autocreated registrations since switching on the survey/control A/B test. Using the week of data prior to deployment, I had earlier estimated the overall abandonment rate at 17.2%. A power analysis indicated that, if the control group's abandonment rate equalled that estimate, we would be able to detect a significant change if the survey group's abandonment rate fell outside the [13%, 21%] range.
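
A power calculation along these lines can be sketched with statsmodels; the group sizes, baseline rate, and significance level below are assumptions chosen to mirror the figures quoted above, not the exact inputs of the original analysis.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.172          # pre-deployment estimate of the abandonment rate
n_per_group = 3624 // 2   # registrations split roughly 50/50
alpha = 0.05

analysis = NormalIndPower()

# Power to detect a shift from the baseline to each boundary of the range.
for survey_rate in (0.13, 0.21):
    effect = proportion_effectsize(survey_rate, baseline)
    power = analysis.solve_power(effect_size=effect, nobs1=n_per_group,
                                 alpha=alpha, ratio=1.0,
                                 alternative="two-sided")
    print(f"survey abandonment at {survey_rate:.0%}: power = {power:.2f}")
```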

I calculated abandonment overall for each group, as well as split by whether the account was registered on the desktop or mobile site. Overall, the results are:

| Group   | Did not abandon | %     | Did abandon | %     |
| ------- | --------------- | ----- | ----------- | ----- |
| Control | 1,489           | 84.3% | 278         | 15.7% |
| Survey  | 1,178           | 63.4% | 679         | 36.6% |

Overall, the survey group has a significantly larger abandonment rate (i.e. the difference is outside the range indicated by our power analysis). However, this is driven by abandonment of registrations on the desktop site:

| Desktop/mobile | Group   | Did not abandon | %     | Did abandon | %     |
| -------------- | ------- | --------------- | ----- | ----------- | ----- |
| Desktop        | Control | 1,062           | 82.7% | 222         | 17.3% |
| Desktop        | Survey  | 743             | 54.3% | 625         | 45.7% |
| Mobile         | Control | 427             | 88.4% | 56          | 11.6% |
| Mobile         | Survey  | 435             | 89.0% | 54          | 11.0% |

The 0.6pp difference between the control and survey group on mobile is clearly not significant (it is a much smaller sample, and the difference is far below the threshold identified in our power analysis). In other words, the survey appears to have no significant effect on abandonment for users who registered on the mobile site.

The 28.4pp difference between the control and survey group on desktop is statistically significant (X^2=244.4, df=1, p << 0.001). We're discussing how to dig further into this to understand what's going on.
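
For reference, the quoted test statistic can be checked directly against the contingency tables above; scipy's chi2_contingency with its default Yates continuity correction should reproduce roughly the same value.

```python
from scipy.stats import chi2_contingency

# Desktop registrations: rows are control / survey,
# columns are "did not abandon" / "did abandon" (counts from the table above).
desktop = [[1062, 222],
           [743, 625]]
chi2, p, dof, _ = chi2_contingency(desktop)
print(f"desktop: X^2 = {chi2:.1f}, df = {dof}, p = {p:.3g}")

# Same test for mobile registrations, where the difference is not significant.
mobile = [[427, 56],
          [435, 54]]
chi2, p, dof, _ = chi2_contingency(mobile)
print(f"mobile:  X^2 = {chi2:.2f}, df = {dof}, p = {p:.3g}")
```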

Leaving this in progress because the next step is to get these findings on mediawiki.org. This is not urgent.

kzimmerman lowered the priority of this task from Medium to Low. Sep 11 2019, 9:48 PM
kzimmerman subscribed.

Remaining documentation has been moved to a separate task; closing the analysis as resolved.