As the Growth team, I want to quickly determine if the new Positive Reinforcement features are having a positive impact on newcomers, because if we see any negative impact then we will want to revert changes and if we don't see any positive changes then we will need to make further improvements or consider sunsetting the features.
The Positive Reinforcement features aim to provide or improve the tools available to newcomers and mentors in three specific areas that will be described in more detail below. Our hypothesis is that once a newcomer has made a contribution (say by making a structured task edit), these features will help create a positive feedback cycle that increases newcomer motivation.
Below are the specific hypotheses that we seek to validate across the newcomer population. We will also have hypotheses for each of the three sets of features that the team plans to develop. These hypotheses drive the specifics for what data we will collect and how we will analyse that data.
- The Positive Reinforcement features increase our core metrics of retention and productivity.
- Since the Positive Reinforcement features do not feature a call to action that asks newcomers to make edits, we will see no difference in our activation core metric.
- Newcomers who get the Positive Reinforcement features are able to determine that making un-reverted edits is desirable, and we will see a decrease in the proportion of reverted edits.
- The positive feedback cycle created by the Positive Reinforcement features will lead to a significantly higher proportion of "highly active" newcomers.
- The Positive Reinforcement features increase the number of Daily Active Users of Suggested edits.
- The average number of edit sessions during the newcomer period (first 15 days) increases.
- "Personalized praise" will increase mentor’s proactive communication with their mentees, which will lead to increase in retention and productivity.
Similarly as we have done for previous Growth team projects, we want to test our hypotheses through controlled experiments (also called "A/B tests"). This will allow us to establish a causal relationship (e.g. "The Leveling Up features cause an increase in retention of xx%"), and it will allow us to detect smaller effects than if we were to give it to everyone and analyze the effects pre/post deployment.
In this controlled experiment, a randomly selected half of users will get access to Positive Reinforcement features (the "treatment" group), and the other randomly selected half will instead get the current (September 2022) Growth feature experience (the "control" group). In previous experiments, the control group has not gotten access to the Growth features. The team has decided to move away from that (T320876), which means that the current set of features is the new baseline for a control group.
The Personalized Praise feature is focused on mentors. There is a limited number of mentors on every wiki, whereas when it comes to newcomers the number increases steadily every day as new users register on the wikis. While we could run experiments with the mentors, we are likely to run into two key challenges. First, the limited number of mentors could mean that the experiments would need to run for a long time. Second, and more importantly, mentors are well integrated into the community and communicate with each other, meaning they are likely to figure out if some have access to features that others do not. We will therefore give the Personalized Praise features to all mentors and examine activity and effects on newcomers pre/post deployment in order to understand the feature’s effectiveness.
In summary, this means we are looking to run two consecutive experiments with the Impact and Leveling up features, followed by a deployment of the Personalized Praise features to all mentors. These experiments will first run on the pilot wikis. We can extend this to additional wikis if we find a need to do that, but it would only happen after we have analyzed the leading indicators and found no concerns.
Each experiment will run for approximately one month, and for each experiment we will have an accompanying set of leading indicators that we will analyze two weeks after deployment. The list below shows what the planned experiments will be:
- Impact: treatment group gets the updated Impact module.
- Leveling up: treatment group gets both the updated Impact module and the Leveling up features.
- Personalized praise: all mentors get the Personalized praise features.