
Positive Reinforcement: evaluate leading indicators of the new Impact module
Closed, Resolved · Public


User story:
As the Growth team, we want to quickly determine whether the new Positive Reinforcement features are having a positive impact on newcomers, because if we see any negative impact we will want to revert the changes, and if we don't see any positive changes we will need to make further improvements or consider sunsetting the features.

The Positive Reinforcement features aim to provide or improve the tools available to newcomers and mentors in three specific areas that will be described in more detail below. Our hypothesis is that once a newcomer has made a contribution (say by making a structured task edit), these features will help create a positive feedback cycle that increases newcomer motivation.

Below are the specific hypotheses that we seek to validate across the newcomer population. We will also have hypotheses for each of the three sets of features that the team plans to develop. These hypotheses drive the specifics of what data we will collect and how we will analyze that data.

  • The Positive Reinforcement features increase our core metrics of retention and productivity.
  • Since the Positive Reinforcement features do not feature a call to action that asks newcomers to make edits, we will see no difference in our activation core metric.
  • Newcomers who get the Positive Reinforcement features will learn that constructive (un-reverted) edits are desirable, and we will see a decrease in the proportion of reverted edits.
  • The positive feedback cycle created by the Positive Reinforcement features will lead to a significantly higher proportion of "highly active" newcomers.
  • The Positive Reinforcement features increase the number of Daily Active Users of Suggested edits.
  • The average number of edit sessions during the newcomer period (first 15 days) increases.
  • "Personalized praise" will increase mentors' proactive communication with their mentees, which will lead to an increase in retention and productivity.
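As an illustration of how a hypothesis like the revert-rate one above might be checked, here is a minimal sketch that computes the proportion of reverted edits per experiment group. The field names (`group`, `reverted`) and the sample records are hypothetical, not the Growth team's actual schema.

```python
from collections import defaultdict

def revert_proportions(edits):
    """Return {group: proportion of that group's edits flagged as reverted}."""
    totals = defaultdict(int)
    reverted = defaultdict(int)
    for edit in edits:
        totals[edit["group"]] += 1
        if edit["reverted"]:
            reverted[edit["group"]] += 1
    return {g: reverted[g] / totals[g] for g in totals}

# Illustrative sample: under the hypothesis, the treatment group's
# revert proportion should come out lower than the control group's.
sample = [
    {"group": "treatment", "reverted": False},
    {"group": "treatment", "reverted": False},
    {"group": "treatment", "reverted": True},
    {"group": "control", "reverted": True},
    {"group": "control", "reverted": True},
    {"group": "control", "reverted": False},
    {"group": "control", "reverted": False},
]
props = revert_proportions(sample)
```

A real analysis would of course also test whether the difference between the two proportions is statistically significant.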

Experiment plan
As we have done for previous Growth team projects, we want to test our hypotheses through controlled experiments (also called "A/B tests"). This allows us to establish a causal relationship (e.g. "The Leveling Up features cause an increase in retention of xx%"), and it allows us to detect smaller effects than if we released the features to everyone and analyzed the effects pre/post deployment.

In this controlled experiment, a randomly selected half of users will get access to Positive Reinforcement features (the "treatment" group), and the other randomly selected half will instead get the current (September 2022) Growth feature experience (the "control" group). In previous experiments, the control group has not gotten access to the Growth features. The team has decided to move away from that (T320876), which means that the current set of features is the new baseline for a control group.
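The random 50/50 split described above is commonly implemented by bucketing on a hash of the user ID, which keeps assignment deterministic: a returning user always lands in the same group. This is a generic A/B-test sketch under that assumption, not the Growth team's actual assignment code, and the experiment name is made up.

```python
import hashlib

def assign_group(user_id: int, experiment: str = "impact-module") -> str:
    """Deterministically assign a user to 'treatment' or 'control' (50/50).

    Hashing the experiment name together with the user ID means the same
    user can fall into different groups in different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"
```

Because the split is pseudo-random, the two groups come out approximately (not exactly) equal in size over a large number of registrations.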

The Personalized Praise feature is focused on mentors. There is a limited number of mentors on every wiki, whereas when it comes to newcomers the number increases steadily every day as new users register on the wikis. While we could run experiments with the mentors, we are likely to run into two key challenges. First, the limited number of mentors could mean that the experiments would need to run for a long time. Second, and more importantly, mentors are well integrated into the community and communicate with each other, meaning they are likely to figure out if some have access to features that others do not. We will therefore give the Personalized Praise features to all mentors and examine activity and effects on newcomers pre/post deployment in order to understand the feature’s effectiveness.

In summary, this means we are looking to run two consecutive experiments with the Impact and Leveling up features, followed by a deployment of the Personalized Praise features to all mentors. These experiments will first run on the pilot wikis. We can extend this to additional wikis if we find a need to do that, but it would only happen after we have analyzed the leading indicators and found no concerns.

Each experiment will run for approximately one month, and for each experiment we will have an accompanying set of leading indicators that we will analyze two weeks after deployment. The list below shows what the planned experiments will be:

  • Impact: treatment group gets the updated Impact module.
  • Leveling up: treatment group gets both the updated Impact module and the Leveling up features.
  • Personalized praise: all mentors get the Personalized praise features.

Event Timeline

Impact: treatment group gets the updated Impact module.

To clarify: we intend for all users to receive the new impact module except for the treatment group, which is 50% of new user registrations from the time we start the experiment. I guess another way of phrasing it is:

  • control
    • who: all existing users
    • what: new impact module
  • experiment group
    • who: 50% of new registrations
    • what: old impact module

Does that align with what this task proposes?

Sounds like we might want to add more details to the measurement specifications to make that more clear.
@nettrom_WMF should correct me if I'm wrong, but my understanding is:

Not part of the experiment:

  • All existing users with the newcomer homepage enabled (they all get the new impact module, but we aren't including them in the experiment)

Control group:

  • 50% of new registrations who get the old impact module

Experiment group:

  • 50% of new registrations who get the new impact module

(@nettrom_WMF please feel free to update this task, I just wanted to make sure we had a task logged for evaluating leading indicators for the Impact Module, but perhaps I should split this into at least three tasks: Impact Module, Leveling Up, and Personalized Praise).

KStoller-WMF raised the priority of this task from Medium to High. Dec 10 2022, 7:42 PM

@nettrom_WMF - we've reviewed leading indicators together a few times, and I added a basic "Results" update to:
Please feel free to add further details there if you think it's needed.

Should I break this task into three for each of the main PR projects and consider the Impact Module leading indicator evaluation complete?

KStoller-WMF renamed this task from Positive Reinforcement: evaluate leading indicators to Positive Reinforcement: evaluate leading indicators of the new Impact module. Jan 22 2023, 3:55 AM

I've updated the results on-wiki with more detailed information about our findings so it's accessible to the community. The main conclusion, that we did not find specific concerns that required changes to the experiment, did not change. Closing this task as resolved.