This task covers the work of evaluating the collective impact the WE 1.2 hypotheses have had on constructive activation.
WE 1.2
Constructive Activation: Widespread deployment of interventions shown to collectively cause a 10% relative increase (y-o-y) on mobile web and a 25% relative increase (y-o-y) on iOS in the number of newcomers who publish ≥1 constructive edit in the main namespace on a mobile device, as measured by controlled experiments. | source
Requirements
@MNeisler to draft
Open question(s)
1. Which "Approach" will we move forward with for evaluating the success of WE 1.2, and why?
Approach #3:
On a per-platform basis, we will calculate the proportion of interventions we deployed and evaluated through controlled experiments that met or exceeded the constructive activation targets we set at the outset of this year (≥10% on mobile web; ≥25% on iOS). To incentivize teams to be bold while still supporting them if/when an intervention doesn't deliver the impact we intend, we'll consider ourselves to have been effective if >70% of the year's interventions meet or exceed those targets.
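As a rough illustration of this criterion, the check below computes the share of interventions that hit their per-platform target and compares it to the 70% bar. All intervention lift values here are hypothetical placeholders, not measured results:

```python
# Hypothetical per-intervention lifts measured via controlled experiments.
# Values are relative constructive-activation increases (0.12 = +12% y-o-y).
mobile_web_lifts = [0.12, 0.08, 0.15]  # per-intervention target: >= 10%
ios_lifts = [0.30, 0.20]               # per-intervention target: >= 25%

def met_target_share(lifts, target):
    """Proportion of interventions meeting or exceeding the target lift."""
    return sum(lift >= target for lift in lifts) / len(lifts)

# The year is considered effective if > 70% of interventions hit their target.
effective_mw = met_target_share(mobile_web_lifts, 0.10) > 0.70
effective_ios = met_target_share(ios_lifts, 0.25) > 0.70
```

With these placeholder numbers, 2 of 3 mobile web interventions (~67%) and 1 of 2 iOS interventions (50%) hit their targets, so neither platform would clear the 70% bar.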
Approaches
To evaluate the aggregate impact of the discrete interventions we've deployed throughout the 2024-2025 fiscal year, we're considering the following approaches:
- Approach #1: Sum impact of interventions measured through controlled experiments
- In this approach, we would sum the impacts each intervention was proven [via a controlled experiment] to have on constructive activation, on a per-platform basis (iOS and mobile web)
- E.g. "Success" would mean the following: SUM(Intervention #1 impact, Intervention #2 impact, Intervention #3 impact) ≥ 10% or 25% impact on constructive activation (on mobile web and iOS respectively)
- Approach #2: Average impact of interventions measured through controlled experiments
- In this approach, we would average the impacts each intervention was proven [via a controlled experiment] to have on constructive activation, on a per-platform basis (iOS and mobile web)
- E.g. "Success" would mean the following: (AVERAGE(Intervention #1 impact, Intervention #2 impact, Intervention #3 impact)) ≥ 10% or 25% impact on constructive activation (on mobile web and iOS respectively).
- Approach #3: Count of interventions shown to impact constructive activation, measured through controlled experiments
- In this approach, we would count the interventions proven [via a controlled experiment] to move constructive activation by at least the targets we set for each platform (≥10% on mobile web; ≥25% on iOS)
- E.g. "Success" would mean the following: 100% of the interventions deployed were shown to cause a ≥10% or ≥25% impact on constructive activation (on mobile web and iOS respectively)
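The three aggregation rules above can be sketched side by side on a single platform. The lift values are hypothetical, chosen only to show how the same data can pass one rule and fail another:

```python
# Hypothetical measured lifts per intervention on mobile web,
# where the platform target is a >= 10% relative increase.
lifts = [0.06, 0.09, 0.14]
target = 0.10

# Approach #1: sum the per-intervention impacts against the target.
success_sum = sum(lifts) >= target

# Approach #2: average the per-intervention impacts against the target.
success_avg = sum(lifts) / len(lifts) >= target

# Approach #3 (as originally framed): every intervention must
# individually meet the target.
success_count = all(lift >= target for lift in lifts)
```

Here the summed lift (0.29) clears the target while the average (~0.097) and the per-intervention count do not, which is why the choice of aggregation rule materially changes what "success" means.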
Background
In T375926#10300625 (November 2024), we converged[i][ii] on measuring the impact of WE 1.2 (2024-2025) by completing, "...a year-over-year comparison to measure the collective impact of these widely deployed interventions."
Revisiting this decision now, in April 2025, we're questioning whether this year-over-year measurement of the aggregate constructive activation rate (per platform) is viable. This question is prompted, in large part, by:
- Recognizing WE 1.2 interventions are not yet fully scaled to all newcomers at all wikis
- Appreciating that evaluating constructive activation via an impact analysis leaves room for external factors (outside our control) to cause shifts in this metric
- Accepting that running a controlled experiment to evaluate the collective impact of discrete interventions extends beyond the resources/capacity available to us
i. See discussion in Slack
ii. Also see Decision Log/WE 1.2 FY 24-25