User story & summary:
As a stakeholder in the WE1 objective and the outcomes of the Revise Tone Structured Task experiment, I want to understand whether the early impact on constructive activation is "real" (that is, not caused by an error in the instrumentation or experiment setup) so that I can make an informed decision about next actions.
Background & research:
This task is important because we need to validate the observed impact on constructive activation and take action if necessary.
Constructive activation rate appears to be negatively impacted by Revise Tone:
https://superset.wikimedia.org/superset/dashboard/p/D2EB53dNOyq/
- Constructive activation rate: -32.7% (p-value: <0.001)
- Constructive activation rate (mobile web): -26.9% (p-value: <0.001)
Control group and treatment group appear to be unbalanced, which could be a sign of enrollment/bucketing issues:
| assigned | subject_count |
| --- | --- |
| control | 4679 |
| treatment | 6711 |
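
One way to quantify the imbalance above is a sample ratio mismatch (SRM) check: a chi-square goodness-of-fit test of the observed counts against the intended allocation. The sketch below is illustrative only. It assumes the experiment was designed as a 50/50 split, which this task does not state, so the expected share should be replaced with the configured allocation. The names `srm_check` and `expected_control_share` are hypothetical and not part of any existing tooling.

```python
import math


def srm_check(control, treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for group balance.

    expected_control_share is an ASSUMPTION (50/50 by default); set it to the
    allocation the experiment was actually configured with.
    Returns (chi-square statistic, p-value).
    """
    total = control + treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control - expected_control) ** 2 / expected_control
        + (treatment - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom, the chi-square survival function is
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the table above.
chi2, p_value = srm_check(control=4679, treatment=6711)
print(f"treatment share = {6711 / (4679 + 6711):.3f}")
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.3g}")
```

If the intended split really was 50/50, a very small p-value here would suggest that the imbalance is unlikely to be chance and that enrollment or bucketing deserves a closer look before the activation results are interpreted. If the intended split was something else, the test should be rerun with that share.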
Acceptance Criteria:
- Verification of findings surfaced in Slack thread
- Growth team & steering committee alignment on what actions (if any) to take with the experiment


