In PSI's email confirmation banner experiment, there was a considerable sample ratio mismatch (SRM), especially on specific properties. Some Wikipedias had approximately equal splits between treatment and control, but other properties had very imbalanced splits:
This raises the question: why? Is something different happening on those properties? We don't usually run A/A tests on those pages, so to my knowledge we haven't had the chance to notice any structural imbalances there.
Acceptance Criteria
- Run an A/A test with logged-in users on Commons, Wikidata, MediaWiki, and Meta-Wiki to check for an imbalance
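
One way the imbalance check could be done, as a rough sketch: a chi-square goodness-of-fit test on the per-property group counts from the A/A test. The ticket does not say which test to use, so the test itself, the 50/50 intended split, the 0.001 alert threshold, and the function name `srm_check` are all assumptions here, and the counts in the usage lines are made up for illustration.

```python
import math


def srm_check(control, treatment, expected_control_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test for sample ratio mismatch (two groups).

    Assumptions (not from the ticket): the intended split is 50/50 and
    p < 0.001 is treated as evidence of SRM.
    Returns (chi2, p_value, srm_flagged).
    """
    if not 0 < expected_control_share < 1:
        raise ValueError("expected_control_share must be between 0 and 1")
    total = control + treatment
    if total == 0:
        raise ValueError("no users observed")

    # Counts we would expect if assignment matched the intended split.
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control

    chi2 = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)

    # With two groups there is 1 degree of freedom, and the chi-square
    # survival function reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha


# Hypothetical per-property counts, for illustration only.
balanced = srm_check(5000, 5000)
imbalanced = srm_check(5000, 5600)
```

Running this once per property (Commons, Wikidata, MediaWiki, Meta-Wiki) would show whether any of them is flagged while the others are not.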
