The Article Summaries experiment will require ad spending to drive traffic to the experiment microsite. To determine how big a budget we need, we must run a power analysis to estimate the required sample size based on the minimum effect size we want to detect.
The users who click the ads will be split 50/50 into control and experiment groups for this experiment. Our main metric will be users' median time spent on site, and we decided that the smallest change in it that we would consider meaningful is 20%.
I used the mobile web session length data in wmf.mediawiki_reading_depth to estimate the mean and variance of session length for our experiment. Since the articles in this experiment are all normal-length articles, I removed the session lengths for articles whose length is above the 95th percentile of all articles on English Wikipedia.
To estimate the sample size, I used G*Power 3.1, assuming (1) a two-tailed t test and (2) a 0.05 significance level.
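For anyone who wants to sanity-check this kind of calculation without G*Power, here is a minimal sketch using the normal approximation to a two-tailed, two-sample comparison of means. It is not the G*Power procedure and is not expected to reproduce its figures exactly. The function name `n_per_group` and the `mean` and `sd` inputs in the loop are hypothetical placeholders, not the values estimated from wmf.mediawiki_reading_depth.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mean, sd, rel_change=0.20, alpha=0.05, power=0.80):
    """Approximate users needed per group (normal approximation).

    mean, sd   : baseline mean and standard deviation of the metric
    rel_change : smallest relative change worth detecting (0.20 = 20%)
    alpha      : two-tailed significance level
    power      : desired statistical power
    """
    # Standardized effect size (Cohen's d) implied by the relative change.
    d = (mean * rel_change) / sd
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Hypothetical baseline: these numbers are placeholders for illustration only.
for target_power in (0.80, 0.75, 0.70):
    n = n_per_group(mean=100.0, sd=500.0, power=target_power)
    print(f"power={target_power:.0%}  per group={n:,}  total={2 * n:,}")
```

The normal approximation tends to give slightly smaller numbers than an exact t-test calculation, so treat its output as a rough lower bound.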
Here's a table of the required sample size for this experiment:
| Power | N Users | N Each Group |
|---|---|---|
| 80% | 19,084 | 9,542 |
| 75% | 16,606 | 8,303 |
| 70% | 14,526 | 7,263 |
To obtain 80% statistical power (which is a reasonable default), we need at least 19,084 users to reach the microsite, which is 9,542 users in each group. If we cannot get that many users, the power of the test will be lower, as listed in the table.
(80% statistical power means an 80% chance of finding that summarizing articles makes people stay on our site longer when that is actually the case.)
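To see how power degrades when fewer users reach the microsite, the relationship can be inverted. The sketch below again uses the normal approximation rather than G*Power's exact method, and the effect size `d = 0.04` and the group sizes in the loop are hypothetical values chosen for illustration, not figures from this analysis.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample test (normal approximation).

    d           : standardized effect size (Cohen's d)
    n_per_group : number of users in each group
    alpha       : two-tailed significance level
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in the direction of the
    # true effect; the negligible opposite-tail contribution is ignored.
    return NormalDist().cdf(d * sqrt(n_per_group / 2) - z_alpha)


# Hypothetical effect size and group sizes, for illustration only.
for n in (10000, 8000, 6000, 4000):
    print(f"n per group={n:,}  power={achieved_power(0.04, n):.2f}")
```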