Sample size seems to be our biggest challenge in A/B tests, both when planning a test and when analyzing it afterwards, because we rely on frequentist methods. We should therefore move forward with the Bayesian approach to A/B testing.
To that end, we need to add BCDA tools to the wmf package that we can use in the analysis of our most recent test as well as the test we're aiming to launch this week.
Note that this does not remove sample size as a factor. It still matters, but we are shifting how it relates to both the analysis and the results.
With a frequentist approach, i.e. Null Hypothesis Significance Testing (NHST), too little data means we lack the power to detect small effects that may be present. Too much data means we detect tiny effects that are not actually meaningful but still call them statistically significant, because for any fixed nonzero effect the p-value shrinks as the sample size grows.
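The sample-size dependence of the p-value can be illustrated with a rough sketch. It assumes a pooled two-proportion z-test with equal group sizes, and it treats the observed rates as exactly 30.0% and 30.5% at every sample size. The function name and all numbers are hypothetical and are not taken from any of our tests.

```python
import math

def two_proportion_p_value(p_a, p_b, n_per_group):
    """Two-sided p-value of a pooled two-proportion z-test (equal group sizes)."""
    pooled = (p_a + p_b) / 2  # pooled rate when both groups have the same size
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = abs(p_a - p_b) / se
    # Two-sided normal tail: 2 * (1 - Phi(z)) equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))

# Hypothetical observed rates: the same half-point difference at every sample size
sample_sizes = (1_000, 10_000, 100_000, 1_000_000)
p_values = {n: two_proportion_p_value(0.300, 0.305, n) for n in sample_sizes}

for n, p in p_values.items():
    print(f"n per group = {n:>9,}: p-value = {p:.3g}")
```

The observed difference never changes, yet the p-value moves from clearly non-significant to far below 0.05 purely because the groups get larger.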
With a Bayesian approach, the posterior distribution is proportional to the product of the likelihood (the data) and the prior (our opinion about the distribution of the parameter). The less data we have for the likelihood, the greater the contribution of the prior to the posterior, and the more important our choice of prior becomes. Here "less data" refers to sample sizes below 200, which should not be an issue for us in general. It is still something to be aware of, particularly for tests with very restrictive sampling criteria, e.g. the query must include quotes and contain at least 8 words.
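To make the prior's influence concrete, here is a minimal sketch assuming a conjugate Beta-Binomial model for a single rate. This is an illustrative choice and not necessarily what the BCDA tools will use. The two priors, the counts, and the function names are all hypothetical.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a rate under a Beta(prior_a, prior_b) prior and binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

def prior_gap(successes, trials):
    """Absolute difference in posterior mean between two hypothetical priors."""
    flat = posterior_mean(1, 1, successes, trials)           # uniform prior
    informative = posterior_mean(20, 80, successes, trials)  # prior belief near 20%
    return abs(flat - informative)

# Same observed rate (30%) at two very different sample sizes
small_gap = prior_gap(successes=15, trials=50)
large_gap = prior_gap(successes=1_500, trials=5_000)

print(f"n = 50:   posterior means differ by {small_gap:.4f}")
print(f"n = 5000: posterior means differ by {large_gap:.4f}")
```

With 50 observations the two priors pull the estimate several percentage points apart. With 5,000 the disagreement is a fraction of a point, because the likelihood dominates the posterior.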