
Head-to-head A/B test language switcher alternative
Closed, Resolved, Public

Description

As a product owner, I would like to know whether the new language switcher overlay is more effective than the old one, so that users can benefit from it if it is.

Acceptance criteria

  • A/B test in mobile web stable: 50% of users should get version A (old: simpler-overlay) and 50% should get version B (new: structured-overlay)
  • Analysis by product owner

As indicated in T127212#2090175, the data with a 90/10 A/B split have been inconclusive. This task is to make the data more conclusive.
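Why a 50/50 split should help: for a fixed amount of traffic, the standard error of the difference between two clickthrough rates grows with 1/n_A + 1/n_B, and that sum is smallest when the arms are equal. The sketch below is minimal and rests on two assumptions. It treats the outcome as binary, with the same hypothetical true rate p in both arms. The values p = 0.7 and 1,000 funnels are illustrative, not measured, and the helper name is hypothetical.

```javascript
// Minimal sketch: how the traffic split affects the precision of an A/B test.
// Assumes a binary outcome (e.g. tapped-on-result) with the same true rate p
// in both arms; p and totalN below are illustrative, not measured values.
function seOfDifference(p, totalN, shareA) {
  const nA = totalN * shareA;
  const nB = totalN * (1 - shareA);
  // Standard error of (rateB - rateA) for two independent proportions.
  return Math.sqrt(p * (1 - p) * (1 / nA + 1 / nB));
}

const p = 0.7;       // hypothetical clickthrough rate
const totalN = 1000; // hypothetical number of funnels in the test

const se9010 = seOfDifference(p, totalN, 0.9);
const se5050 = seOfDifference(p, totalN, 0.5);
console.log(se9010.toFixed(4), se5050.toFixed(4), (se9010 / se5050).toFixed(2));
```

The ratio does not depend on p or on the total. A 90/10 split has 5/3 ≈ 1.67 times the standard error of a 50/50 split, so it needs about 25/9 ≈ 2.8 times as much traffic for the same precision.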

Event Timeline

Should T128214 be a blocker?

No.

My thinking is that while some users alternate back and forth between languages, it's not necessary to study that specific use case in order to understand the general effectiveness of the overlay.

OK, I just thought that the 50/50 split may end up being something like 70/30 because the new overlay might not load.


Yeah, I'm glad you asked. The basic eligibility ratios for one version or the other seemed to be about what we expected in production, so I wouldn't worry too much about it right now. The root cause of the inconclusive data seemed to be that version B logged slightly too few events to make reasonable estimates of average behavior.

Change 276172 had a related patch set uploaded (by Bmansurov):
Change LanguageOverlay bucket rates

https://gerrit.wikimedia.org/r/276172

Change 276172 merged by jenkins-bot:
Change LanguageOverlay bucket rates

https://gerrit.wikimedia.org/r/276172

The patch has been SWAT deployed. You can check it by visiting https://en.m.wikipedia.org/wiki/Book and pasting the following code into the browser console:

mw.config.get('wgMFExperiments').languageOverlay.buckets
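The bucket-rate change above amounts to re-weighting how visitors are split between versions. This is a hypothetical sketch only. The object shape and the `pickBucket` helper are assumptions, not the MobileFrontend implementation or the real `wgMFExperiments` structure. It shows how weighted assignment from a uniform random number could look:

```javascript
// Hypothetical sketch of weighted bucket assignment. The shape of the
// `buckets` object (bucket name -> share of traffic) is an assumption for
// illustration; it is not taken from the MobileFrontend source.
function pickBucket(buckets, r) {
  // r is a number in [0, 1), e.g. from Math.random() or a hashed token.
  let cumulative = 0;
  const names = Object.keys(buckets);
  for (const name of names) {
    cumulative += buckets[name];
    if (r < cumulative) {
      return name;
    }
  }
  // Guard against floating-point shortfall when the shares sum to ~1.
  return names[names.length - 1];
}

const buckets = { 'simpler-overlay': 0.5, 'structured-overlay': 0.5 };
console.log(pickBucket(buckets, Math.random()));
```

Using a deterministic `r`, for example one derived from a stable per-user token, instead of `Math.random()` would keep a visitor in the same bucket across page views.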

I'm moving the card back to "To Do" for Adam to do the analysis.

I doubt a change in bucketing would cause the error to surface; if there is some sort of strange bug, though, it would more likely make it more obvious.

Update: as noted in T129369#2106572, the fix in T129369 ("Language overlay will not open on enwiki") seems to have restored some order.

I want a little more data to arrive before we call this done, but it's looking pretty good from what I can tell.

Update: the data continue to look good. I want to run the query with a full day of UTC data for 12 March 2016 as well, since that was a Saturday.

Generally, the new overlay fares well, performing slightly better in aggregate in terms of clickthroughs over the observed window, 2016031002-2016031421. Marking as resolved.
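The aggregate comparison can be checked with a standard two-proportion z-test. This minimal sketch uses totals summed from the aggregate tables below. A clickthrough is any tapped-on-result row, with or without a search query, over all tap buckets including "unknown". That gives 467/647 for simpler-overlay and 499/670 for structured-overlay. The test assumes each row is an independent funnel, which the self-joins on `event_FunnelToken` may not guarantee. The helper name is illustrative.

```javascript
// Sketch: two-proportion z-test on aggregate clickthrough (tapped-on-result).
// Counts are summed from the aggregate tables in this task:
//   simpler-overlay:    467 tapped-on-result out of 647 overlay funnels
//   structured-overlay: 499 tapped-on-result out of 670 overlay funnels
function twoProportionZ(successA, nA, successB, nB) {
  const pA = successA / nA;
  const pB = successB / nB;
  const pooled = (successA + successB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return { pA, pB, diff: pB - pA, z: (pB - pA) / se };
}

const result = twoProportionZ(467, 647, 499, 670);
// |z| > 1.96 would indicate a difference at the conventional 5% level (two-sided).
console.log(result.pA.toFixed(3), result.pB.toFixed(3), result.z.toFixed(2),
  Math.abs(result.z) > 1.96);
```

With these counts the rates are about 72.2% and 74.5%, and z is about 0.94. That is below the 1.96 cutoff for a two-sided test at the 5% level.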

overlay, aggregate, full window (2016031002-2016031421)

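-- t0 = pageLoaded, t1 = languageButtonTap, t2 = languageListLoaded; all joined on event_FunnelToken
select t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket, count(*)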
from
MobileWebLanguageSwitcher_15302503 t2
inner join
MobileWebLanguageSwitcher_15302503 t1 on t2.event_FunnelToken = t1.event_FunnelToken
inner join
MobileWebLanguageSwitcher_15302503 t0 on t1.event_FunnelToken = t0.event_FunnelToken
where
t2.timestamp > '2016031002' and t2.timestamp < '2016031421' and t2.event_event = 'languageListLoaded' and t2.event_MobileMode = 'stable'
and t1.timestamp > '2016031002' and t1.event_event = 'languageButtonTap'
and t0.timestamp > '2016031002' and t0.event_event = 'pageLoaded' and t0.event_beaconCapable = 1
group by t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket;
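The `timestamp` bounds are plain string comparisons. This sketch assumes EventLogging stores timestamps as 14-character `YYYYMMDDHHMMSS` strings, which this task does not state. Under that assumption, a 10-character bound such as `'2016031002'` is a prefix of every timestamp in that hour, and a longer string sorts after its prefix. So `> '2016031002'` keeps everything from 02:00:00 on 10 March, and `< '2016031421'` drops everything from 21:00:00 on 14 March. That matches the 22 and 21 hours noted for the per-day queries. Only t2 has the upper bound; t0 and t1 have just the lower one. A minimal sketch of the same comparisons (helper names are illustrative):

```javascript
// Sketch of how the WHERE-clause bounds behave, assuming EventLogging
// timestamps are 14-character YYYYMMDDHHMMSS strings compared as text.
const LOWER = '2016031002'; // exclusive lower bound used in the queries
const UPPER = '2016031421'; // exclusive upper bound used for t2

function inWindow(ts) {
  // A string that starts with LOWER but is longer sorts after it, so every
  // event from 02:00:00 on 10 March onward passes; anything from 21:00:00 on
  // 14 March onward compares greater than UPPER and is excluded.
  return ts > LOWER && ts < UPPER;
}

function dayOf(ts) {
  // Equivalent of left(t2.timestamp, 8) in the per-day queries.
  return ts.slice(0, 8);
}

console.log(inWindow('20160310015959')); // false
console.log(inWindow('20160310020000')); // true
console.log(inWindow('20160314205959')); // true
console.log(inWindow('20160314210000')); // false
```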

event_languageOverlayVersion	event_languageButtonTappedBucket	count(*)
simpler-overlay	0 taps	329
simpler-overlay	1-4 taps	182
simpler-overlay	20+ taps	32
simpler-overlay	5-20 taps	102
simpler-overlay	unknown	2
structured-overlay	0 taps	340
structured-overlay	1-4 taps	207
structured-overlay	20+ taps	29
structured-overlay	5-20 taps	91
structured-overlay	unknown	3
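The earlier concern was that a 50/50 split might end up closer to 70/30 if the new overlay failed to load. These counts include only funnels that reached `languageListLoaded`, so they can test that concern. This minimal sketch uses a chi-square goodness-of-fit statistic against an expected 50/50 split. The counts are copied from the table above; the helper names are illustrative.

```javascript
// Overlay counts by tap bucket, copied from the aggregate table above.
const overlayCounts = {
  'simpler-overlay':    { '0 taps': 329, '1-4 taps': 182, '20+ taps': 32, '5-20 taps': 102, unknown: 2 },
  'structured-overlay': { '0 taps': 340, '1-4 taps': 207, '20+ taps': 29, '5-20 taps': 91, unknown: 3 },
};

function total(buckets) {
  return Object.values(buckets).reduce((sum, n) => sum + n, 0);
}

// Chi-square goodness-of-fit statistic against an expected 50/50 split.
function chiSquare5050(nA, nB) {
  const expected = (nA + nB) / 2;
  return ((nA - expected) ** 2 + (nB - expected) ** 2) / expected;
}

const nSimpler = total(overlayCounts['simpler-overlay']);       // 647
const nStructured = total(overlayCounts['structured-overlay']); // 670
const chi2 = chiSquare5050(nSimpler, nStructured);
// 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
console.log(nSimpler, nStructured, chi2.toFixed(2), chi2 > 3.841);
```

With 647 vs. 670 funnels the statistic is about 0.40, well below 3.841, so the observed split is consistent with 50/50.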



completed funnels, aggregate, full window (2016031002-2016031421)

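-- t0 = pageLoaded, t1 = languageButtonTap, t2 = languageListLoaded, t3 = exitModal; all joined on event_FunnelToken
select t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket, t3.event_exitModal, t3.event_searchInputHasQuery, count(*)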
from MobileWebLanguageSwitcher_15302503 t3
inner join
MobileWebLanguageSwitcher_15302503 t2 on t3.event_FunnelToken = t2.event_FunnelToken
inner join
MobileWebLanguageSwitcher_15302503 t1 on t2.event_FunnelToken = t1.event_FunnelToken
inner join
MobileWebLanguageSwitcher_15302503 t0 on t1.event_FunnelToken = t0.event_FunnelToken
where
t3.timestamp > '2016031002' and t3.event_event = 'exitModal' and t3.event_mobileMode = 'stable'
and t2.timestamp > '2016031002' and t2.timestamp < '2016031421' and t2.event_event = 'languageListLoaded'
and t1.timestamp > '2016031002' and t1.event_event = 'languageButtonTap'
and t0.timestamp > '2016031002' and t0.event_event = 'pageLoaded' and t0.event_beaconCapable = 1
group by t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket, t3.event_exitModal, t3.event_searchInputHasQuery;


event_languageOverlayVersion	event_languageButtonTappedBucket	event_exitModal	event_searchInputHasQuery	count(*)
simpler-overlay	0 taps	dismissed	0	24
simpler-overlay	0 taps	dismissed	1	7
simpler-overlay	0 taps	tapped-on-result	0	215  (217/329 = 66.0%)
simpler-overlay	0 taps	tapped-on-result	1	2
simpler-overlay	1-4 taps	dismissed	0	6
simpler-overlay	1-4 taps	dismissed	1	1
simpler-overlay	1-4 taps	tapped-on-result	0	135  (136/182 = 74.7%)
simpler-overlay	1-4 taps	tapped-on-result	1	1
simpler-overlay	20+ taps	dismissed	0	2
simpler-overlay	20+ taps	tapped-on-result	0	21  (21/32 = 65.6%)
simpler-overlay	5-20 taps	dismissed	0	1
simpler-overlay	5-20 taps	tapped-on-result	0	92  (92/102 = 90.2%)
simpler-overlay	unknown	tapped-on-result	0	1  (1/2 = 50.0%)
structured-overlay	0 taps	dismissed	0	16
structured-overlay	0 taps	dismissed	1	9
structured-overlay	0 taps	tapped-on-result	0	238  (239/340 = 70.3%)
structured-overlay	0 taps	tapped-on-result	1	1
structured-overlay	1-4 taps	dismissed	0	3
structured-overlay	1-4 taps	tapped-on-result	0	171  (174/207 = 84.1%)
structured-overlay	1-4 taps	tapped-on-result	1	3
structured-overlay	20+ taps	dismissed	0	2
structured-overlay	20+ taps	tapped-on-result	0	17  (17/29 = 58.6%)
structured-overlay	5-20 taps	dismissed	0	3
structured-overlay	5-20 taps	tapped-on-result	0	68  (68/91 = 74.7%)
structured-overlay	unknown	tapped-on-result	0	1  (1/3 = 33.3%)
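Each percentage annotation divides all `tapped-on-result` rows, with or without a search query, by the overlay count for the same version and tap bucket. Funnels that never logged an `exitModal` event stay in the denominator. For example, simpler-overlay has 329 "0 taps" overlay funnels but only 24 + 7 + 215 + 2 = 248 completed rows. This minimal sketch reproduces two of the annotations; the row layout and helper name are illustrative.

```javascript
// Sketch: reproduce the "(tapped/overlay = rate)" annotations in the table
// above. A funnel counts as a clickthrough when exitModal is
// 'tapped-on-result', regardless of event_searchInputHasQuery.
const completedRows = [
  // [version, tap bucket, exitModal, searchInputHasQuery, count]
  ['simpler-overlay', '0 taps', 'dismissed', 0, 24],
  ['simpler-overlay', '0 taps', 'dismissed', 1, 7],
  ['simpler-overlay', '0 taps', 'tapped-on-result', 0, 215],
  ['simpler-overlay', '0 taps', 'tapped-on-result', 1, 2],
  ['structured-overlay', '0 taps', 'tapped-on-result', 0, 238],
  ['structured-overlay', '0 taps', 'tapped-on-result', 1, 1],
];
// Denominators from the aggregate overlay table.
const overlayTotals = {
  'simpler-overlay|0 taps': 329,
  'structured-overlay|0 taps': 340,
};

function clickthroughRates(rows, denominators) {
  const tapped = {};
  for (const [version, bucket, exitModal, , count] of rows) {
    if (exitModal !== 'tapped-on-result') continue;
    const key = `${version}|${bucket}`;
    tapped[key] = (tapped[key] || 0) + count;
  }
  const rates = {};
  for (const [key, n] of Object.entries(tapped)) {
    rates[key] = { tapped: n, overlay: denominators[key], rate: n / denominators[key] };
  }
  return rates;
}

for (const [key, r] of Object.entries(clickthroughRates(completedRows, overlayTotals))) {
  console.log(key, `${r.tapped}/${r.overlay} = ${(100 * r.rate).toFixed(1)}%`);
}
```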


overlay, by day: 22 hours of 20160310 (allowing a little time after deployment); full days of the 11th, 12th, and 13th; 21 hours of the 14th (latest hours available)

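-- t0 = pageLoaded, t1 = languageButtonTap, t2 = languageListLoaded; all joined on event_FunnelToken, grouped by day of t2 (first 8 characters of the timestamp)
select left(t2.timestamp, 8) ts, t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket, count(*)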
from
MobileWebLanguageSwitcher_15302503 t2
inner join
MobileWebLanguageSwitcher_15302503 t1 on t2.event_FunnelToken = t1.event_FunnelToken
inner join
MobileWebLanguageSwitcher_15302503 t0 on t1.event_FunnelToken = t0.event_FunnelToken
where
t2.timestamp > '2016031002' and t2.timestamp < '2016031421' and t2.event_event = 'languageListLoaded' and t2.event_MobileMode = 'stable'
and t1.timestamp > '2016031002' and t1.event_event = 'languageButtonTap'
and t0.timestamp > '2016031002' and t0.event_event = 'pageLoaded' and t0.event_beaconCapable = 1
group by ts, t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket;


ts	event_languageOverlayVersion	event_languageButtonTappedBucket	count(*)
20160310	simpler-overlay	0 taps	62
20160310	simpler-overlay	1-4 taps	39
20160310	simpler-overlay	20+ taps	3
20160310	simpler-overlay	5-20 taps	20
20160310	structured-overlay	0 taps	61
20160310	structured-overlay	1-4 taps	51
20160310	structured-overlay	20+ taps	10
20160310	structured-overlay	5-20 taps	7
20160310	structured-overlay	unknown	1

20160311	simpler-overlay	0 taps	68
20160311	simpler-overlay	1-4 taps	44
20160311	simpler-overlay	20+ taps	5
20160311	simpler-overlay	5-20 taps	30
20160311	structured-overlay	0 taps	63
20160311	structured-overlay	1-4 taps	37
20160311	structured-overlay	20+ taps	5
20160311	structured-overlay	5-20 taps	24
20160311	structured-overlay	unknown	1

20160312	simpler-overlay	0 taps	62
20160312	simpler-overlay	1-4 taps	34
20160312	simpler-overlay	20+ taps	8
20160312	simpler-overlay	5-20 taps	20
20160312	structured-overlay	0 taps	82
20160312	structured-overlay	1-4 taps	37
20160312	structured-overlay	20+ taps	3
20160312	structured-overlay	5-20 taps	16

20160313	simpler-overlay	0 taps	80
20160313	simpler-overlay	1-4 taps	40
20160313	simpler-overlay	20+ taps	10
20160313	simpler-overlay	5-20 taps	18
20160313	simpler-overlay	unknown	2
20160313	structured-overlay	0 taps	61
20160313	structured-overlay	1-4 taps	41
20160313	structured-overlay	20+ taps	6
20160313	structured-overlay	5-20 taps	23

20160314	simpler-overlay	0 taps	57
20160314	simpler-overlay	1-4 taps	25
20160314	simpler-overlay	20+ taps	6
20160314	simpler-overlay	5-20 taps	14
20160314	structured-overlay	0 taps	73
20160314	structured-overlay	1-4 taps	41
20160314	structured-overlay	20+ taps	5
20160314	structured-overlay	5-20 taps	21
20160314	structured-overlay	unknown	1
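The per-day queries use the same filters as the full-window ones and only add `left(t2.timestamp, 8)` to the grouping. Each bucket's daily counts should therefore sum to its aggregate count. This minimal sketch checks that for two tap buckets; the counts are copied from the tables above and the helper name is illustrative. The other buckets can be added the same way, and for this task's tables they also reconcile.

```javascript
// Daily overlay counts for the "0 taps" and "1-4 taps" buckets, copied from
// the per-day table above (10-14 March 2016).
const daily = {
  'simpler-overlay|0 taps':      [62, 68, 62, 80, 57],
  'simpler-overlay|1-4 taps':    [39, 44, 34, 40, 25],
  'structured-overlay|0 taps':   [61, 63, 82, 61, 73],
  'structured-overlay|1-4 taps': [51, 37, 37, 41, 41],
};
// Matching totals from the aggregate full-window table.
const aggregate = {
  'simpler-overlay|0 taps': 329,
  'simpler-overlay|1-4 taps': 182,
  'structured-overlay|0 taps': 340,
  'structured-overlay|1-4 taps': 207,
};

// Return every key whose daily counts do not add up to the aggregate count.
function reconcile(dailyCounts, aggregateCounts) {
  const mismatches = [];
  for (const [key, days] of Object.entries(dailyCounts)) {
    const sum = days.reduce((a, b) => a + b, 0);
    if (sum !== aggregateCounts[key]) {
      mismatches.push({ key, daily: sum, aggregate: aggregateCounts[key] });
    }
  }
  return mismatches;
}

console.log(reconcile(daily, aggregate)); // []
```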



completed funnels, by day: 22 hours of 20160310 (allowing a little time after deployment); full days of the 11th, 12th, and 13th; 21 hours of the 14th (latest hours available)

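-- t0 = pageLoaded, t1 = languageButtonTap, t2 = languageListLoaded, t3 = exitModal; all joined on event_FunnelToken, grouped by day of t2 (first 8 characters of the timestamp)
select left(t2.timestamp, 8) ts, t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket, t3.event_exitModal, t3.event_searchInputHasQuery, count(*)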
from MobileWebLanguageSwitcher_15302503 t3
inner join
MobileWebLanguageSwitcher_15302503 t2 on t3.event_FunnelToken = t2.event_FunnelToken
inner join
MobileWebLanguageSwitcher_15302503 t1 on t2.event_FunnelToken = t1.event_FunnelToken
inner join
MobileWebLanguageSwitcher_15302503 t0 on t1.event_FunnelToken = t0.event_FunnelToken
where
t3.timestamp > '2016031002' and t3.event_event = 'exitModal' and t3.event_mobileMode = 'stable'
and t2.timestamp > '2016031002' and t2.timestamp < '2016031421' and t2.event_event = 'languageListLoaded'
and t1.timestamp > '2016031002' and t1.event_event = 'languageButtonTap'
and t0.timestamp > '2016031002' and t0.event_event = 'pageLoaded' and t0.event_beaconCapable = 1
group by ts, t2.event_languageOverlayVersion, t1.event_languageButtonTappedBucket, t3.event_exitModal, t3.event_searchInputHasQuery;


ts	event_languageOverlayVersion	event_languageButtonTappedBucket	event_exitModal	event_searchInputHasQuery	count(*)
20160310	simpler-overlay	0 taps	dismissed	0	4
20160310	simpler-overlay	0 taps	dismissed	1	2
20160310	simpler-overlay	0 taps	tapped-on-result	0	40  (41/62 = 66.1%)
20160310	simpler-overlay	0 taps	tapped-on-result	1	1
20160310	simpler-overlay	1-4 taps	dismissed	0	1
20160310	simpler-overlay	1-4 taps	tapped-on-result	0	33  (34/39 = 87.2%)
20160310	simpler-overlay	1-4 taps	tapped-on-result	1	1
20160310	simpler-overlay	20+ taps	tapped-on-result	0	2  (2/3 = 66.7%)
20160310	simpler-overlay	5-20 taps	tapped-on-result	0	19  (19/20 = 95.0%)
20160310	structured-overlay	0 taps	dismissed	0	5
20160310	structured-overlay	0 taps	dismissed	1	1
20160310	structured-overlay	0 taps	tapped-on-result	0	42  (42/61 = 68.9%)
20160310	structured-overlay	1-4 taps	tapped-on-result	0	43  (43/51 = 84.3%)
20160310	structured-overlay	20+ taps	tapped-on-result	0	6  (6/10 = 60.0%)
20160310	structured-overlay	5-20 taps	tapped-on-result	0	5  (5/7 = 71.4%)
(0 out of 1 structured overlay in the unknown tap bucket)

20160311	simpler-overlay	0 taps	dismissed	0	3
20160311	simpler-overlay	0 taps	dismissed	1	2
20160311	simpler-overlay	0 taps	tapped-on-result	0	40  (41/68 = 60.3%)
20160311	simpler-overlay	0 taps	tapped-on-result	1	1
20160311	simpler-overlay	1-4 taps	dismissed	0	1
20160311	simpler-overlay	1-4 taps	tapped-on-result	0	33  (33/44 = 75.0%)
20160311	simpler-overlay	20+ taps	tapped-on-result	0	4  (4/5 = 80.0%)
20160311	simpler-overlay	5-20 taps	dismissed	0	1
20160311	simpler-overlay	5-20 taps	tapped-on-result	0	23  (23/30 = 76.7%)
20160311	structured-overlay	0 taps	dismissed	0	2
20160311	structured-overlay	0 taps	dismissed	1	5
20160311	structured-overlay	0 taps	tapped-on-result	0	37  (37/63 = 58.7%)
20160311	structured-overlay	1-4 taps	tapped-on-result	0	42  (42/37 = 113.5%; above 100%, so some funnels are likely counted more than once)
20160311	structured-overlay	20+ taps	tapped-on-result	0	4  (4/5 = 80.0%)
20160311	structured-overlay	5-20 taps	dismissed	0	1
20160311	structured-overlay	5-20 taps	tapped-on-result	0	17  (17/24 = 70.8%)

20160312	simpler-overlay	0 taps	dismissed	0	7
20160312	simpler-overlay	0 taps	tapped-on-result	0	41  (41/62 = 66.1%)
20160312	simpler-overlay	1-4 taps	tapped-on-result	0	28  (28/34 = 82.4%)
20160312	simpler-overlay	20+ taps	dismissed	0	1
20160312	simpler-overlay	20+ taps	tapped-on-result	0	6  (6/8 = 75.0%)
20160312	simpler-overlay	5-20 taps	tapped-on-result	0	20  (20/20 = 100.0%)
20160312	structured-overlay	0 taps	dismissed	0	3
20160312	structured-overlay	0 taps	tapped-on-result	0	55  (56/82 = 68.3%)
20160312	structured-overlay	0 taps	tapped-on-result	1	1
20160312	structured-overlay	1-4 taps	dismissed	0	2
20160312	structured-overlay	1-4 taps	tapped-on-result	0	28  (28/37 = 75.7%)
20160312	structured-overlay	20+ taps	dismissed	0	1
20160312	structured-overlay	20+ taps	tapped-on-result	0	1  (1/3 = 33.3%)
20160312	structured-overlay	5-20 taps	tapped-on-result	0	11  (11/16 = 68.8%)

20160313	simpler-overlay	0 taps	dismissed	0	8
20160313	simpler-overlay	0 taps	dismissed	1	3
20160313	simpler-overlay	0 taps	tapped-on-result	0	56  (56/80 = 70.0%)
20160313	simpler-overlay	1-4 taps	dismissed	0	3
20160313	simpler-overlay	1-4 taps	tapped-on-result	0	30  (30/40 = 75.0%)
20160313	simpler-overlay	20+ taps	tapped-on-result	0	7  (7/10 = 70.0%)
20160313	simpler-overlay	5-20 taps	tapped-on-result	0	17  (17/18 = 94.4%)
20160313	simpler-overlay	unknown	tapped-on-result	0	1  (1/2 = 50.0%)
20160313	structured-overlay	0 taps	dismissed	0	2
20160313	structured-overlay	0 taps	dismissed	1	2
20160313	structured-overlay	0 taps	tapped-on-result	0	38  (38/61 = 62.3%)
20160313	structured-overlay	1-4 taps	tapped-on-result	0	31  (31/41 = 75.6%)
20160313	structured-overlay	20+ taps	tapped-on-result	0	4  (4/6 = 66.7%)
20160313	structured-overlay	5-20 taps	dismissed	0	2
20160313	structured-overlay	5-20 taps	tapped-on-result	0	18  (18/23 = 78.3%)

20160314	simpler-overlay	0 taps	dismissed	0	2
20160314	simpler-overlay	0 taps	tapped-on-result	0	38  (38/57 = 66.7%)
20160314	simpler-overlay	1-4 taps	dismissed	0	1
20160314	simpler-overlay	1-4 taps	dismissed	1	1
20160314	simpler-overlay	1-4 taps	tapped-on-result	0	11  (11/25 = 44.0%)
20160314	simpler-overlay	20+ taps	dismissed	0	1
20160314	simpler-overlay	20+ taps	tapped-on-result	0	2  (2/6 = 33.3%)
20160314	simpler-overlay	5-20 taps	tapped-on-result	0	13  (13/14 = 92.9%)
20160314	structured-overlay	0 taps	dismissed	0	4
20160314	structured-overlay	0 taps	dismissed	1	1
20160314	structured-overlay	0 taps	tapped-on-result	0	66  (66/73 = 90.4%)
20160314	structured-overlay	1-4 taps	dismissed	0	1
20160314	structured-overlay	1-4 taps	tapped-on-result	0	27  (30/41 = 73.2%)
20160314	structured-overlay	1-4 taps	tapped-on-result	1	3
20160314	structured-overlay	20+ taps	dismissed	0	1
20160314	structured-overlay	20+ taps	tapped-on-result	0	2  (2/5 = 40.0%)
20160314	structured-overlay	5-20 taps	tapped-on-result	0	17  (17/21 = 81.0%)
20160314	structured-overlay	unknown	tapped-on-result	0	1  (1/1 = 100.0%)
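A simple sanity check on the per-day annotations is to flag any bucket where the tapped-on-result count exceeds the overlay count. That cannot happen if each funnel appears once in each query. Possible causes are not confirmed here. One is the self-joins on `event_FunnelToken` multiplying rows when a funnel logs the same event more than once. Another is that the two queries apply the `stable` mode filter to different tables: t2 in the overlay query, t3 in the completed-funnel query. This minimal sketch uses a few rows copied from the tables above; the helper name is illustrative.

```javascript
// Sketch: flag per-day clickthrough annotations whose numerator exceeds the
// overlay count for the same day, version, and tap bucket. Such a rate is
// impossible if each funnel is counted once, so it points to duplicated rows.
const checks = [
  // [day, version, tap bucket, tapped-on-result rows, overlay rows]
  ['20160311', 'structured-overlay', '1-4 taps', 42, 37],
  ['20160311', 'structured-overlay', '0 taps', 37, 63],
  ['20160312', 'structured-overlay', '20+ taps', 1, 3],
];

function impossibleRates(rows) {
  return rows
    .filter(([, , , tapped, overlay]) => tapped > overlay)
    .map(([day, version, bucket, tapped, overlay]) =>
      `${day} ${version} ${bucket}: ${tapped}/${overlay}`);
}

console.log(impossibleRates(checks));
```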
dr0ptp4kt updated the task description.