
Run Hovercards A/B test 1
Closed, Resolved · Public · 1 Story Points

Description

Before making Hovercards the default for all users (except those who use the navpop gadget), we should start with a 4-week A/B test so that we can directly measure the impact of Hovercards on reader experience and behavior.

  • A/B test enabled on hu.wikipedia.org to 1% of users. Schema:Popups should continue logging at some percentage for Hovercards-enabled and non-Hovercards-enabled alike.

Impl

  • Config change to enable the extension in stable on that wiki.
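
For illustration only, here is a minimal sketch of how a fixed share of sessions could be assigned to the Hovercards-enabled bucket. It is not the extension's actual bucketing code: the function name `in_experiment`, the salt string, and the use of a hashed session token are all assumptions made for this example.

```python
import hashlib


def in_experiment(session_token: str, rate: float, salt: str = "hovercards-ab-1") -> bool:
    """Return True if this session lands in the Hovercards-enabled bucket.

    Hypothetical helper: hashes the session token so the same session
    always gets the same answer, and roughly `rate` of sessions return True.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{session_token}".encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to one of 10,000 slots.
    slot = int.from_bytes(digest[:8], "big") % 10_000
    # Sessions in the first rate * 10,000 slots are enabled.
    return slot < round(rate * 10_000)


# Example: a 1% rollout, as in the task description.
print(in_experiment("example-session-token", 0.01))
```

Hashing instead of drawing a random number per pageview keeps a session in the same bucket for its whole lifetime, which matters when events from one session are later stitched together into funnels.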

Event Timeline

Jdlrobson created this task. May 9 2016, 4:24 PM
Restricted Application added subscribers: Zppix, Aklapper. May 9 2016, 4:24 PM
dr0ptp4kt moved this task from Incoming to 2016-17 Q2 on the Readers-Web-Backlog board.
dr0ptp4kt updated the task description. May 16 2016, 3:42 PM
Jhernandez updated the task description. May 16 2016, 4:42 PM
dr0ptp4kt renamed this task from Run Hovercards A/B test to Run Hovercards A/B test 1. May 16 2016, 7:51 PM
Jhernandez triaged this task as High priority. May 25 2016, 4:33 PM
phuedx added a subscriber: phuedx. May 26 2016, 10:41 AM

A/B test enabled on <lang>.wikipedia.org. to <value> % of users. Schema:Popups should be logging at some percentage for Hovercards-enabled and non-Hovercards-enabled alike.

What are $lang and $value?

$lang : hu.wikipedia.org
$value of users to get Hovercards UX with previews turned on: 20%, which I think strikes an okay balance. @Tbayer, @jrobell, @DStrine does that work for you or should we set it higher and make it a head to head test at 50%?

Quick check on pageLoaded events at 20%...

Recently huwiki looks to be getting somewhere between about 0.475 and 1.1 million daily desktop pageviews. Use the breakdown option here for trends.

The event logging is confined to sendBeacon clients.

Looking at yesterday's desktop data, about 81.7% of pageviews came from UAs that should, in theory, be sendBeacon-capable (Chrome 39+, Firefox 31+, Opera 26+). [1]

Let's suppose that about 20% of those events don't make it, either because of DNT or because, as we've observed at points, sendBeacon events are sometimes simply lost.

Let's also suppose 85% of pageviews are in namespaces eligible for Hovercards.

Just rough figures, all the usual caveats would apply, especially given we do anticipate at least some differences in the style of engagement, and thus number of events per pageview with and without Hovercards:

On the lower end, 475,000 * 0.817 sendBeacon * 0.8 sendBeacon made it * 0.85 Hovercards eligible namespace * 0.2 users with Hovercards enabled * 0.1 session sampling rate (current global default as I understand) = 5,277 page load events with Hovercards on (and about 21,108 off) per day.

On the higher end let's just suppose a 10% sample rate on 1.1 million pageviews - that's 110,000 page loaded events, of which 22,000 would be for Hovercards enabled.
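
The two estimates above can be reproduced with a short calculation. This is only a sketch of the arithmetic in this comment; the function and parameter names are made up for the example, and the defaults are the assumed rates quoted above, not measured values.

```python
def daily_page_load_events(
    pageviews: float,
    send_beacon_share: float = 0.817,        # UAs likely to support sendBeacon
    delivery_rate: float = 0.8,              # events that survive DNT / lost beacons
    eligible_namespace_share: float = 0.85,  # pageviews in Hovercards-eligible namespaces
    sampling_rate: float = 0.1,              # session sampling rate
    enabled_share: float = 0.2,              # users with Hovercards enabled
):
    """Return (events with Hovercards on, events with Hovercards off) per day."""
    logged = (
        pageviews
        * send_beacon_share
        * delivery_rate
        * eligible_namespace_share
        * sampling_rate
    )
    return logged * enabled_share, logged * (1 - enabled_share)


# Lower end: 475,000 daily pageviews with every discount applied.
on_low, off_low = daily_page_load_events(475_000)
print(round(on_low), round(off_low))  # about 5,278 on and 21,111 off

# Higher end: 1.1 million pageviews with only sampling applied, as in the text.
on_high, off_high = daily_page_load_events(
    1_100_000,
    send_beacon_share=1.0,
    delivery_rate=1.0,
    eligible_namespace_share=1.0,
)
print(round(on_high), round(off_high))  # 22,000 on and 88,000 off
```

The unrounded lower-end result is about 5,277.8 events on and 21,111.3 off, so the figures quoted above (5,277 and 21,108, the latter being four times the truncated 5,277) are off by only a handful of events.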

Even if the number of events per page is double or triple (e.g., due to reasonable Hovercards engagement per page or back/forward re-entrance), I think we're in safe territory and won't burden ourselves later during queries for basic funnel analysis. And if it's on the lower end, it doesn't seem so low as to stifle our analysis (suppose, for example, the baseline behavior is that 2/3 of the pageviews are part of session funnels that average 2 pageviews long). As a point of reference, NavigationTiming collected 426,540 events on 20160524. [2]

[1] hu.wikipedia.org desktop pageview breakdown by browser and version

select user_agent_map['browser_family'], user_agent_map['browser_major'], sum(view_count)
from pageview_hourly
where year = 2016 and month = 5 and day = 25
and agent_type = 'user'
and access_method = 'desktop'
and project = 'hu.wikipedia'
group by user_agent_map['browser_family'], user_agent_map['browser_major'];
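
As a rough sketch, the sendBeacon-capable share could be computed from the rows this query returns as follows. The sample rows are invented purely to show the calculation (they are not real huwiki data), the function name is hypothetical, and the guard assumes `browser_major` may come back as a non-numeric string for unknown versions.

```python
# Minimum major versions treated as sendBeacon-capable in this thread.
MIN_SENDBEACON_MAJOR = {"Chrome": 39, "Firefox": 31, "Opera": 26}


def send_beacon_share(rows):
    """Return the fraction of views from sendBeacon-capable browsers.

    `rows` is an iterable of (browser_family, browser_major, view_count).
    """
    total = 0
    capable = 0
    for family, major, views in rows:
        total += views
        minimum = MIN_SENDBEACON_MAJOR.get(family)
        # Skip families we don't track and versions that aren't plain integers.
        if minimum is not None and str(major).isdigit() and int(major) >= minimum:
            capable += views
    return capable / total if total else 0.0


# Made-up rows, for illustration only.
sample_rows = [
    ("Chrome", "50", 600),
    ("Firefox", "46", 200),
    ("IE", "11", 150),
    ("Chrome", "38", 50),
]
print(send_beacon_share(sample_rows))  # 0.8
```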

[2] NavigationTiming query for number of events

select count(*) from NavigationTiming_15485142 where timestamp > '20160524' and timestamp < '20160525';
count(*)
426540

$lang : hu.wikipedia.org
$value of users to get Hovercards UX with previews turned on: 20%, which I think strikes an okay balance. @Tbayer, @jrobell, @DStrine does that work for you or should we set it higher and make it a head to head test at 50%?

@dr0ptp4kt I think it makes more sense to use a 50/50 split here, but I would like @Pcoombe to weigh in here as well. Peter - what do you think?

Update: @phuedx and I caught up today. I think we'll want to do a SWAT on the afternoon of Thursday, 2 June at 1%, and then a SWAT on Monday, 6 June (morning preferably, or afternoon if needed) at the percentage we determine here.

Jdlrobson changed the task status from Open to Stalled. May 30 2016, 9:11 PM

@dr0ptp4kt do you want to put something on the calendar with an owner to make sure this happens? It looks like pending sign off we are good to go.

@jrobell @dr0ptp4kt We would normally use 50/50 for fundraising tests, but there's no particular reason we can't use a different ratio if that works better for you or the community.

dr0ptp4kt updated the task description.
dr0ptp4kt added a subscriber: Tgr. Jun 1 2016, 5:06 PM

@Tgr would you please post a quick note to indicate that we'll be enabling Hovercards at 1% tomorrow, and tentatively 50% sometime next week?

dr0ptp4kt updated the task description. Jun 1 2016, 5:07 PM

@dr0ptp4kt do you want to put something on the calendar with an owner to make sure this happens? It looks like pending sign off we are good to go.

Jon, just want to get a bit of clarification on this: By "good to go" do you mean "good to implement the change," or has the config change already been made and for some reason wasn't attached to this task?

@dr0ptp4kt is there anything that needs to be done to this to get it from Stalled to Open? Per our team norms, we'd rather not SWAT more than necessary (i.e. the two-stage plan in one of your earlier comments). I also know the team is eager to finish this sprint with a clean board.

Ah the curse of not refreshing before commenting! I suppose my comments still stand though since the task is still listed as "Stalled." Are we waiting on notifying the community?

Tgr added a comment. Jun 1 2016, 6:24 PM

@Tgr would you please post a quick note to indicate that we'll be enabling Hovercards at 1% tomorrow, and tentatively 50% sometime next week?

Done.

dr0ptp4kt changed the task status from Stalled to Open. Jun 1 2016, 6:38 PM

Thanks @Tgr. @jhobs would you please ready the patch for tomorrow afternoon's SWAT and add it to the SWAT calendar?

Change 292206 had a related patch set uploaded (by Jhobs):
Enable Hovercards for huwiki

https://gerrit.wikimedia.org/r/292206

jhobs added a subscriber: bmansurov.

Scheduled 292206 for Evening SWAT with @bmansurov listed as requesting developer.

Change 292206 merged by jenkins-bot:
Enable Hovercards for huwiki

https://gerrit.wikimedia.org/r/292206

Change 292506 had a related patch set uploaded (by Bmansurov):
Enable the Popups experiment

https://gerrit.wikimedia.org/r/292506

Change 292506 merged by jenkins-bot:
Enable the Popups experiment

https://gerrit.wikimedia.org/r/292506

Popups experiment has been SWAT deployed to huwiki.

phuedx reassigned this task from jhobs to dr0ptp4kt. Jun 3 2016, 11:08 AM
dr0ptp4kt closed this task as Resolved. Jun 5 2016, 6:41 PM

Signing off. There's a follow-on action in T137059: Do not log errors when they are actually client-initiated XHR cancellations, although that follow-on action doesn't block increasing the Hovercards ON number (because the data analyst can filter out the noise in the interim).

dr0ptp4kt updated the task description. Jun 5 2016, 6:41 PM

We're ready to start the fundraising element of this test tomorrow.