
[Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test
Closed, Resolved (Public)

Description

This task tracks the second AC of T172291: Launch page previews A/B test on enwiki and dewiki.

AC

  • Verify the bucketing method: around 50% of sessions sending Popups events should have popupEnabled = 1.
      • (Modified from the earlier condition: bucketing is by session, not by pageview.)
      • Done in T175377#3605918
  • Verify that the sampling rates on enwiki and dewiki are correctly enforced, per @Pcoombe's question:

We're seeing some odd results here, although I don't think it's related to the eventlogging issues you mention. Quick question: can you confirm what percentage of people should be seeing hovercards? And does that match what you're seeing in your logging?

See @Pcoombe's comment in T175377#3600964.

Event Timeline

@Pcoombe: To confirm, the on bucket (an anon-only bucket with Page Previews enabled by default) is currently configured as:

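+--------+----------------------------------+-----------------------------------+
| wiki   | on bucket size (% of anon users) | off bucket size (% of anon users) |
+--------+----------------------------------+-----------------------------------+
| enwiki | 3%                               | 3%                                |
| dewiki | 8%                               | 8%                                |
+--------+----------------------------------+-----------------------------------+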

We verify that 50% of pageLoaded Popups events have popupEnabled = 1

Combining [0] and [1]:

  • 49.89% of pageLoaded Popups events have popupEnabled = 1.
  • There are no pageLoaded events without the popupEnabled field set.

[0]
+----------+--------+
| Enabled? | n      |
+----------+--------+
|        0 | 694230 |
|        1 | 691179 |
+----------+--------+

select
  event_popupEnabled as 'Enabled?',
  count(*) as 'n'
from
  log.Popups_16364296
where
  month(timestamp) = 9
  and day(timestamp) = 1
  
  and event_action = 'pageLoaded'
group by
  event_popupEnabled;

[1]
+---------+
| n       |
+---------+
| 1385409 |
+---------+

select
  count(*) as 'n'
from
  log.Popups_16364296
where
  month(timestamp) = 9
  and day(timestamp) = 1
  
  and event_action = 'pageLoaded';
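
The arithmetic behind "combining [0] and [1]" can be sketched as follows. This is only an illustrative check, not part of the original analysis: the variable names are made up here, and the counts are copied by hand from the two result tables above.

```python
# Counts copied from result table [0]; keys are the popupEnabled values.
page_loaded_by_enabled = {0: 694230, 1: 691179}

# Total copied from result table [1].
page_loaded_total = 1385409

# If every pageLoaded event has popupEnabled set to 0 or 1, the per-value
# counts add up to the total and nothing is left over.
events_missing_field = page_loaded_total - sum(page_loaded_by_enabled.values())

# Share of pageLoaded events logged with popupEnabled = 1.
percent_enabled = 100 * page_loaded_by_enabled[1] / page_loaded_total

print(f"pageLoaded events without popupEnabled: {events_missing_field}")
print(f"pageLoaded events with popupEnabled = 1: {percent_enabled:.2f}%")
```

Note that this counts events, not sessions, so it speaks to the pageview-level split rather than to the per-session bucketing.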
phuedx renamed this task from Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test to [Spike] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test. Sep 11 2017, 6:47 PM
phuedx updated the task description.
phuedx added a project: Spike.

Thanks @phuedx - BTW, at the request of @Jdlrobson, I had already done a very rough check for this right after the launch on August 28 ("too early to tell, but so far not totally out of whack").

However, since the bucketing is per session and not per pageview, we shouldn't actually count pageLoaded events for this (not sure how this ended up in the task description of T172291). Measuring how the number of pageLoaded events differs between the test and control conditions is actually one of the main purposes of the experiment, so the above result is in fact a bit puzzling.

@Pcoombe: Could you clarify what you mean when you say "odd results"?

Moving to Blocked to reflect that we're waiting on information from Advancement.

MBinder_WMF renamed this task from [Spike] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test to [Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test. Sep 12 2017, 4:11 PM

@phuedx We're only seeing about 1.5% of our banner impressions on enwiki logged as having popups enabled, but around 3% of donations. It would be nice if this meant popups make our banners 100% more effective! However, I expect there is an issue with our impression logging, perhaps that it sometimes runs before mw.popups.isEnabled() is available.

I added a short timeout before the alterImpressionData function in the banners checks mw.popups.isEnabled(). We'll see if that helps in our test tomorrow.

@phuedx - is our "on" bucket 3% by itself, or is 3% the control and on buckets together?

@ovasileva: The on bucket is 3% of anonymous users and the control bucket is 3% of anonymous users. I've updated T175377#3596414 to reflect this.

Thanks for the information @Pcoombe! Just a note that @Tbayer and I will be discussing the bucketing logic today. If we spot something on our side, then we'll let you know ASAP.

Below is a check of whether sessions within the sample are correctly bucketed with 50% probability into either the enabled or the disabled condition. These numbers look sound per se. (We expect some slight deviation because users can manually disable and enable the feature, but that appears to happen rarely: generally in less than 0.01% of sessions, per the second query below.) However, this result is quite odd in combination with the corresponding result for pageviews (T175377#3598231).

@phuedx and I have been looking further into this today and will post more updates later.

SELECT
  date,
  ROUND(100 * SUM(IF(enabled, sessions, 0)) / SUM(sessions), 6) AS percent_enabled,
  SUM(sessions) AS total_sessions
FROM (
  SELECT
    year, month, day,
    CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
    COUNT(DISTINCT event.sessionToken) AS sessions,
    event.popupEnabled AS enabled
  FROM nuria.Popups
  WHERE year = 2017
    AND wiki = 'enwiki'
    AND event.isAnon = 1
  GROUP BY year, month, day, event.popupEnabled
) AS iq
GROUP BY date
ORDER BY date
LIMIT 10000;

date	percent_enabled	total_sessions
2017-08-28	50.424215	4361
2017-08-29	50.048408	68170
2017-08-30	49.90909	641841
2017-08-31	49.954333	1325907
2017-09-01	49.964724	1238798
2017-09-02	49.92723	923456
2017-09-03	49.756386	1015132
2017-09-04	49.924608	1332371
2017-09-05	49.879428	1596141
2017-09-06	49.916476	1657611
2017-09-07	50.013277	1498816
11 rows selected (102.275 seconds)

SELECT
  year, month, day,
  CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
  COUNT(*) AS disables
FROM nuria.popups
WHERE event.action = 'disabled'
  AND year = 2017
GROUP BY year, month, day
ORDER BY year, month, day
LIMIT 10000;

year	month	day	date	disables
2017	8	28	2017-08-28	6
2017	8	29	2017-08-29	11
2017	8	30	2017-08-30	49
2017	8	31	2017-08-31	92
2017	9	1	2017-09-01	69
2017	9	2	2017-09-02	97
2017	9	3	2017-09-03	89
2017	9	4	2017-09-04	86
2017	9	5	2017-09-05	100
2017	9	6	2017-09-06	85
2017	9	7	2017-09-07	85
11 rows selected (74.232 seconds)
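
To put the "disabled" counts in proportion to the session counts, the two result tables above can be joined by date as in the rough sketch below. The names are hypothetical and the numbers are copied by hand from the tables. One caveat: the second query counts "disabled" events and is not restricted to enwiki or to anonymous users, while the session counts are enwiki anon-only, so the ratio is only an approximate indication of how often the feature is disabled per session.

```python
# (date, enwiki anonymous sessions, 'disabled' events), copied from the two
# result tables above. The 'disabled' counts are not filtered by wiki or by
# isAnon, so the ratio below is a rough indication only.
DAILY_COUNTS = [
    ("2017-08-28", 4361, 6),
    ("2017-08-29", 68170, 11),
    ("2017-08-30", 641841, 49),
    ("2017-08-31", 1325907, 92),
    ("2017-09-01", 1238798, 69),
    ("2017-09-02", 923456, 97),
    ("2017-09-03", 1015132, 89),
    ("2017-09-04", 1332371, 86),
    ("2017-09-05", 1596141, 100),
    ("2017-09-06", 1657611, 85),
    ("2017-09-07", 1498816, 85),
]


def disable_rate_percent(disables, sessions):
    """Return 'disabled' events as a percentage of sessions."""
    return 100 * disables / sessions


# Map each date to its approximate disable rate in percent.
rates = {
    date: disable_rate_percent(disables, sessions)
    for date, sessions, disables in DAILY_COUNTS
}

for date, rate in rates.items():
    print(f"{date}  {rate:.4f}% 'disabled' events per session")
```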
phuedx updated the task description.

The AC have been met 🎉🎉🎉

I believe @Pcoombe will be tweaking the implementation of the banner impression related code for a test later on in the year (EOQ Q2).