Let's make sure the data coming in from this 2-way test matches what we expect.
|Status||Assignee||Task|
|Open||None||T174064 [FY 2017-18 Objective] Implement advanced search methodologies|
|Resolved||EBernhardson||T161632 [Epic] Improve search by researching and deploying machine learning to re-rank search results|
|Resolved||EBernhardson||T162369 Evaluate rescore windows for learning to rank|
|Resolved||None||T174066 [Q1 2017-18 Objective] Perform load and A/B tests on new models (interleaved search results)|
|Resolved||EBernhardson||T150032 Add support for interleaved results in 2-way A/B test|
|Resolved||debt||T171212 Interleaved results A/B test: turn on|
|Resolved||EBernhardson||T171213 Interleaved results A/B test: check that data is flowing the way we expect|
|Resolved||debt||T171214 Interleaved results A/B test: turn off test|
|Resolved||mpopov||T171215 Interleaved results A/B test: analysis of data|
|Declined||EBernhardson||T171984 Turn on test of LTR with standard AB buckets and an interleaved bucket.|
The data volume is actually much smaller than expected. At 1:2000 sampling we collect ~15k full-text sessions per day. Sampling was increased to 1:500 and 75% of sessions were directed into the test, but the result was 15k sessions per day for the dashboards and only ~600 sessions per day recording events into the test (when it should have been ~45k). It's not clear yet what happened.
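For reference, the expected-volume arithmetic behind that ~45k figure can be spelled out; the numbers below are copied from the comment above, not from production config:

```python
# Sketch of the expected session volume (numbers from the comment above,
# not from the deployed configuration).
baseline_sessions = 15_000      # full-text sessions/day observed at 1:2000
old_rate, new_rate = 2000, 500  # sampling changed from 1:2000 to 1:500
test_fraction = 0.75            # share of sampled sessions sent into the test

expected_total = baseline_sessions * (old_rate / new_rate)  # ~60k sessions/day
expected_in_test = expected_total * test_fraction           # ~45k sessions/day

print(int(expected_total), int(expected_in_test))  # 60000 45000
```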
It's not clear yet what's gone wrong here. I've poked at the raw event logging events in the eventlogging-client-side Kafka topic, and the same ratio of events by subTest is there. The webrequest table in Hive shows the same ratio of events by subTest as well. This suggests the events are either not being sent, or are being thrown away incredibly early in the pipeline (unlikely).
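The per-bucket ratio check described above amounts to grouping events by subTest at each pipeline stage and comparing shares. A minimal sketch (field names are illustrative stand-ins for rows pulled from the Kafka topic or the webrequest table):

```python
from collections import Counter

def subtest_ratios(events):
    """Count events per subTest bucket and return each bucket's share.

    `events` is any iterable of dicts with a 'subTest' key -- a stand-in
    for rows from eventlogging-client-side or webrequest; the real
    schemas differ, this only illustrates the ratio comparison.
    """
    counts = Counter(e.get("subTest") for e in events)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}

# Toy data: the same skewed ratio appearing at every stage points at the
# client never sending the events, rather than mid-pipeline loss.
sample = [{"subTest": "control"}] * 96 + [{"subTest": "interleaved"}] * 4
print(subtest_ratios(sample))  # {'control': 0.96, 'interleaved': 0.04}
```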
The breakdown of the events that did get logged, by either OS or browser, does not suggest we are failing on most browsers and only working in specific cases. Something else is going on, but it's really not clear what. Will continue investigating.
Some documentation from EventLogging is suspicious, but I also think this might no longer be the case, because I see events making it through with a payload > 1kB. And while our search result page events are > 1kB, other events like 'visitPage' are much smaller, so those should still have come through even if the search result page events were rejected. Also, based on the doc, the events should have been truncated, which would still be detectable, rather than disappearing completely:
There is a limitation of the size of individual EventLogging events due the underlying infrastructure (limited size of urls in Varnish's varnishncsa/ varnishlog, as well as Wikimedia UDP packets). For the purpose of size limitation, an "entry" is a /beacon request URL containing urlencoded JSON-stringified event data. Entries longer than 1014 bytes are truncated. When an entry is truncated, it will fail validation because of parsing (as the result is invalid JSON).
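The size constraint the docs describe can be sanity-checked offline by measuring the urlencoded JSON entry for a given event. A rough sketch; the exact beacon path prefix and encoding details here are assumptions, not the real EventLogging client code:

```python
import json
from urllib.parse import urlencode

BEACON_LIMIT = 1014  # truncation threshold quoted from the EventLogging docs

def beacon_entry_size(event):
    """Approximate the size of a /beacon request entry for an event payload.

    Mirrors the docs' description (a urlencoded JSON-stringified event);
    the '/beacon/event?' prefix is an illustrative assumption.
    """
    query = urlencode({"event": json.dumps(event, separators=(",", ":"))})
    return len("/beacon/event?" + query)

small = {"action": "visitPage", "pageViewId": "abc123"}
large = {"action": "searchResultPage", "hitsReturned": list(range(200))}
print(beacon_entry_size(small) < BEACON_LIMIT)  # True
print(beacon_entry_size(large) > BEACON_LIMIT)  # True
```

If the limit were still enforced, only the large search-result-page entries would be affected, which is why the small visitPage events surviving alongside them made this explanation unconvincing.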
What went wrong here is that I completely misestimated the event counts, by making the incorrect assumption that enwiki made up the majority of logged search sessions. Because we vary our sampling rate by wiki, enwiki makes up < 2% of the sessions we record.
Initial sampling rate: 1:2000
Sessions collected per day: ~250
Estimated sessions per day: 500,000
Desired sessions per bucket per day: 1000
Number of buckets: 6
Total sessions sampled: 250 + (6*1000) = 6250
New sampling rate = 500000/6250 = 1 in 80
% of sessions going into sub test: 6000/6250 = 0.96
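The bucket arithmetic above, spelled out (all values copied from the list above):

```python
# New sampling rate calculation (values from the comment above).
sessions_at_old_rate = 250      # sessions/day collected at the initial 1:2000
buckets, per_bucket = 6, 1_000  # desired sessions per bucket per day
estimated_daily = 500_000       # estimated full-text sessions/day overall

total_sampled = sessions_at_old_rate + buckets * per_bucket   # 6250
new_rate = estimated_daily / total_sampled                    # 1 in 80
subtest_share = (buckets * per_bucket) / total_sampled        # 0.96

print(total_sampled, new_rate, subtest_share)  # 6250 80.0 0.96
```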
I'll be deploying this update in a few minutes, after which we should collect ~1k sessions per day per bucket. I don't know yet how many we actually need, but it means analysis of the previously collected data could go forward if we decide it's enough.
Mentioned in SAL (#wikimedia-operations) [2017-08-18T00:48:42Z] <ebernhardson@tin> Synchronized php-1.30.0-wmf.14/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T171213: Increase sampling rate of cirrus satisfaction schema (again) to 1k per bucket per day (duration: 00m 44s)