
Analyse results of TextCat A/B test
Closed, DeclinedPublic4 Story Points

Description

After the TextCat A/B test is turned off (see T134319), the data should be analysed to see whether the test had a significant impact.

Event Timeline

Deskana created this task.May 3 2016, 9:48 PM
Restricted Application added subscribers: Zppix, Aklapper. May 3 2016, 9:48 PM
EBernhardson added a subscriber: EBernhardson. Edited May 9 2016, 3:37 PM

A couple of things (although certainly more) I think we could look at:

  • Click through to the alternate wiki
  • Query reformulation after being shown alternate wiki results

This topic came up in a discussion today with @EBernhardson and @dcausse.

> A couple of things (although certainly more) I think we should look at:
>   • Click through to the alternate wiki
>   • Query reformulation after being shown alternate wiki results

  • A clarification: tracking query reformulation by the user is interesting and useful in its own right, as a way of getting possible alternative versions of a query (e.g., for automatic correction). In this case, the idea is that if the user reformulates their query without clicking through to another wiki after being shown other-language cross-wiki results, that indicates the results were not useful.
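The negative signal described above can be sketched as a simple scan over a session's event stream. The event names and the flat-list representation here are assumptions for illustration, not the actual TSS2 schema:

```python
def results_not_useful(events):
    """True if the user reformulated their query after being shown
    cross-wiki results, without ever clicking through to the other wiki.
    `events` is a hypothetical ordered list of event names for one session."""
    saw_crosswiki = False
    for e in events:
        if e == "crosswiki_results_shown":
            saw_crosswiki = True
        elif e == "crosswiki_click":
            # A clickthrough means the cross-wiki results were useful.
            return False
        elif e == "new_query" and saw_crosswiki:
            # Reformulation after seeing cross-wiki results, no click.
            return True
    return False

print(results_not_useful(
    ["query", "crosswiki_results_shown", "new_query"]))       # True
print(results_not_useful(
    ["query", "crosswiki_results_shown", "crosswiki_click"]))  # False
```

A session that never sees cross-wiki results contributes nothing to the signal either way.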

Other ideas that came up:

  • looking at satisfaction metrics for all queries identified as being in another language in one big bucket, vs looking at by-language buckets. (e.g., on enwiki, results in Spanish/from eswiki are good, but results in French/from frwiki are not.)
  • looking at satisfaction metrics for queries based on number of cross-wiki results (1 result may be a fluke, 5000 results means the language is probably right).
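The one-big-bucket vs. by-language comparison above can be sketched with a clickthrough rate standing in for the satisfaction metric. The session records and field names are invented for illustration; the real analysis would use the actual TSS2 data:

```python
from collections import defaultdict

# Hypothetical session records: detected language, number of cross-wiki
# results returned, and whether the user clicked a cross-wiki result.
sessions = [
    {"lang": "es", "results": 120,  "clicked": True},
    {"lang": "es", "results": 300,  "clicked": True},
    {"lang": "fr", "results": 15,   "clicked": False},
    {"lang": "fr", "results": 8,    "clicked": False},
    {"lang": "de", "results": 4500, "clicked": True},
]

def clickthrough_rate(rows):
    """Fraction of sessions with a click on a cross-wiki result."""
    return sum(r["clicked"] for r in rows) / len(rows)

# One big bucket: every language-detected query together.
overall = clickthrough_rate(sessions)

# Per-language buckets, so a strong language (e.g., Spanish on enwiki)
# is not averaged away by a weak one (e.g., French).
by_lang = defaultdict(list)
for r in sessions:
    by_lang[r["lang"]].append(r)
per_lang = {lang: clickthrough_rate(rows) for lang, rows in by_lang.items()}

print(overall)   # 0.6
print(per_lang)  # {'es': 1.0, 'fr': 0.0, 'de': 1.0}
```

In this toy data the overall rate (0.6) hides the fact that French results never get clicked; the per-language view surfaces it.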

I'll also try to get others to take a peek over here and add more.

Perhaps interesting, but maybe not a factor in deciding to keep the feature:

  • % of zero result requests that now get results
  • % of requests that were shown inter-wiki results where the user clicked on one
mpopov added a subscriber: mpopov. Edited May 11 2016, 7:51 PM

> looking at satisfaction metrics for queries based on number of cross-wiki results (1 result may be a fluke, 5000 results means the language is probably right).

@TJones Hm… Do you have suggestions for the threshold we can use to determine this on the whole dataset? We won't be able to look at each of the 100K+ sessions individually.

Note to future @mpopov: the extra data field in the TSS2 table will have 3 values (actually detected language, wiki queried, and number of results) that will need to be separated into 3 columns.

debt added a subscriber: debt.

Moving into the sprint to work on this week.

mpopov claimed this task.Jun 3 2016, 5:55 PM
mpopov moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
mpopov set the point value for this task to 4.

Cannot proceed with the analysis, as the data is too faulty to be reliable. We will fix the EventLogging (EL) and relaunch the test. See follow-up: T137158

debt closed this task as Declined.Jun 14 2016, 11:20 PM