Compile and then resolve issues with TextCat A/B test data
Closed, ResolvedPublic1 Estimated Story Points
Actions

Assigned To

Authored By

	mpopov
	Jun 6 2016, 10:16 PM

Description

We've previously identified a few issues with the data (see T134318) but additional issues have come up in the analysis of the data on Friday, June 3rd. This task is for comprehensively identifying the major issues with the data so we know what to fix on the EL side before we relaunch the test.

Related Objects
Search...

Status	Assigned	Task
Open	None	T118278 [EPIC] Improve Language Identification for use in Cirrus Search
Resolved	EBernhardson	T121543 Do an A/B Tests on Other Wikis with TextCat for Language Identification
Resolved	EBernhardson	T137158 Compile and then resolve issues with TextCat A/B test data
Declined	mpopov	T134320 Analyse results of TextCat A/B test
Resolved	EBernhardson	T134319 Turn off TextCat A/B test on the English Wikipedia on or after May 23
Resolved	debt	T134318 Verify data pipeline for TextCat A/B test on English Wikipedia
Resolved	EBernhardson	T121542 Write and deploy an A/B Test on enwiki using TextCat for Language Identification
Resolved	dcausse	T121540 Investigate Updating Cybozu / ES Plugin for Language Identification
Resolved	EBernhardson	T124844 Add textcat to mediawiki vendor libs
Resolved	Smalyshev	T121538 Convert TextCat to PHP Library for Language Identification in Cirrus Search
Resolved	TJones	T123537 Generate wikitext-based and query-based language models for TextCat
Resolved	TJones	T123651 Decide which set of separators we have to use for TextCat ngrams
Resolved	• dpatrick	T123558 Security review for TextCat library
Resolved	mpopov	T132706 Validate click events in TestSearchSatisfaction2

Event Timeline

mpopov created this task.Jun 6 2016, 10:16 PM

Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptJun 6 2016, 10:16 PM

mpopov added a subtask: T134320: Analyse results of TextCat A/B test.Jun 6 2016, 10:17 PM

mpopov mentioned this in T134320: Analyse results of TextCat A/B test.Jun 6 2016, 10:22 PM

We need to record if the result the user clicked on is specifically an interwiki result. Right now we can't distinguish, and this is perhaps the most important one of the bunch.
Third value in the extraParams field is missing in a lot of cases. I've appended a ",0" to the cases where only 2 values were recorded, but I think we should have it explicitly say 0 results, so if it's actually missing then that's a marker of something having gone wrong.
There's something funky going on with that third value (besides the 0s not getting recorded correctly). In the sample set I was looking at, the third value was < hits returned in 36% of searches, > hits returned in ~8% of searches, and == hits returned in ~56% of searches. According to the following transcript, this number should always be <= hits returned:

12:11 ebernhardson: that last part is best explained ehre: https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/master/modules/ext.wikimediaEvents.searchSatisfaction.js#L430-L436
12:12 ebernhardson: the 0 is because 0 is falsy ...should have explicitly checked for existance rather than truthiness. empty is because if both are not provided (because they wern't run) textcatExtra ends up empty
17:07 ebernhardson: bearloga: so the third value is always the number of results returned by the alternate language query
17:07 ebernhardson: bearloga: the main hitsReturned is the number of .mw-search-result-heading elements on the page, so in theory that should be a combination of both
17:10 ebernhardson: bearloga: specifically the third value is the number of results returned to mediawiki (not, for example, the total hits elasticsearch reports)
17:11 ebernhardson: what i mean is if you did a search it might say something like 'Results 1 - 20 of 5,127'.  The third value would be the 20, not 5,127
17:19 ebernhardson: bearloga: hits returned is the total number of results shown on the page, both from the local wiki and any other wiki we queried
17:20 ebernhardson: basically it's just a count of the number of search links on the page
17:20 ebernhardson: bearloga: the third param of the extra data is the number of results only for the alternate wiki query.
                    In theory the third param should always be less than the hitsReturned value

To my knowledge, those are the only data quality issues.

mpopov moved this task from Backlog to Done on the Discovery-Analysis (Current work) board.Jun 6 2016, 10:56 PM

mpopov mentioned this in T137163: Part Deux: TextCat A/B test for Language Identification - specification.Jun 6 2016, 11:02 PM

EBernhardson renamed this task from Compile list of issues with TextCat A/B test data to Compile and then resolve issues with TextCat A/B test data.Jun 7 2016, 8:17 PM

EBernhardson removed mpopov as the assignee of this task.

EBernhardson claimed this task.

EBernhardson added a project: Discovery-Search (Current work).

EBernhardson moved this task from Incoming to not in use - please delete on the Discovery-Search (Current work) board.

To improve debugability of tests (figuring out why a particular thing is wierd) I'm also adding the searchToken to the logging schema. This allows us to associate a particular EL row with a CirrusSearchRequestSet row which includes the queries performed, the url that was given to us, etc.

debt closed this task as Resolved.Jun 7 2016, 9:59 PM

debt reopened this task as Open.Jun 8 2016, 12:24 AM

Change 293432 had a related patch set uploaded (by EBernhardson):
Textcat search satisfaction subtest for multiple wikis

https://gerrit.wikimedia.org/r/293432

gerritbot added a project: Patch-For-Review.Jun 8 2016, 10:21 PM

Digging through the code I can (so far) only come up with one guess for why the "interwikiHits <= hitsReturned" constraint doesn't always hold. When rendering results returned by the search engine to the result page any result where $result->isBrokenTitle() returns true it will not be rendered. This can happen in cases where we are unable to construct a Title object from the title reported by elasticsearch. I havn't yet found a reproduction case for this, but will continue looking.

@mpopov other possibility, are all the ones where we have more interwikiHits than hitsReturned members of textcat1:a subTest? In those cases it is expected, because we ran the interwiki query (to aid in data collection and to provide similar latency to the control group) but we do not render the results.

I tried to extract all queries that had this issue, but only came up with 4 that were not in the textcat1:a bucket. All of these actually return >= 3 results and shouldn't have triggered the extra textcat query anyways . Best guess is the query was using the advanced query profile to search a specific namespace, but we don't record that in the satisfaction schema :( I tried a few random namespaces, but couldn't find a namespace that had 0 results on enwiki but results in the same namespace on the identified alternate wiki. Could try a more exhaustive search but not sure worth the effort. With the new patch adding the searchToken to satisfaction schema will be able to look things up in CirrusSearchRequestSets to maybe get a better idea of what was happening.

select wiki, event_subTest, event_query, event_hitsReturned, event_extraParams from TestSearchSatisfaction2_15357244 where timestamp between '20160510000000' and '20160523000000' and event_source = 'fulltext' and event_action = 'searchResultPage' and length(event_extraParams) - length(replace(event_extraParams, ',', '')) = 2 AND event_hitsReturned < substring_index(event_extraParams, ',', -1) AND event_subTest != 'textcat1:a' limit 5;
+--------+---------------+----------------------+--------------------+-------------------+
| wiki   | event_subTest | event_query          | event_hitsReturned | event_extraParams |
+--------+---------------+----------------------+--------------------+-------------------+
| enwiki | textcat1:c    | Nude per l'assassino |                  0 | pt,ptwiki,1       |
| enwiki | textcat1:c    | Digitaria sericea    |                  0 | pt,ptwiki,2       |
| enwiki | textcat1:c    | Cyperus scariosus    |                  0 | es,eswiki,1       |
| enwiki | textcat1:c    | Cyperus scariosus    |                  0 | es,eswiki,1       |
+--------+---------------+----------------------+--------------------+-------------------+

Counting occurances across all buckets we have:

 select event_subTest, count(1) as total from TestSearchSatisfaction2_15357244 where timestamp between '20160510000000' and '20160523000000' and event_source = 'fulltext' and event_action = 'searchResultPage' and length(event_extraParams) - length(replace(event_extraParams, ',', '')) = 2 AND event_hitsReturned < substring_index(event_extraParams, ',', -1) group by event_subTest;
+---------------+-------+
| event_subTest | total |
+---------------+-------+
| textcat1:a    |   105 |
| textcat1:c    |     4 |
+---------------+-------+
2 rows in set (21.45 sec)

In T137158#2366323, @EBernhardson wrote:

I tried to extract all queries that had this issue, but only came up with 4 that were not in the textcat1:a bucket. All of these actually return >= 3 results and shouldn't have triggered the extra textcat query anyways . Best guess is the query was using the advanced query profile to search a specific namespace, but we don't record that in the satisfaction schema

@EBernhardson: If that's the most sound explanation, perhaps it makes sense for me to exclude sessions in which this phenomenon occurred?

My worry now is if there are other searches with >= 3 enwiki results that end up making it into the test when they shouldn't have.

4 searches out of a two week sample is pretty small, probably safe to not be concerned?

debt triaged this task as Medium priority.Jun 14 2016, 11:20 PM

debt closed subtask T134320: Analyse results of TextCat A/B test as Declined.

EBernhardson moved this task from not in use - please delete to Needs review on the Discovery-Search (Current work) board.Jun 16 2016, 4:49 PM

EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board.Jun 23 2016, 9:34 PM

debt closed this task as Resolved.Jul 20 2016, 4:21 PM

debt moved this task from Done to Resolved on the Discovery-Analysis (Current work) board.

Compile and then resolve issues with TextCat A/B test dataClosed, ResolvedPublic1 Estimated Story PointsActions

Description

Related ObjectsSearch...

Event Timeline

Compile and then resolve issues with TextCat A/B test data
Closed, ResolvedPublic1 Estimated Story Points
Actions

Related Objects
Search...