Page MenuHomePhabricator

Compile and then resolve issues with TextCat A/B test data
Closed, ResolvedPublic1 Story Points

Description

We've previously identified a few issues with the data (see T134318) but additional issues have come up in the analysis of the data on Friday, June 3rd. This task is for comprehensively identifying the major issues with the data so we know what to fix on the EL side before we relaunch the test.

Event Timeline

mpopov created this task.Jun 6 2016, 10:16 PM
Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptJun 6 2016, 10:16 PM
  • We need to record if the result the user clicked on is specifically an interwiki result. Right now we can't distinguish, and this is perhaps the most important one of the bunch.
  • Third value in the extraParams field is missing in a lot of cases. I've appended a ",0" to the cases where only 2 values were recorded, but I think we should have it explicitly say 0 results, so if it's actually missing then that's a marker of something having gone wrong.
  • There's something funky going on with that third value (besides the 0s not getting recorded correctly). In the sample set I was looking at, the third value was < hits returned in 36% of searches, > hits returned in ~8% of searches, and == hits returned in ~56% of searches. According to the following transcript, this number should always be <= hits returned:
12:11 ebernhardson: that last part is best explained ehre: https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/master/modules/ext.wikimediaEvents.searchSatisfaction.js#L430-L436
12:12 ebernhardson: the 0 is because 0 is falsy ...should have explicitly checked for existance rather than truthiness. empty is because if both are not provided (because they wern't run) textcatExtra ends up empty
17:07 ebernhardson: bearloga: so the third value is always the number of results returned by the alternate language query
17:07 ebernhardson: bearloga: the main hitsReturned is the number of .mw-search-result-heading elements on the page, so in theory that should be a combination of both
17:10 ebernhardson: bearloga: specifically the third value is the number of results returned to mediawiki (not, for example, the total hits elasticsearch reports)
17:11 ebernhardson: what i mean is if you did a search it might say something like 'Results 1 - 20 of 5,127'.  The third value would be the 20, not 5,127
17:19 ebernhardson: bearloga: hits returned is the total number of results shown on the page, both from the local wiki and any other wiki we queried
17:20 ebernhardson: basically it's just a count of the number of search links on the page
17:20 ebernhardson: bearloga: the third param of the extra data is the number of results only for the alternate wiki query.
                    In theory the third param should always be less than the hitsReturned value

To my knowledge, those are the only data quality issues.

EBernhardson renamed this task from Compile list of issues with TextCat A/B test data to Compile and then resolve issues with TextCat A/B test data.Jun 7 2016, 8:17 PM
EBernhardson removed mpopov as the assignee of this task.
EBernhardson claimed this task.

To improve debugability of tests (figuring out why a particular thing is wierd) I'm also adding the searchToken to the logging schema. This allows us to associate a particular EL row with a CirrusSearchRequestSet row which includes the queries performed, the url that was given to us, etc.

debt closed this task as Resolved.Jun 7 2016, 9:59 PM
debt reopened this task as Open.Jun 8 2016, 12:24 AM

Change 293432 had a related patch set uploaded (by EBernhardson):
Textcat search satisfaction subtest for multiple wikis

https://gerrit.wikimedia.org/r/293432

Digging through the code I can (so far) only come up with one guess for why the "interwikiHits <= hitsReturned" constraint doesn't always hold. When rendering results returned by the search engine to the result page any result where $result->isBrokenTitle() returns true it will not be rendered. This can happen in cases where we are unable to construct a Title object from the title reported by elasticsearch. I havn't yet found a reproduction case for this, but will continue looking.

EBernhardson added a comment.EditedJun 8 2016, 10:44 PM

@mpopov other possibility, are all the ones where we have more interwikiHits than hitsReturned members of textcat1:a subTest? In those cases it is expected, because we ran the interwiki query (to aid in data collection and to provide similar latency to the control group) but we do not render the results.

I tried to extract all queries that had this issue, but only came up with 4 that were not in the textcat1:a bucket. All of these actually return >= 3 results and shouldn't have triggered the extra textcat query anyways . Best guess is the query was using the advanced query profile to search a specific namespace, but we don't record that in the satisfaction schema :( I tried a few random namespaces, but couldn't find a namespace that had 0 results on enwiki but results in the same namespace on the identified alternate wiki. Could try a more exhaustive search but not sure worth the effort. With the new patch adding the searchToken to satisfaction schema will be able to look things up in CirrusSearchRequestSets to maybe get a better idea of what was happening.

select wiki, event_subTest, event_query, event_hitsReturned, event_extraParams from TestSearchSatisfaction2_15357244 where timestamp between '20160510000000' and '20160523000000' and event_source = 'fulltext' and event_action = 'searchResultPage' and length(event_extraParams) - length(replace(event_extraParams, ',', '')) = 2 AND event_hitsReturned < substring_index(event_extraParams, ',', -1) AND event_subTest != 'textcat1:a' limit 5;
+--------+---------------+----------------------+--------------------+-------------------+
| wiki   | event_subTest | event_query          | event_hitsReturned | event_extraParams |
+--------+---------------+----------------------+--------------------+-------------------+
| enwiki | textcat1:c    | Nude per l'assassino |                  0 | pt,ptwiki,1       |
| enwiki | textcat1:c    | Digitaria sericea    |                  0 | pt,ptwiki,2       |
| enwiki | textcat1:c    | Cyperus scariosus    |                  0 | es,eswiki,1       |
| enwiki | textcat1:c    | Cyperus scariosus    |                  0 | es,eswiki,1       |
+--------+---------------+----------------------+--------------------+-------------------+

Counting occurances across all buckets we have:

 select event_subTest, count(1) as total from TestSearchSatisfaction2_15357244 where timestamp between '20160510000000' and '20160523000000' and event_source = 'fulltext' and event_action = 'searchResultPage' and length(event_extraParams) - length(replace(event_extraParams, ',', '')) = 2 AND event_hitsReturned < substring_index(event_extraParams, ',', -1) group by event_subTest;
+---------------+-------+
| event_subTest | total |
+---------------+-------+
| textcat1:a    |   105 |
| textcat1:c    |     4 |
+---------------+-------+
2 rows in set (21.45 sec)
mpopov added a comment.Jun 9 2016, 7:09 PM

I tried to extract all queries that had this issue, but only came up with 4 that were not in the textcat1:a bucket. All of these actually return >= 3 results and shouldn't have triggered the extra textcat query anyways . Best guess is the query was using the advanced query profile to search a specific namespace, but we don't record that in the satisfaction schema

@EBernhardson: If that's the most sound explanation, perhaps it makes sense for me to exclude sessions in which this phenomenon occurred?

My worry now is if there are other searches with >= 3 enwiki results that end up making it into the test when they shouldn't have.

4 searches out of a two week sample is pretty small, probably safe to not be concerned?

debt triaged this task as Normal priority.Jun 14 2016, 11:20 PM
debt closed this task as Resolved.Jul 20 2016, 4:21 PM
debt moved this task from Done to Resolved on the Discovery-Analysis (Current work) board.