
Investigate SearchSatisfaction mismatched test buckets
Closed, Resolved | Public | 13 Estimated Story Points

Description

As a consumer of A/B test results, I want accurate results so I can make sound decisions.

Looking at data from the commonswiki mediasearch A/B test, from 2019-09-10T16:00 through 17:00, there are thirteen events where the frontend logged one bucket but the backend logging recorded a different one. I am not sure we have looked at this specifically before, i.e. joined frontend and backend logs and compared the recorded buckets. If the frontend and backend don't agree on test buckets, the data will be less reliable, and the stats in separate buckets will generally tend towards the same values.
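
To illustrate why mismatched buckets pull the per-bucket stats towards each other, here is a minimal sketch. The rates and the mislabeled fractions are made-up numbers for illustration, not values from this test, and `observed_rate` is a hypothetical helper.

```python
def observed_rate(own_rate: float, other_rate: float, mislabeled: float) -> float:
    """Rate measured for a bucket when a fraction `mislabeled` of the events
    attributed to it actually came from users served the other bucket."""
    return (1 - mislabeled) * own_rate + mislabeled * other_rate


# Hypothetical true rates per bucket (assumed for the example, not from the ticket).
control_true, test_true = 0.40, 0.50

for mislabeled in (0.0, 0.01, 0.10):
    control_obs = observed_rate(control_true, test_true, mislabeled)
    test_obs = observed_rate(test_true, control_true, mislabeled)
    print(f"mislabeled={mislabeled:.0%} observed gap={test_obs - control_obs:.3f}")
```

With symmetric mislabeling the observed gap is the true gap scaled by (1 - 2 * mislabeled), so any disagreement between frontend and backend makes a real difference look smaller.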

Example bad request:

  • search_id: 163dliqu8lj2hpsgn9cvrbdwo
  • mediawiki_cirrussearch_request logged http params: cirrusUserTesting=control
  • event.SearchSatisfaction logged subTest: mediasearch_commons_int

This ticket is for the investigation and to create new tickets for the solution.

Event Timeline

CBogen set the point value for this task to 13.

For 691588 backend events matching a test bucket:

  • 437764 match a SearchSatisfaction searchResultPage event
  • 7204 are inconsistent with their corresponding SearchSatisfaction searchResultPage event (joining on the search token)
  • 246979 have no matching SearchSatisfaction searchResultPage event; only 10 of these match go, and the rest are unexplained
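
The three-way breakdown above can be sketched roughly as follows. This is an assumption-laden illustration, not the query that produced the numbers: the field names `search_token` and `bucket`, the `classify` helper, and the toy rows are all hypothetical; only the two bucket names are taken from the example in the description.

```python
from collections import Counter


def classify(backend_events, frontend_events):
    """Count backend events as match, inconsistent, or missing, joining on the search token."""
    # Index frontend searchResultPage events by their search token.
    frontend_bucket = {e["search_token"]: e["bucket"] for e in frontend_events}
    counts = Counter()
    for event in backend_events:
        token = event["search_token"]
        if token not in frontend_bucket:
            counts["missing"] += 1  # no frontend event for this search
        elif frontend_bucket[token] == event["bucket"]:
            counts["match"] += 1
        else:
            counts["inconsistent"] += 1  # both sides logged, buckets differ
    return counts


# Toy rows with made-up tokens.
backend = [
    {"search_token": "token-a", "bucket": "control"},
    {"search_token": "token-b", "bucket": "control"},
    {"search_token": "token-c", "bucket": "control"},
]
frontend = [
    {"search_token": "token-a", "bucket": "mediasearch_commons_int"},
    {"search_token": "token-b", "bucket": "control"},
]

print(classify(backend, frontend))
```

Here `token-a` is inconsistent, `token-b` matches, and `token-c` has no frontend event at all.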

I think there might be a few reasons for the mismatched/missing frontend logs:

  • A user clicks a search link that has a cirrusUserTesting=bucket param attached to it; 755 of these links are found on wiki: https://global-search.toolforge.org/?q=cirrusUserTesting&namespaces=&title . We might want to clean up the search URL so that users do not paste it somewhere else
  • A user refreshes or reopens a search tab after the search satisfaction session they were previously in has expired, but the tab is still on a search link with a cirrusUserTesting=bucket URL

But I doubt these reasons could explain the missing 246979 frontend events.

Change 634308 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Send enabled tests to the frontend

https://gerrit.wikimedia.org/r/634308

Change 634309 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/WikimediaEvents@master] [searchSatisfaction] check validity of test buckets

https://gerrit.wikimedia.org/r/634309

Change 634308 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Send enabled tests to the frontend

https://gerrit.wikimedia.org/r/634308

The 246979 non-matching events are likely due to T265374.
For the 7204 inconsistent events I could only find these two explanations:

  • A user clicks a search link that has a cirrusUserTesting=bucket param attached to it
  • A user reopens their browser with several tabs open, one of which has a link with a cirrusUserTesting=bucket param attached to it

The mitigation is to avoid keeping the cirrusUserTesting=bucket param in the location bar of users' browsers: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/634184/.
We also try to detect the mismatch from the frontend code and avoid sending broken events: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/634309/.
Salvaging such sessions seems difficult, so if the number of mismatched/invalid sessions stays below a certain threshold (<1%), we believe that it is acceptable and won't penalize future A/B tests (provided such sessions are properly identified as such).
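
As a rough sketch of the two mitigations, and not the logic of the linked patches (which live in frontend code): the first helper drops the test param from a URL, the second refuses to log an event when the two sides disagree. The names `strip_test_param` and `should_send_event` and the example.org URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TEST_PARAM = "cirrusUserTesting"


def strip_test_param(url: str) -> str:
    """Return `url` without the cirrusUserTesting query parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != TEST_PARAM
    ]
    # urlencode re-encodes the remaining values, so their escaping may be normalized.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def should_send_event(session_bucket: str, served_bucket: str) -> bool:
    """Only log when the frontend session bucket equals the bucket the backend served."""
    return session_bucket == served_bucket


print(strip_test_param("https://example.org/w/index.php?search=cat&cirrusUserTesting=control"))
print(should_send_event("mediasearch_commons_int", "control"))
```

Dropping the param keeps a copied or restored URL from forcing a bucket later, while the check covers the cases where a forced bucket still slips through.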

Change 634309 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] [searchSatisfaction] check validity of test buckets

https://gerrit.wikimedia.org/r/634309