
Analyze results of A/B test for new search widget
Closed, ResolvedPublic

Description

As part of T249297 (Deploy the new Vue.js search experience), we will run an A/B test comparing the old and new versions of the search widget. This task covers analyzing the results of that experiment.

Analysis Criteria

  • Which group has a higher rate of search sessions initiated? How does this differ per wiki? How does this differ by editor activity level?
  • Which group has a higher rate of search sessions completed? How does this differ per wiki? Per editor bucket?
  • Have any other interesting trends emerged?
  • (for logged-out users) Are there any perceived changes in search behavior before/after the change?

Event Timeline

The AB impact analysis report is still in progress, but here is an initial look at the number of search sessions initiated by users in the AB test, broken down by wiki and test group.

Data below reflects events logged in SearchSatisfaction from 10 March 2021 through 16 March 2021 for all logged-in users that initiated a search on one of the test wikis. Note: While the AB test was run through 30 March 2021, the last week of data was incomplete due to a regression identified in T274869 and was excluded.

Search Sessions Initiated in Widget AB Test by Wiki

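| Wiki | Old Search Widget (Control) | New Search Widget (Treatment) | Both Groups |
| --- | --- | --- | --- |
| bnwiki | 110 | 88 | 198 |
| dewikivoyage | 50 | 26 | 76 |
| euwiki | 380 | 426 | 806 |
| fawiki | 991 | 1425 | 2416 |
| frwiki | 12087 | 11413 | 23500 |
| frwiktionary | 854 | 1036 | 1890 |
| hewiki | 692 | 2035 | 2727 |
| kowiki | 599 | 842 | 1441 |
| ptwiki | 1845 | 2091 | 3936 |
| ptwikiversity | 16 | 15 | 31 |
| srwiki | 203 | 432 | 635 |
| trwiki | 855 | 819 | 1674 |
| vecwiki | 2 | 23 | 25 |
| All 13 test wikis | 18684 | 20671 | 39355 |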

[Figure: search_sessions_initiated_bywiki.png]

Overall, the number of search sessions initiated by users who saw the new search widget was 10.6% higher than the number initiated by users who saw the old search widget. However, results vary on a per-wiki basis, and further analysis is needed to infer the impact of the search widget on these observed numbers.
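As a quick check, the 10.6% figure can be recomputed from the "All 13 test wikis" row of the table above. This is only a minimal sketch; the function name `percent_difference` is illustrative and is not taken from the report notebook.

```python
def percent_difference(control: int, treatment: int) -> float:
    """Relative change of treatment over control, in percent."""
    if control == 0:
        raise ValueError("control count must be non-zero")
    return (treatment - control) / control * 100


# Totals from the "All 13 test wikis" row of the table above.
control_sessions = 18684
treatment_sessions = 20671

overall = percent_difference(control_sessions, treatment_sessions)
print(f"{overall:.1f}%")  # prints "10.6%"
```

Note that this is a pooled comparison of raw counts, so it describes the observed difference only and says nothing about whether the widget caused it.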

Search Sessions Initiated in Widget AB Test by User's Edit Count

[Figure: search_sessions_initiated_byeditcount.png]

The new search widget group had more search sessions initiated across all experience levels (based on the user's number of cumulative edits) and there was not a significant variation in the percent difference between the two test groups across the 5 different edit count buckets.

(cc @ovasileva)

MNeisler moved this task from Doing to Needs Review on the Product-Analytics (Kanban) board.
MNeisler added a subscriber: mpopov.

Here is the current draft of the Search Widget AB Report: Repo and Notebook

@mpopov - Assigning to you for review to confirm the approach and findings look correct. Thank you in advance! Please let me know if you have any questions or suggested revisions.

mpopov moved this task from Needs Review to Next 2 weeks on the Product-Analytics (Kanban) board.

@MNeisler: Great job with the report! As I mentioned to you when you first showed it to me, I really like your use of color in the visualizations and in general how you've applied the lessons learned in the Storytelling with Data workshop :)

I left some comments and suggestions for how to improve some charts via ReviewNB. Let me know if you run into any problems responding to the comments or resolving discussions in that tool. Also, you don't have to make revisions based on the code-related suggestions; those are just for educational purposes and future reference.

My main concern is with the entire "search sessions completed" section. In our 1:1, when we discussed the reservations you had about approach 1, I too was apprehensive about including it, and the results in the draft plus the hypothesis you offered to explain those results convinced me that approach 1 should be omitted. Furthermore, the issue (intentional bug?) you identified, with visitPage events only ever being emitted on pages visited from full-text search and never from autocomplete (which I also verified), really makes me question the inclusion of approach 2 as well and, consequently, the inclusion of that section as a whole. To be clear: there is no fault with your methods; rather, the data those methods are applied to is perhaps simply too flawed to yield any useful insights. Since the primary KPI for this test is search sessions initiated, this should be OK, but we should discuss it in our next technical 1:1.

Re-assigned for iteration based on feedback.

MNeisler moved this task from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.

Thanks so much for the review @mpopov! I've incorporated the suggested feedback.

@ovasileva - Here is the revised Search Widget AB Report. Please see summary of high-level findings below.

Per discussions with Mikhail, I've decided to remove the Search Sessions Completed section from the report, as the data is too flawed and unreliable to draw any useful insights from. We've discussed some of the reasons in our 1:1s (and as described in T275200#7152785), but I've documented them below for reference.

But please let me know if you have any questions or concerns about that change or any of the other findings in the report.

Summary of key insights and observations
Logged-in users AB Test

  • We observed an average 28.9% increase in the search sessions initiated across early adopter wikis in the AB test; however, there is not sufficient evidence to definitively say that the new search widget led to this increase in observed search sessions initiated.
  • Results varied on a per wiki basis, with some observed increases and decreases in search sessions initiated between the two test groups.
    • For 7 out of the 12 early adopter wikis, there was an increase in the number of search sessions initiated in the new search widget test group.
    • Most increases ranged from about 12% to 22%, but search sessions initiated more than doubled on Serbian Wikipedia (+108.12%) and Hebrew Wikipedia (+192.37%).
    • The largest decreases were seen on German Wikivoyage (-52%) and Bengali Wikipedia (-20%); however, both of these wikis had under 100 search sessions recorded for each search widget type during the AB test.
  • There were no significant differences in search sessions initiated based on the user's edit count bucket, suggesting that a user's editing experience is not a factor in the likelihood of them starting a search.
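One likely reason an average across wikis can differ so much from a pooled comparison is weighting: pooled counts are dominated by the largest wiki, while an average of per-wiki changes weights every wiki equally and is sensitive to small wikis. The sketch below illustrates this using the counts from the initial one-week table earlier in this thread, not the final report data, so its output will not match the report's figures. The names `counts`, `pct_change`, and `MIN_SESSIONS`, and the 100-session threshold, are illustrative assumptions.

```python
from statistics import mean

# (control, treatment) search sessions per wiki, from the initial table in this thread.
counts = {
    "bnwiki": (110, 88),
    "dewikivoyage": (50, 26),
    "euwiki": (380, 426),
    "fawiki": (991, 1425),
    "frwiki": (12087, 11413),
    "frwiktionary": (854, 1036),
    "hewiki": (692, 2035),
    "kowiki": (599, 842),
    "ptwiki": (1845, 2091),
    "ptwikiversity": (16, 15),
    "srwiki": (203, 432),
    "trwiki": (855, 819),
    "vecwiki": (2, 23),
}

MIN_SESSIONS = 100  # illustrative threshold for flagging low-volume wikis


def pct_change(control: int, treatment: int) -> float:
    """Relative change of treatment over control, in percent."""
    return (treatment - control) / control * 100


per_wiki = {wiki: pct_change(c, t) for wiki, (c, t) in counts.items()}
low_volume = sorted(w for w, (c, t) in counts.items() if min(c, t) < MIN_SESSIONS)

# Pooled: sum the counts first, then compare (large wikis dominate).
pooled = pct_change(
    sum(c for c, _ in counts.values()),
    sum(t for _, t in counts.values()),
)
# Unweighted: compare within each wiki first, then average (every wiki counts equally).
unweighted = mean(per_wiki.values())

for wiki, change in sorted(per_wiki.items(), key=lambda kv: kv[1]):
    flag = " (low volume)" if wiki in low_volume else ""
    print(f"{wiki:15s} {change:+8.1f}%{flag}")
print(f"pooled: {pooled:+.1f}%  unweighted mean: {unweighted:+.1f}%")
```

With these counts, frwiki accounts for more than half of all sessions, so the pooled figure mostly tracks that wiki, whereas the unweighted mean is pulled upward by very small wikis such as vecwiki. This is why percent changes on wikis with few sessions should be read with caution.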

Logged-Out Users Pre and Post Deployment Analysis

  • We observed an 8% overall increase (13.4% median increase across early adopter wikis) in search sessions initiated by logged-out users on the early adopter wikis following deployment of the new search widget. Note that since a controlled experiment was not conducted on logged-out users, we are unable to conclude that this observed increase was due to the new search widget.
  • Results observed on each early adopter wiki varied, ranging from a 21.86% decrease on Persian Wikipedia to a 240.2% increase on Korean Wikipedia.

Search Sessions Completed Errors

During the search sessions completed analysis, we identified a regression where clicks on the search button (made by hitting the 'Enter' key or clicking the search icon in the widget) were not recorded for the new search widget (see T274869#7076438). As an alternative, we reviewed two possible approaches: (1) the proportion of sessions that included a direct click on a search result and (2) the proportion of sessions where an event with event.action = 'visitPage' was recorded. We found problems with both approaches, which are described below:

(1) Direct Clicks to a Search Rendered Result:

This does not accurately reflect the impact of the new widget on the search session completion rate, as it does not account for sessions where users completed a search by hitting the 'Enter' key or clicking the search icon, and it's not clear whether the observed decrease is due to the change in the search widget. For example, it's possible that the new search widget (which moved the search icon button from the right of the widget to the left) led more users to select the search icon instead of clicking a result, which may contribute to this decrease.

(2) Visit Page Events:

According to the schema documentation, these events are created after a user clicks a link in the results; however, it looks like this event is only sent after a user is either directed to the Special:Search page after clicking a search rendered result or clicks a search result on the Special:Search page. It does not appear to be logged if a user clicks one of the autocomplete results that appear as they start typing in the search widget and is then taken directly to the article.
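To make the gap concrete, here is a small sketch of how a completion rate based only on visitPage events would undercount sessions completed through autocomplete. The event records, the field names (`session_id`, `source`, `action`), and the `click` action are hypothetical simplifications for illustration; they are not the actual SearchSatisfaction schema or the report's code.

```python
# Hypothetical, simplified event log covering three search sessions.
events = [
    # Session A: full-text search, result clicked on Special:Search, visitPage logged.
    {"session_id": "A", "source": "fulltext", "action": "searchResultPage"},
    {"session_id": "A", "source": "fulltext", "action": "click"},
    {"session_id": "A", "source": "fulltext", "action": "visitPage"},
    # Session B: autocomplete result clicked, user lands on the article,
    # but no visitPage event is logged.
    {"session_id": "B", "source": "autocomplete", "action": "searchResultPage"},
    {"session_id": "B", "source": "autocomplete", "action": "click"},
    # Session C: full-text search abandoned without visiting a result.
    {"session_id": "C", "source": "fulltext", "action": "searchResultPage"},
]


def completion_rate(events: list, completing_actions: set) -> float:
    """Share of sessions containing at least one of the given actions."""
    sessions = {e["session_id"] for e in events}
    if not sessions:
        return 0.0
    completed = {
        e["session_id"] for e in events if e["action"] in completing_actions
    }
    return len(completed) / len(sessions)


# Counting only visitPage misses session B, even though the user reached an article.
visit_page_rate = completion_rate(events, {"visitPage"})
any_result_rate = completion_rate(events, {"visitPage", "click"})
print(f"visitPage only: {visit_page_rate:.2f}, visitPage or click: {any_result_rate:.2f}")
```

In this toy log, two of the three sessions end with the user reaching a page, but only one of them would be counted as completed under the visitPage approach.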

Codebase

hi @ovasileva! Checking on this task. Let me know if you have any questions or if this can be resolved for now.