[SPIKE] Search instrumentation for a/b testing of search widget move
Closed, ResolvedPublic

Description

Background

We would like to measure the effects of the search changes in two parts: first, moving the search to a more prominent location within the header of the page; second, updating the functionality of the search widget. This task was created to explore previous search instrumentation and gauge whether it can be reused for these purposes.

Metrics

Search sessions initiated
Search sessions shown (search results shown to the user)
Search sessions completed

Questions

Can we reuse https://meta.wikimedia.org/wiki/Schema:Search?
Is it possible to structure an a/b test for:
Moving search to the header
Swapping out the search widgets

Note: if not, we'll probably have to do a comparative analysis of before/after the change and/or on wikis with similar search patterns. Also, since the current schema is deactivated, we might be unable to get year-over-year (YoY) comparisons.

Related Objects

Status	Assigned	Task
Open	None	T49145 Formally deprecate jQuery UI after we've stopped using jQuery UI in extensions and core
Open	None	T100270 Replace use of jQuery UI and MW UI with OOUI across all Wikimedia-deployed extensions and core
Open	None	T85394 Use OOUI suggestions/autocompletion components only (instead of jquery.suggestions, jquery.ui.autocomplete)
Open	None	T125725 [epic] Update autocomplete search box with metadata and remove and delete the old searchSuggest system
Open	None	T177251 Dead keys prevent autocomplete in search box
Resolved	ovasileva	T244392 [GOAL] Deploy the new Vue.js search experience
Resolved	ovasileva	T263032 Deploy the new location of the search bar to new vector and begin A/B test on test wikis
Resolved	ovasileva	T262207 Deploy new search location and DOM order to officewiki and testwiki
Resolved	ovasileva	T249363 Move the existing search to the header in preparation for Vue.js search development
Resolved	MNeisler	T251740 [SPIKE] Search instrumentation for a/b testing of search widget move

Event Timeline

ovasileva created this task. May 4 2020, 10:03 AM

Restricted Application added a subscriber: Aklapper. May 4 2020, 10:03 AM

ovasileva triaged this task as Medium priority. May 4 2020, 10:04 AM

ovasileva updated the task description.

ovasileva added a project: Desktop Improvements (Vector 2022). May 4 2020, 10:11 AM

Can we reuse https://meta.wikimedia.org/wiki/Schema:Search?

Yes. However, that instrumentation hasn't been active since before October 2019 (see T233614#5566168) and I'm not sure where the implementation was so it'd be non-trivial to resurrect it.

Schema:MobileWebSearch, a mobile-specific clone of that schema, is still active. We could clone and modify the implementation to emit Search events.

FYI both Schema:Search and Schema:MobileWebSearch will also allow you to answer "How many search sessions are started per browser session?"

phuedx claimed this task. May 4 2020, 5:25 PM

Is it possible to structure an a/b test for:
Moving search to the header

Moving the search widget to the header necessarily requires changes in HTML structure and associated styles. The complexity of the experimental setup depends on whether we limit the scope of our experiment:

Logged-in users only

Yes. Since requests from logged-in users are always served by the application servers, we're free to send buckets of users different treatments with different HTML and different styles, and the responses will always be fresh.

We can assume that the user's ID is randomly distributed and therefore bucketize them based on it, e.g.

$featureManager->registerFeature( 'NewSearchTreatment', $user->isLoggedIn() && $user->getID() % 2 === 0 );

All users

Yes, but it won't be as simple as the above. Requests from logged-out users are mostly served by the edge caches. In the best case a response is fresh, but in the worst case a response can be 4 days old (see https://wikitech.wikimedia.org/wiki/Varnish#TTL). We could either:

Modify the client-side code to bucket the user and then render the appropriate search widget treatment. Delays and flashes of content may alter the way users interact with the widget(s) and so will need to be minimised
Enable the instrumentation for all users for one week to establish a baseline. Afterwards, enable the new treatment for all users for two weeks – enough time for all users to be delivered the new treatment and to account for variations in behaviour across the week
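
As a rough illustration of the first option (client-side bucketing), a minimal sketch might look like the following. The per-browser token, the experiment name and the treatment share are assumptions made for illustration; none of them come from Vector or from an existing MediaWiki API, and FNV-1a is just one convenient hash, not necessarily what any existing instrumentation uses.

```javascript
// Hypothetical sketch: deterministic client-side bucketing for cached pages.
// The caller is assumed to supply a stable per-browser token (for example one
// it persists itself); where that token comes from is out of scope here.

/**
 * FNV-1a 32-bit hash over the UTF-16 code units of a string.
 * Returns an unsigned 32-bit integer.
 */
function fnv1a( str ) {
	let hash = 0x811c9dc5;
	for ( let i = 0; i < str.length; i++ ) {
		hash ^= str.charCodeAt( i );
		// Math.imul keeps the multiplication in 32 bits; >>> 0 makes it unsigned.
		hash = Math.imul( hash, 0x01000193 ) >>> 0;
	}
	return hash;
}

/**
 * Assign a token to 'treatment' or 'control'.
 * treatmentShare is the fraction of tokens (0 to 1) that get the treatment.
 */
function getSearchBucket( token, experimentName, treatmentShare ) {
	// Salting with the experiment name keeps buckets independent across tests.
	const position = fnv1a( experimentName + ':' + token ) / 0x100000000;
	return position < treatmentShare ? 'treatment' : 'control';
}
```

Because the bucket depends only on the token and the experiment name, the same browser would see the same treatment on every page view, whichever cached response it receives. The bucket still has to be computed before the widget renders to avoid the flash of content mentioned above.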

Swapping out the search widgets

While moving the search widget does require changes to HTML structure and associated styles, swapping out the implementation and/or treatment of the widget doesn't. Moreover, the widget executes only on the client.

We're free to bucket all users and deliver the associated treatment.

I'd be happy to go into more technical detail if others feel that it's necessary.

Niedzielski added a project: Design-Systems-team-20200324-20220422 (Vue.js Search Experience (Vector modern)). May 6 2020, 2:19 PM

I think this task has some potential overlap with T249366 so it may be worth linking.

As I understand it, this task focuses on evaluating the move of search (before and after) in the header. This may require changes to the existing JavaScript experience.

T249366 focuses on the new Vue JavaScript search form only.

Niedzielski moved this task from Backlog to On Web Board on the Design-Systems-team-20200324-20220422 (Vue.js Search Experience (Vector modern)) board. May 11 2020, 3:10 AM

@Niedzielski - apologies for opening another task for this. We can either merge them and write up some notes on how we would a/b test the swapping of the widgets, or keep that open to address later on. As the new search form is JS only, I think the second question is already more straightforward for both logged-in users and anons, so either approach seems fine to me. That is, do we need a spike on how to do that, or can we go directly into the details of implementation with some analysis? If we don't need a spike, I think we can go directly to creating the following tasks:

Set up the schema itself as a desktop clone of Schema:MobileWebSearch - metrics are the same in both
a/b test of the move
analysis of the move
a/b test of the widget
analysis of the widget

@Niedzielski, @phuedx, @Mayakp.wiki, @MNeisler - does that sound reasonable (and sorry for the mass ping!)?

ovasileva removed a subscriber: Stephen. May 11 2020, 11:23 AM

This is fuzzy to me but here's my understanding of this task from the description and @phuedx's comments above.

These are the A/B tests wanted:

Moving the search to a more prominent location within the header of the page.
Updating the functionality of the search widget.

That means that for each test, users are bucketed both in terms of tracking and in terms of implementation. The bucketing will happen in Vector. Sam has added details on how to bucket logged-in users that make sense to me. The all-users test looks more involved unless we go with option 2 mentioned in the comment.

It's my understanding that Sam is suggesting a clone of Schema:MobileWebSearch as the schema. Within that schema, I'm assuming the mapping of metrics to requirements is:

Search sessions initiated => "session-start"
Search sessions shown (search results shown to the user) => "impression-results"
Search sessions completed => "click-result" or "hide-search-suggestions"
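
To make the assumed mapping concrete, here is a minimal sketch of how the three metrics could be counted from a list of logged events. The event shape (a `sessionId` plus an `action`) is a simplification invented for this example and is not the actual field layout of Schema:MobileWebSearch; only the action names are taken from the mapping above.

```javascript
// Hypothetical event shape: { sessionId: string, action: string }.
// A session counts once per metric, however many matching events it logged.
function summariseSearchSessions( events ) {
	// Collect the distinct actions seen in each search session.
	const actionsBySession = new Map();
	for ( const { sessionId, action } of events ) {
		if ( !actionsBySession.has( sessionId ) ) {
			actionsBySession.set( sessionId, new Set() );
		}
		actionsBySession.get( sessionId ).add( action );
	}

	const summary = { initiated: 0, shown: 0, completed: 0 };
	for ( const actions of actionsBySession.values() ) {
		if ( actions.has( 'session-start' ) ) {
			summary.initiated++;
		}
		if ( actions.has( 'impression-results' ) ) {
			summary.shown++;
		}
		// Either action is treated as completing the session, per the mapping above.
		if ( actions.has( 'click-result' ) || actions.has( 'hide-search-suggestions' ) ) {
			summary.completed++;
		}
	}
	return summary;
}
```

Running the same summary separately for each experiment bucket would give the per-treatment counts that the A/B analysis compares.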

Schema:Search looks pretty similar. I wasn't sure if Sam was suggesting cloning Schema:MobileWebSearch instead because of active policy or another reason.

All of this is reported for JavaScript only.

These are the questions and work I'm aware of:

Vector needs to be modified to bucket users and swap the implementations for both the movement and widget tests.
Schema:MobileWebSearch needs to be cloned so that the new search and the old search can use it?
Is the old search currently instrumented with any schema though? I haven't looked.
The new search doesn't exist yet so it will need the instrumentation. I think T249366 could be revised to focus on the instrumentation in the new implementation.
Do we need to evaluate the different search API requests? The old search will be using the MediaWiki Action API, the new search will be using the Core Platform Team's REST API.
Do we need any X-Analytics headers for skin version (e.g., 1 (or not present) = Legacy, 2 = Latest) or search version (e.g., search=1 and search=2)? It's possible that a test wiki may have the latest version of the skin deployed but the new Vue.js search disabled, at least initially.

a/b test of the move
analysis of the move
a/b test of the widget
analysis of the widget

Makes sense. The A/B tests will take some time to run and will need to be analyzed by @Mayakp.wiki and @MNeisler.

These are the questions and work I'm aware of:

Vector needs to be modified to bucket users and swap the implementations for both the movement and widget tests.

Schema:MobileWebSearch needs to be cloned so that the new search and the old search can use it?

Is the old search currently instrumented with any schema though? I haven't looked.

No. There is https://meta.wikimedia.org/wiki/Schema:Search, but it's been disabled for a while, so if we decide to a/b test only logged-in users for the move, we'll have to set up the schema at least a little ahead of time to get a baseline for anons.

The new search doesn't exist yet so it will need the instrumentation. I think T249366 could be revised to focus on the instrumentation in the new implementation.

Makes sense, let's do that.

Do we need to evaluate the different search API requests? The old search will be using the MediaWiki Action API, the new search will be using the Core Platform Team's REST API.

I guess that would be covered by the a/b test on the new vs old search? Or do you mean something more specific?

Do we need any X-Analytics headers for skin version (e.g., 1 (or not present) = Legacy, 2 = Latest) or search version (e.g., search=1 and search=2)? It's possible that a test wiki may have the latest version of the skin deployed but the new Vue.js search disabled, at least initially.

That's a good point. @Mayakp.wiki, @MNeisler - any thoughts on this?

a/b test of the move
analysis of the move
a/b test of the widget
analysis of the widget

Makes sense. The A/B tests will take some time to run and will need to be analyzed by @Mayakp.wiki and @MNeisler.

I'll set these up as a baseline and we can create the remainder as details become clearer.

Is the old search currently instrumented with any schema though? I haven't looked.

No. There is https://meta.wikimedia.org/wiki/Schema:Search, but it's been disabled for a while, so if we decide to a/b test only logged-in users for the move, we'll have to set up the schema at least a little ahead of time to get a baseline for anons.

Apologies! I am fully confused so I've re-read the above and started digging into the implementations. Here is my current understanding:

Schema:MobileWebSearch was copied from Schema:Search. Since Schema:MobileWebSearch is active, it's wanted over Search? I didn't understand the reasoning. I'm also unsure if that means just resynchronizing any differences in Schema:Search or making yet another search schema.
Schema:MobileWebSearch has a client JavaScript implementation. Sam is suggesting(?) that we copy that and instead of pumping out schemaMobileWebSearch events, we do Search events? Is that for the new implementation or the old implementation or both? Is this because EventLogging is currently wired to use the Search schema (the revision ID of the previous version of the Schema:Search matches)? I do not see any event.Search events currently emitted.
I think the old search implementation actually uses searchSatisfaction. Should we use that instead? I'm very confused!

The new search doesn't exist yet so it will need the instrumentation. I think T249366 could be revised to focus on the instrumentation in the new implementation.

Makes sense, let's do that.

I've tweaked the task to focus on the new implementation only. Further edits welcome!

Do we need to evaluate the different search API requests? The old search will be using the MediaWiki Action API, the new search will be using the Core Platform Team's REST API.

I guess that would be covered by the a/b test on the new vs old search? Or do you mean something more specific?

I wasn't very clear. I meant that it's my understanding that request traffic can be monitored. I wasn't sure if there were requirements around that for new or old APIs.

Is the old search currently instrumented with any schema though? I haven't looked.

I think the old search implementation actually uses searchSatisfaction. Should we use that instead? I'm very confused!

Yes, desktop search is currently instrumented with searchSatisfaction, which appears to still be active and recording a high number of events. It’s a little more complex than we need but includes all the fields we'd need to calculate the proposed metrics.
Per discussions with @ovasileva today, I'll follow up with the maintainers of this schema to see if there are any issues with us reusing it or if they recommend making a clone of MobileWebSearch instead.

Do we need any X-Analytics headers for skin version (e.g., 1 (or not present) = Legacy, 2 = Latest) or search version (e.g., search=1 and search=2)? It's possible that a test wiki may have the latest version of the skin deployed but the new Vue.js search disabled, at least initially.

I think incorporating a field into the search schema noting which skin version is being used would be sufficient for calculating the metrics we are interested in here. The X-Analytics headers could be used to obtain some view-based metrics for each skin type, but querying that data from webrequest is time-intensive and minimally useful due to the 90-day data retention guidelines (unless there is another use case I'm not thinking of).
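
For illustration only, tagging each search event with the versions in play could be sketched as below. The field names `skinVersion` and `searchVersion` and the allowed values are hypothetical, loosely following the 1 = Legacy, 2 = Latest convention floated earlier in the thread; the real schema fields may well be named and typed differently.

```javascript
// Hypothetical allowed values: 1 = legacy, 2 = latest (for both skin and search).
const KNOWN_VERSIONS = [ 1, 2 ];

/**
 * Return a copy of a search event with version fields attached.
 * The original event object is left untouched.
 */
function tagSearchEvent( event, skinVersion, searchVersion ) {
	if ( !KNOWN_VERSIONS.includes( skinVersion ) ) {
		throw new RangeError( 'Unknown skin version: ' + skinVersion );
	}
	if ( !KNOWN_VERSIONS.includes( searchVersion ) ) {
		throw new RangeError( 'Unknown search version: ' + searchVersion );
	}
	return Object.assign( {}, event, { skinVersion, searchVersion } );
}
```

Keeping the two fields separate would cover the case raised above, where a wiki runs the latest skin but still has the old search enabled.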

@MNeisler - would it be okay if I assigned this to you for the time being? It seems the open questions are:

Should we duplicate the mobile schema or can we use searchSatisfaction?
If we're using search satisfaction, can we add a field for skin version?

ovasileva assigned this task to MNeisler. May 18 2020, 12:50 PM

MNeisler added a project: Product-Analytics. May 18 2020, 2:20 PM

LGoto edited projects, added Product-Analytics (Kanban); removed Product-Analytics. May 18 2020, 4:12 PM

LGoto moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

ovasileva moved this task from Incoming to Q3 2019-2020 on the Desktop Improvements (Vector 2022) board. May 20 2020, 9:44 AM

ovasileva moved this task from Q3 2019-2020 to Q4 2020 on the Desktop Improvements (Vector 2022) board.

Met with @mpopov yesterday about the status of searchSatisfaction.

Summary of notes below:

Confirmed that searchSatisfaction is active and maintained by the Search platform team.
It includes a number of features that might be useful for our analysis, such as:
- In addition to the mwSessionId, it includes a unique searchSessionId, which identifies a user performing searches within a 10-minute timespan and persists longer than the mwSessionId.
- Deterministic bucketing for A/B tests and the ability to specify sampling rates on a per-wiki basis.
- Scroll tracking.
It currently only records events for desktop and non-Minerva skins. It was not instrumented on Minerva due to performance concerns. See rEWMV958aad0ebd6d3a69ef342094b9bba94f5de60b1a and https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaEvents/+/master/extension.json#92. It is a large amount of code, and if we end up reusing it, we will need to make sure any changes do not cause similar performance concerns.
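
The 10-minute search session described in the notes can be pictured with a small sketch. This is not the searchSatisfaction implementation: the notes do not say whether the window is measured from the first or the last search, so the sliding window used here (measured from the most recent activity) is an assumption, as are the function names and the injected ID generator.

```javascript
// Assumed window length, taken from the "10-minute timespan" in the notes.
const SEARCH_SESSION_TIMEOUT_MS = 10 * 60 * 1000;

/**
 * Create a tracker that hands out a search session ID.
 * generateId is supplied by the caller; a new ID is generated when no search
 * activity has been seen for longer than timeoutMs.
 */
function createSearchSessionTracker( generateId, timeoutMs = SEARCH_SESSION_TIMEOUT_MS ) {
	let current = null; // { id, lastSeen } once the first search happens

	return function getSearchSessionId( now ) {
		if ( current === null || now - current.lastSeen > timeoutMs ) {
			// First search, or the previous session has expired: start a new one.
			current = { id: generateId(), lastSeen: now };
		} else {
			// Still within the window: keep the ID and extend the session.
			current.lastSeen = now;
		}
		return current.id;
	};
}
```

A caller would pass the current time in milliseconds (for example `Date.now()`) on each search interaction and attach the returned ID to the logged event.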

@EBernhardson - Any concerns with the web team using searchSatisfaction to measure the effects of the planned search changes? If we reuse, we would like to add a field to the schema to track skin version if possible.

That shouldn't be a problem.

ovasileva moved this task from Needs Code Review to Ready for Signoff on the Web-Team-Backlog (Kanbanana-2019-20-Q4) board. Jun 10 2020, 5:14 PM

All done, thank you @MNeisler! Followup for the new field is in T256100: Add skin version and search version fields to search satisfaction schema

ovasileva closed this task as Resolved. Jun 23 2020, 7:48 AM

Niedzielski mentioned this in T249366: [Spike] What should we instrument in the new Vue.js search experience? Jul 23 2020, 2:48 PM

Niedzielski mentioned this in T259250: A/B test setup for search changes. Aug 6 2020, 2:26 PM

ovasileva removed a subtask: T256100: Add skin version and search version fields to search satisfaction schema. Aug 31 2020, 1:09 PM

ovasileva mentioned this in T261647: Set up A/B test for new search widget. Aug 31 2020, 2:43 PM

ovasileva mentioned this in T261648: Perform A/B test for search location change.

Jdlrobson mentioned this in T263032: Deploy the new location of the search bar to new vector and begin A/B test on test wikis. Sep 17 2020, 1:13 AM