
Basic user testing for new search experience
Closed, Resolved · Public

Description

Sanity check the new search experience

Goals

  • To see if there are any major usability issues that we may have overlooked
  • To see the breakdown in terms of how people submit their search:
    • (a) click on a suggested result
    • (b) press enter on their keyboard
    • (c) click the Search button
  • To see what people think the “Search” button will do
  • To see what people think the “Search pages containing X” will do

Study

On usertesting.com we had people complete various tasks, some of which required them to search for specific articles (without explicitly telling them to use the search box). There were two groups:

  • Group 1
    • 17 people
    • People in group 1 searched for Egypt, Cave art, Banana, and Purple
  • Group 2
    • 15 people
    • People in group 2 searched for Electricity, Banana, Willow trees, Romeo and Juliet, and Purple

There was a mix of ages and geographies in both groups:

[Screenshots: age breakdown (Screen Shot 2020-08-28 at 4.28.59 PM.png, 568×934 px) and geography breakdown (Screen Shot 2020-08-28 at 4.33.15 PM.png, 614×997 px)]

Findings

  • 0 people had issues using search
  • There were a total of 117 searches submitted:
    • 86 were submitted via a suggested result
    • 19 were submitted via the enter key
    • 12 were submitted via the Search button
  • Regarding what people thought clicking the Search button would do:
    • 16 of 23 people who answered thought it would take them to the first result
    • 7 of 23 people who answered thought it would take them to a list of results
    • note: all people assumed that enter and the Search button do the same thing
  • Regarding what people thought clicking the Search for pages containing Purple would do:
    • 25 of the 25 people who answered thought it would take them to a list of pages that have the word "purple" in them

Notes:

  • In the first study, people started on the Pancake article and were then asked to go to the article about Egypt. It was ambiguous how they were supposed to get there, and 7 out of 10 people tried to find an "Egypt" link within the Pancake article
  • One person wondered why the Cave art search result didn't have an image, which made me wonder whether articles without images look lower quality next to the ones that do
  • In two cases the search results loaded quite slowly, and the people used the Search button instead
  • Few people (maybe 2 or 3) used their keyboard to navigate down to the suggested search results

Event Timeline

@ovasileva @RHo please see the test results in the description. Nothing surprising here.

Regarding the breakdown of how people submitted their searches, it was a bit different than the actual data we got in T259766#6400566, which was 51% suggested search results and 48% search button/enter key. In the case of the test it was 86 of 117 (or 73%) of searches submitted via a suggested search result.

Of course our study is way too small to draw any conclusions, but it made me wonder: do we know anything about the current search experience in terms of do-over searches (made-up term)? Like, do people type "Prince" and hit enter, land on the page about Princes but actually want the page about Prince the musician, then do a second search to get there? Since the search results are now larger, and have images and descriptions, maybe people will be more likely to click them (vs. using the search button/enter key). If so, perhaps there will be fewer cases where people do do-over searches, i.e. doing a second search when their first search didn't take them where they wanted to go. From a data perspective this might actually look like people are searching less(?), so I wonder if we want to:

  1. make sure that we're not counting do-over searches in the total # of searches conducted that we're using in our A/B tests (a rough sketch of what that could look like follows this list)
  2. somehow try to track whether the new search experience results in people getting to the page they were looking for more often (i.e. fewer do-over searches)
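
For concreteness, here is a minimal sketch of what excluding do-over searches from a count could look like. Everything in it is hypothetical: the SearchEvent fields, the 2-minute window, and the prefix heuristic are placeholder choices for illustration, not anything our current instrumentation does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SearchEvent:
    session_id: str      # hypothetical: however the instrumentation groups one person's events
    timestamp: datetime
    query: str           # the text in the box at the moment the search was submitted

# Hypothetical cutoff: a follow-up search within this window is a candidate do-over.
DO_OVER_WINDOW = timedelta(minutes=2)

def is_do_over(prev: SearchEvent, curr: SearchEvent) -> bool:
    """A quick follow-up search whose query overlaps the previous one,
    e.g. "Prince" then "Prince musician" (or the truncated "Pri")."""
    if curr.session_id != prev.session_id:
        return False
    if curr.timestamp - prev.timestamp > DO_OVER_WINDOW:
        return False
    a = prev.query.strip().lower()
    b = curr.query.strip().lower()
    return a.startswith(b) or b.startswith(a)

def count_searches_excluding_do_overs(events: list) -> int:
    """Count searches, collapsing each do-over into the search it retries."""
    events = sorted(events, key=lambda e: (e.session_id, e.timestamp))
    total = 0
    for i, event in enumerate(events):
        if i > 0 and is_do_over(events[i - 1], event):
            continue
        total += 1
    return total
```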

@MNeisler - thoughts on this? I think it would be tricky to define what a do-over search is. I guess we could set some sort of time limit for multiple searches with the same search term and look at that? Does the current instrumentation allow for that?


@ovasileva @alexhollender
It's possible to track do-over searches as described above, but it would be complex and time-intensive to calculate using the current instrumentation. We could review search sessions that follow a specific sequence of search actions and have partial search-term matches. I'm not sure reviewing just full search-term matches in a single search session would provide an accurate estimate. For example, if the person types "Prince" and hits enter, the search query would be recorded as "Prince", but when the person does a second search they might just start typing "Pri" and then select the desired autocomplete result as soon as it appears; the search query this time would be recorded as just "Pri".

Let me know if you think this is important to know for the search experience design and I can create a ticket to investigate some options further.
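
To illustrate the partial-match idea, here is a minimal sketch of a query-similarity check. The function name, the prefix rule, and the 0.6 threshold are all hypothetical choices for the sketch, not anything from the current instrumentation.

```python
from difflib import SequenceMatcher

def queries_likely_same_intent(first: str, second: str, threshold: float = 0.6) -> bool:
    """True when the second recorded query looks like a retry of the first.

    Covers the example above: the first search is logged as "Prince", the
    retry as just "Pri" because the person picked an autocomplete result
    after three keystrokes. A full-string match would miss it.
    """
    a = first.strip().lower()
    b = second.strip().lower()
    if a.startswith(b) or b.startswith(a):   # "Pri" is a prefix of "Prince"
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold

assert queries_likely_same_intent("Prince", "Pri")
assert queries_likely_same_intent("pricne", "prince")   # typo-style retry
assert not queries_likely_same_intent("Prince", "Banana")
```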

@MNeisler @ovasileva
Ok, well currently this is totally speculative on my part, and I think it would only be worth investigating if we think do-over searches are somewhat common. @TJones any chance y'all have thought about, or know about, do-over searches (see above comments for context)?

In T261515#6430489, @alexhollender wrote:

@TJones any chance y'all have thought about, or know about, do-over searches (see above comments for context)?

We have looked at query reformulation within the same session, but only for full text search, not search from the "go" feature.

  • The "go" feature is the query that is currently in the upper corner, and which, in the last version I saw, you had moved to the middle of the top of the screen. We generally think of those as successful searches, on the assumption that if people choose a suggestion they are getting what they want—but your example with typing Prince and hitting enter is a good potential counter example.
  • We have defined search sessions as the same IP, user agent, etc., with a gap between queries of less than 10 or 20 minutes (I think we've used both). So if someone walks away from their computer for an hour and comes back, that's definitely a new session, but searching every 5 minutes for an hour is one session. (A sketch of this sessionization rule follows this list.)
  • Accurate detection of query reformulation (which sounds like your "do-overs") is hard. In the "go" box it's even harder. If you type slowly (and poorly) but have a fast internet connection, you could log all of the following queries to the completion suggester: p, pr, pri, pric, pricn, pricne, pricn, pric, princ, prince. If you only log what's in the text box when they hit enter or select a suggestion, it's still hard, especially because of the "Pri" → Prince (musician) example above.
    • "Real" reformulation happens when there's some textual similarity between queries—pricne and prince or dogsled and dog sled or soccer finals and football finals. Note that small edits like pricne and prince don't make as much sense in Chinese, though, where changing one or two characters can completely change the meaning. So, for machine learning purposes, you can just count all queries in one session as "reformulations" even when they aren't, and the ones that recurr often are probably the "real" ones and you can learn from those. For full-text A/B tests, fewer reformulations probably means that searches are more successful, because there are fewer "real" reformulations... but you never get rid of people searching for totally different things in the same session. I think you could similarly interpret the final text in the "go" box. (Though A/B tests in search are always noisy.)

@EBernhardson has more experience with all of this, but that should cover most of the basic stuff we've thought about.