
Basic user testing for new search experience
Closed, Resolved · Public

Description

Sanity check the new search experience

Goals

  • To see if there are any major usability issues that we may have overlooked
  • To see the breakdown in terms of how people submit their search:
    • (a) click on a suggested result
    • (b) press enter on their keyboard
    • (c) click the Search button
  • To see what people think the “Search” button will do
  • To see what people think the “Search pages containing X” will do

Study

On usertesting.com we had people complete various tasks, some of which required them to search for specific articles (without explicitly telling them to use the search box). There were two groups:

  • Group 1
    • 17 people
    • People in group 1 searched for Egypt, Cave art, Banana, and Purple
  • Group 2
    • 15 people
    • People in group 2 searched for Electricity, Banana, Willow trees, Romeo and Juliet, and Purple

There was a mix of ages and geographies in both groups:

[Screenshots: age breakdown (Screen Shot 2020-08-28 at 4.28.59 PM.png, 568×934 px) and geography breakdown (Screen Shot 2020-08-28 at 4.33.15 PM.png, 614×997 px)]

Findings

  • 0 people had issues using search
  • There were a total of 117 searches submitted:
    • 86 were submitted via a suggested result
    • 19 were submitted via the enter key
    • 12 were submitted via the Search button
  • Regarding what people thought clicking the Search button would do:
    • 16 of 23 people who answered thought it would take them to the first result
    • 7 of 23 people who answered thought it would take them to a list of results
    • note: all people assumed that enter and the Search button do the same thing
  • Regarding what people thought clicking the Search for pages containing Purple would do:
    • 25 of the 25 people who answered thought it would take them to a list of pages that have the word "purple" in them

Notes:

  • In the first study, people started on the Pancake article and were then asked to go to the article about Egypt. It was ambiguous how they were supposed to get there, and 7 out of 10 people tried to find an "Egypt" link within the Pancake article
  • One person wondered why the Cave art search result didn't have an image, which made me wonder whether articles without images look lower quality next to the ones that do
  • In two cases the search results loaded quite slowly, and the people used the Search button instead
  • Few people (maybe 2 or 3) used their keyboard to navigate down to the suggested search results

Event Timeline

@ovasileva @RHo please see the test results in the description. Nothing surprising here.

Regarding the breakdown of how people submitted their searches, it was a bit different than the actual data we got in T259766#6400566, which was 51% suggested search results and 48% search button/enter key. In the case of the test it was 86 of 117 (or 73%) of searches submitted via a suggested search result.

Of course our study is way too small to draw any conclusions, but it made me wonder: do we know anything about the current search experience in terms of do-over searches (made-up term)? Like, do people type "Prince" and hit enter, land on the page about Princes but actually want the page about Prince the musician, then do a second search to get there? Since the search results are now larger, and have images and descriptions, maybe people will be more likely to click them (vs. using the search button/enter key). If so, perhaps there will be fewer cases where people do do-over searches, i.e. doing a second search when their first search didn't take them where they wanted to go. From a data perspective this might actually look like people are searching less(?), so I wonder if we want to:

  1. make sure that we're not counting do-over searches in the total # of searches conducted that we're using in our A/B tests (a rough sketch of what that could look like follows this list)
  2. somehow try to track whether the new search experience results in people getting to the page they were looking for more often (i.e. fewer do-over searches)
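
For concreteness, here is a minimal sketch of what excluding do-over searches from a count could look like. Everything in it is hypothetical: the SearchEvent fields, the 2-minute window, and the prefix heuristic are placeholder choices for illustration, not anything our current instrumentation does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SearchEvent:
    session_id: str      # hypothetical: however the instrumentation groups one person's events
    timestamp: datetime
    query: str           # the text in the box at the moment the search was submitted

# Hypothetical cutoff: a follow-up search within this window is a candidate do-over.
DO_OVER_WINDOW = timedelta(minutes=2)

def is_do_over(prev: SearchEvent, curr: SearchEvent) -> bool:
    """A quick follow-up search whose query overlaps the previous one,
    e.g. "Prince" then "Prince musician" (or the truncated "Pri")."""
    if curr.session_id != prev.session_id:
        return False
    if curr.timestamp - prev.timestamp > DO_OVER_WINDOW:
        return False
    a = prev.query.strip().lower()
    b = curr.query.strip().lower()
    return a.startswith(b) or b.startswith(a)

def count_searches_excluding_do_overs(events: list) -> int:
    """Count searches, collapsing each do-over into the search it retries."""
    events = sorted(events, key=lambda e: (e.session_id, e.timestamp))
    total = 0
    for i, event in enumerate(events):
        if i > 0 and is_do_over(events[i - 1], event):
            continue
        total += 1
    return total
```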

@MNeisler - thoughts on this? I think it would be tricky to define what a do-over search is. I guess we could set some sort of time limit for multiple searches with the same search term and look at that? Does the current instrumentation allow for that?


@ovasileva @alexhollender
It's possible to track do-over searches as described above, but it would be complex and time-intensive to calculate using the current instrumentation. We could review search sessions that follow a specific sequence of search actions and have partial search-term matches. I'm not sure reviewing just full search-term matches in a single search session would provide an accurate estimate. For example, if the person types "Prince" and hits enter, the search query would be recorded as "Prince", but when the person does a second search they might just start typing "Pri" and then select the desired autocomplete result as soon as it appears; the search query this time would be recorded as just "Pri".

Let me know if you think this is important to know for the search experience design and I can create a ticket to investigate some options further.
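
To illustrate the partial-match idea, here is a minimal sketch of a query-similarity check. The function name, the prefix rule, and the 0.6 threshold are all hypothetical choices for the sketch, not anything from the current instrumentation.

```python
from difflib import SequenceMatcher

def queries_likely_same_intent(first: str, second: str, threshold: float = 0.6) -> bool:
    """True when the second recorded query looks like a retry of the first.

    Covers the example above: the first search is logged as "Prince", the
    retry as just "Pri" because the person picked an autocomplete result
    after three keystrokes. A full-string match would miss it.
    """
    a = first.strip().lower()
    b = second.strip().lower()
    if a.startswith(b) or b.startswith(a):   # "Pri" is a prefix of "Prince"
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold

assert queries_likely_same_intent("Prince", "Pri")
assert queries_likely_same_intent("pricne", "prince")   # typo-style retry
assert not queries_likely_same_intent("Prince", "Banana")
```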

@MNeisler @ovasileva
Ok, well currently this is totally speculative on my part, and I think it would only be worth investigating if we think do-over searches are somewhat common. @TJones any chance y'all have thought about, or know about, do-over searches (see above comments for context)?

In T261515#6430489, @alexhollender wrote:

@TJones any chance y'all have thought about, or know about, do-over searches (see above comments for context)?

We have looked at query reformulation within the same session, but only for full text search, not search from the "go" feature.

  • The "go" feature is the query that is currently in the upper corner, and which, in the last version I saw, you had moved to the middle of the top of the screen. We generally think of those as successful searches, on the assumption that if people choose a suggestion they are getting what they want—but your example with typing Prince and hitting enter is a good potential counter example.
  • We have defined search sessions as the same IP, user agent, etc., with a gap between queries of less than 10 or 20 minutes (I think we've used both). So if someone walks away from their computer for an hour and comes back, that's definitely a new session, but searching every 5 minutes for an hour is one session. (A sketch of this sessionization rule follows this list.)
  • Accurate detection of query reformulation (which sounds like your "do-overs") is hard. In the "go" box it's even harder. If you type slowly (and poorly) but have a fast internet connection, you could log all of the following queries to the completion suggester: p, pr, pri, pric, pricn, pricne, pricn, pric, princ, prince. If you only log what's in the text box when they hit enter or select a suggestion, it's still hard, especially because of the "Pri" → Prince (musician) example above.
    • "Real" reformulation happens when there's some textual similarity between queries—pricne and prince or dogsled and dog sled or soccer finals and football finals. Note that small edits like pricne and prince don't make as much sense in Chinese, though, where changing one or two characters can completely change the meaning. So, for machine learning purposes, you can just count all queries in one session as "reformulations" even when they aren't, and the ones that recurr often are probably the "real" ones and you can learn from those. For full-text A/B tests, fewer reformulations probably means that searches are more successful, because there are fewer "real" reformulations... but you never get rid of people searching for totally different things in the same session. I think you could similarly interpret the final text in the "go" box. (Though A/B tests in search are always noisy.)

@EBernhardson has more experience with all of this, but that should cover most of the basic stuff we've thought about.