Integrate "did you mean" data collection into search satisfaction schema
Closed, Resolved, Public
Assigned To

	EBernhardson

Authored By

	EBernhardson
	Jun 17 2016, 3:51 PM

Description

We currently have an independent schema for collecting usage data about the "did you mean" feature in search. This should be integrated into the search satisfaction data collection so that all the data is in one place.

Details

	Subject	Repo	Branch	Lines +/-
	Integrate did you mean collection into search satisfaction	mediawiki/extensions/WikimediaEvents	master	+160 -149

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		EBernhardson	T138087 Integrate "did you mean" data collection into search satisfaction schema
		Resolved		mpopov	T144424 Add a PaulScore approximation to discovery.wmflabs.org

Event Timeline

EBernhardson created this task. Jun 17 2016, 3:51 PM

Restricted Application added subscribers: Zppix, Aklapper. Jun 17 2016, 3:51 PM

debt triaged this task as Medium priority. Jul 20 2016, 4:10 PM

debt moved this task from needs triage to This Quarter on the Discovery-Search board.

debt moved this task from This Quarter to Current work on the Discovery-Search board. Aug 30 2016, 10:10 PM

debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.

ksmith renamed this task from Integrate did you mean data collection into search satisfaction schema to Integrate "did you mean" data collection into search satisfaction schema. Aug 30 2016, 10:10 PM

The main idea here is to record:

Whether the query was auto-magically rewritten in the backend because the initial query had zero results.
When the user clicks did you mean. Currently we only record that two queries were performed, but not that the second query was the one suggested by did you mean.
It might also be nice to record whether a search result page showed a did you mean suggestion to the user.

EBernhardson claimed this task. Sep 6 2016, 10:39 PM

EBernhardson moved this task from Incoming to not in use - please delete on the Discovery-Search (Current work) board.

@mpopov, wondering if you have any preferences. After looking this over, I'm thinking of the following:

I suppose first we have to think about what it is we are measuring. I think the overall goal is to be able to determine how satisfied a user is with their suggested search query, and to be able to more directly measure improvements to the suggestions we provide. One example: we currently only use the query rewrite capability when the query has 0 results. We did this because "some results is better than none", but we didn't have a way to measure whether performing this style of rewrite on queries with few results would be any better.

Possible ways a user can interact with did you mean:

Searches for something that provides results, but also has a did you mean suggestion[1]
Searches for something that has no results, and we internally rewrite it into a did you mean suggestion that has results[2]
Searches for something that has no results, and we internally rewrite it into a did you mean suggestion that has results[2], then the user clicks the "Search instead for '<original query>'" link, which currently always returns 0 results. This could do something different in the future if we change the requirements for when or how we rewrite the query. For example, instead of a total rewrite we might merge the results internally with (original) OR (rewritten^0.5), or some such, on queries that have few results instead of only those with no results.
Searches for something that has no results, and we internally rewrite it into a did you mean suggestion that also has no results[3]
Searches for something that has no results, we internally rewrite it into a did you mean suggestion that has results[2], the user clicks the "Showing results for 'xyz'" link[4] and is presented with search results plus a new suggestion[5]

[1] https://en.wikipedia.org/wiki/?search=tayps&fulltext=1
[2] https://en.wikipedia.org/wiki/?search=weldng+defacts&fulltext=1
[3] https://en.wikipedia.org/wiki/?search=tayps+of+wlding+difacts&fulltext=1
[4] https://en.wikipedia.org/wiki/?search=weldng+defacts&fulltext=1
[5] https://en.wikipedia.org/wiki/?search=wedding+defects&fulltext=1

Based on this list of possible ways a user can interact with the feature, I'm thinking we handle the following events:

When a user clicks the did you mean suggestion
- Log a click event, same code path as currently used. The position field will be null, and the inputLocation field, which previously was only used for autocomplete, will contain didyoumean.
When a user arrives at a search result page from a did you mean click
- Log a searchResultPage event, same code path as currently used. This feels a bit odd, but it seems to make sense to use the inputLocation field again with the didyoumean value.
When a user arrives at a search result page with an internally rewritten did you mean, that is, when the original query had no results and we instead presented the user with results for the rewritten query
- Log a searchResultPage event, same code path as currently used. This also feels a bit odd, but does it make sense to use the inputLocation field yet again, with a didyoumean-internal value?

I'm still thinking about how to handle the case of the user clicking links to the original or rewritten query on a result page that was already rewritten, but it will likely follow something along the lines above.

Alternatively, we could leave the inputLocation field alone and add a new field. The inputLocation field does in some ways feel like it captures the intent here, although perhaps only partially.
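
For illustration only, the three proposed events could be sketched roughly as below. This is Python rather than the extension's own code, and the "action" key, the function names and the precedence between the two searchResultPage cases are assumptions; only the position and inputLocation fields and the didyoumean / didyoumean-internal values come from the proposal above.

```python
from typing import Any, Dict, Optional

# inputLocation values proposed in the comment above.
DYM_CLICK = "didyoumean"
DYM_INTERNAL = "didyoumean-internal"


def click_event() -> Dict[str, Any]:
    """Event logged when the user clicks the did you mean suggestion."""
    # Per the proposal, position stays null for this kind of click.
    return {"action": "click", "position": None, "inputLocation": DYM_CLICK}


def result_page_event(from_dym_click: bool, rewritten_internally: bool) -> Dict[str, Any]:
    """Event logged when a search result page is shown (hypothetical helper)."""
    input_location: Optional[str]
    if from_dym_click:
        # The user arrived by clicking a did you mean suggestion.
        input_location = DYM_CLICK
    elif rewritten_internally:
        # The original query had no results, so the backend ran the suggestion.
        input_location = DYM_INTERNAL
    else:
        # Assumption: ordinary searches leave the field unset here.
        input_location = None
    return {"action": "searchResultPage", "inputLocation": input_location}
```

Reusing one field keeps the schema unchanged, at the cost of mixing "where the query was typed" with "how the query was produced", which is the oddness noted above.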

Hi @mpopov, are you OK with the above note by @EBernhardson?

Gotten most of the way there; I have tests for several of the cases and they are passing. I realized I likely still need to add a field to the schema though, as I have no way to indicate what kind of did you mean (including none at all) is being shown on the search result page.
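
As a rough sketch of what such a field might carry, the value could be derived from the page state along these lines. The function name and the three labels are hypothetical and are not taken from the actual schema or patch.

```python
from typing import Optional


def did_you_mean_kind(suggestion: Optional[str], rewritten: bool) -> str:
    """Classify what a result page shows about did you mean.

    The labels "none", "suggestion" and "rewritten" are hypothetical.
    """
    if rewritten:
        # The original query had no results and the suggestion was run instead.
        return "rewritten"
    if suggestion:
        # Results for the original query, with a suggestion offered alongside.
        return "suggestion"
    # No did you mean shown at all.
    return "none"
```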

Change 311654 had a related patch set uploaded (by EBernhardson):
Integrate did you mean collection into search satisfaction

https://gerrit.wikimedia.org/r/311654

gerritbot added a project: Patch-For-Review. Sep 20 2016, 5:59 AM

EBernhardson moved this task from not in use - please delete to Needs review on the Discovery-Search (Current work) board. Sep 20 2016, 6:00 AM

debt added a subtask: T144424: Add a PaulScore approximation to discovery.wmflabs.org. Oct 7 2016, 7:51 PM

debt mentioned this in T144424: Add a PaulScore approximation to discovery.wmflabs.org.

Change 311654 merged by jenkins-bot:
Integrate did you mean collection into search satisfaction

https://gerrit.wikimedia.org/r/311654

ReleaseTaggerBot added a project: MW-1.28-release (WMF-deploy-2016-10-25_(1.28.0-wmf.23)). Oct 13 2016, 12:00 PM

debt closed this task as Resolved. Oct 14 2016, 9:43 PM

debt closed subtask T144424: Add a PaulScore approximation to discovery.wmflabs.org as Resolved. Oct 28 2016, 2:36 PM

Deskana moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. Dec 14 2016, 5:38 PM