Search Relevance: MVP test (turned on)
Closed, Resolved · Public
Assigned To

Authored By

	debt
	Jul 26 2017, 1:38 PM

Description

This ticket is to be used to track the launch of the MVP test for search relevance that will be graded by humans (users of pre-defined search queries).

Details

Subject	Repo	Branch	Lines +/-
Update relevance survey to 60s, and bump schema rev id	mediawiki/extensions/WikimediaEvents	master	+2 -2
Update HumanSearchRelevance schema rev id	mediawiki/extensions/WikimediaEvents	wmf/1.30.0-wmf.12	+1 -1
Update HumanSearchRelevance schema rev id	mediawiki/extensions/WikimediaEvents	master	+1 -1

Related Objects
Status	Assigned	Task
Invalid	None	T174064 [FY 2017-18 Objective] Implement advanced search methodologies
Resolved	Gehel	T171740 [Epic] Search Relevance: graded by humans
Resolved	debt	T171741 Search Relevance: MVP test (turned on)

Event Timeline

debt created this task. Jul 26 2017, 1:38 PM

Restricted Application added a subscriber: Aklapper. Jul 26 2017, 1:38 PM

EBernhardson moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board. Aug 3 2017, 9:33 PM

This deployed today, and I'm seeing some events in the raw eventlogging input (eventlogging-client-side Kafka topic). Unfortunately, I'm not seeing the data make it past that stage into the specific eventlogging_HumanSearchRelevance topic. That means it won't end up in MySQL for analysis either. Still investigating what's wrong there.

Going to be a little bit before we can make any judgement about the quality, but in terms of some very basic numbers these are the responses we have seen so far:

count	choice
1	"dismiss"
32	"no"
108	"timeout"
23	"unsure"
10	"yes"

So at least some people are giving different answers, and as expected the majority are letting the survey time out.
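As a quick sanity check on the counts above, the share of each choice can be worked out directly. This is only an illustrative sketch: the `counts` dictionary is typed in by hand from the table, and the variable names are made up for the example rather than taken from any actual analysis script.

```python
# Response counts copied by hand from the table above (assumption: the
# table is complete for the period in question).
counts = {"dismiss": 1, "no": 32, "timeout": 108, "unsure": 23, "yes": 10}

total = sum(counts.values())

# Percentage share of each choice among all recorded events.
shares = {choice: 100 * n / total for choice, n in counts.items()}

for choice, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{choice:8s} {counts[choice]:4d} {pct:6.2f}%")
```

With these numbers, timeouts make up a bit under two thirds of the events, which matches the "majority time out" reading.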

Change 370108 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@master] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370108

gerritbot added a project: Patch-For-Review. Aug 3 2017, 10:42 PM

Change 370108 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370108

Change 370110 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.12] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370110

ReleaseTaggerBot added a project: MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)). Aug 3 2017, 11:00 PM

Change 370110 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.12] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370110

ReleaseTaggerBot edited projects, added MW-1.30-release-notes (WMF-deploy-2017-08-01_(1.30.0-wmf.12)); removed MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)). Aug 4 2017, 12:00 AM

Hmmm interesting first results. I must admit, it's more than I had really expected so soon after the test was launched, since we're using canned queries. But...at least it's 'on and working' and now we just need more info on which topics they were responding to. :)

I know some of the articles that got the "no" answers!

"since we're using canned queries"

Hopefully the queries being artificially constructed won't be too much of a problem. The articles they are attached to are the real articles search returns, so they aren't terrible.

I can't help but peek to see if things are going well. Not really judging responses per se, but looking at actual response levels. We are running 4 variations of the question, and there have been 1550 impressions so far (371 to 405 per question):

a: Would you click on this page when searching for '$1'?
b: If you searched for '$1', would this article be a good result?
c: If you searched for '$1', would this article be relevant?
d: If someone searched for '$1', would they want to read this article?

Unfortunately, due to a bug we aren't collecting the 'unsure' answers, but that will be fixed Monday. I manually extracted the unsure results below from the raw logs, rather than from the MySQL table that holds the results.

choice	a	b	c	d
dismiss	0.27%	1.48%	1.06%	0.00%
no	16.17%	13.58%	11.64%	16.37%
timeout	71.16%	66.67%	70.63%	61.21%
yes	4.04%	8.15%	5.82%	10.58%
unsure	8.36%	10.12%	10.85%	11.84%

Which is better is of course unclear, and will require more data and some statistical analysis, but it's interesting that question d gets more responses. More responses might not be better, though, if the responses have more noise.
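To make the "question d gets more responses" observation concrete, here is a rough sketch that adds up the yes/no/unsure percentages per variant. The figures are copied by hand from the table above; treating "graded" as yes + no + unsure (leaving out dismiss and timeout) is an assumption made for this example, and it says nothing about statistical significance.

```python
# Percentages per question variant, copied by hand from the table above.
table = {
    "dismiss": {"a": 0.27, "b": 1.48, "c": 1.06, "d": 0.00},
    "no":      {"a": 16.17, "b": 13.58, "c": 11.64, "d": 16.37},
    "timeout": {"a": 71.16, "b": 66.67, "c": 70.63, "d": 61.21},
    "yes":     {"a": 4.04, "b": 8.15, "c": 5.82, "d": 10.58},
    "unsure":  {"a": 8.36, "b": 10.12, "c": 10.85, "d": 11.84},
}

# Assumption for this sketch: an impression counts as "graded" when the
# user picked yes, no, or unsure.
graded_choices = ("yes", "no", "unsure")

graded = {
    variant: sum(table[choice][variant] for choice in graded_choices)
    for variant in "abcd"
}

for variant, pct in graded.items():
    print(f"question {variant}: {pct:.2f}% graded")

best = max(graded, key=graded.get)
print(f"highest graded share: question {best}")
```

Since only percentages are given here and per-question impression counts are only known as a range (371 to 405), a proper significance test would need the raw counts.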

EBernhardson added a comment. Aug 6 2017, 6:38 AM

This comment was removed by EBernhardson.

Change 370531 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@master] Update relevance survey to 60s, and bump schema rev id

https://gerrit.wikimedia.org/r/370531

Change 370531 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Update relevance survey to 60s, and bump schema rev id

https://gerrit.wikimedia.org/r/370531

ReleaseTaggerBot edited projects, added MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)); removed MW-1.30-release-notes (WMF-deploy-2017-08-01_(1.30.0-wmf.12)). Aug 7 2017, 9:00 PM