
Search Relevance: MVP test (turned on)
Closed, Resolved · Public

Description

This ticket is to be used to track the launch of the MVP test for search relevance that will be graded by humans (users of pre-defined search queries).

Event Timeline

This deployed today, and I'm seeing some events in the raw eventlogging input (the eventlogging-client-side Kafka topic). Unfortunately I'm not seeing the data make it past that stage into the specific eventlogging_HumanSearchRelevance topic, which means it won't end up in MySQL for analysis either. Still investigating what's wrong there.
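One way to spot-check whether the events are at least arriving in the raw topic is to filter a dump of it for the schema name. A minimal sketch, assuming each line is a JSON eventlogging envelope with a top-level `schema` field (the field names here are assumptions, not the exact envelope format):

```python
import json

def filter_schema(raw_lines, schema_name):
    """Yield parsed events whose envelope names the given schema.

    Assumes each line is a JSON envelope with a top-level 'schema'
    field; lines that fail to parse are skipped.
    """
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("schema") == schema_name:
            yield event

# Illustrative raw dump: two well-formed envelopes and one bad line
raw = [
    '{"schema": "HumanSearchRelevance", "event": {"choice": "yes"}}',
    '{"schema": "SearchSatisfaction", "event": {}}',
    'not json',
]
matches = list(filter_schema(raw, "HumanSearchRelevance"))
```

If the raw topic shows matches but the schema-specific topic stays empty, the problem sits in the processing stage between the two.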

It's going to be a little while before we can make any judgement about quality, but here are some very basic numbers on the responses we have seen so far:

count | choice
------+-----------
    1 | "dismiss"
   32 | "no"
  108 | "timeout"
   23 | "unsure"
   10 | "yes"

So at least some people are answering with different answers, and as expected the majority are allowing for it to timeout.
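The per-choice tallies above amount to a simple group-and-count over the collected events. A sketch of that aggregation with hypothetical event dicts (the `choice` key is an assumption about the event shape):

```python
from collections import Counter

def tally_choices(events):
    """Count survey responses grouped by their 'choice' field."""
    return Counter(e["choice"] for e in events)

# Illustrative events, not real data
events = [
    {"choice": "yes"},
    {"choice": "timeout"},
    {"choice": "timeout"},
    {"choice": "no"},
]
counts = tally_choices(events)
```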

Change 370108 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@master] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370108

Change 370108 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370108

Change 370110 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.12] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370110

Change 370110 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.12] Update HumanSearchRelevance schema rev id

https://gerrit.wikimedia.org/r/370110

Hmmm interesting first results. I must admit, it's more than I had really expected so soon after the test was launched, since we're using canned queries. But...at least it's 'on and working' and now we just need more info on which topics they were responding to. :)

I know some of the articles that got the "no" answers!

> since we're using canned queries

Hopefully the queries being artificially constructed won't be too much of a problem. The articles they are attached to are the real articles search returns, so they aren't terrible.

I can't help but peek to see if things are going well. Not really judging the responses per se, but looking at actual response levels. We are running 4 variations of the question, and there have been 1550 impressions so far (371 to 405 per question):

a: Would you click on this page when searching for '$1'?
b: If you searched for '$1', would this article be a good result?
c: If you searched for '$1', would this article be relevant?
d: If someone searched for '$1', would they want to read this article?
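How impressions get split across the four wordings isn't spelled out here; one common approach for this kind of A/B/C/D test is deterministic hashing of a session identifier, sketched below. The bucketing scheme is an assumption for illustration, not necessarily what the extension actually does:

```python
import hashlib

# The four question variants from the test
QUESTIONS = {
    "a": "Would you click on this page when searching for '$1'?",
    "b": "If you searched for '$1', would this article be a good result?",
    "c": "If you searched for '$1', would this article be relevant?",
    "d": "If someone searched for '$1', would they want to read this article?",
}

def assign_variant(session_id):
    """Deterministically bucket a session into one of the variants.

    Hashing the session id gives a stable assignment: the same user
    always sees the same wording, and buckets stay roughly even.
    """
    digest = hashlib.md5(session_id.encode()).hexdigest()
    keys = sorted(QUESTIONS)
    return keys[int(digest, 16) % len(keys)]
```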

Unfortunately, due to a bug we aren't collecting the 'unsure' answers; that will be fixed Monday. The 'unsure' results below I manually extracted from the raw logs, rather than from the MySQL table that holds the results.

choice  |      a |      b |      c |      d
--------+--------+--------+--------+--------
dismiss |  0.27% |  1.48% |  1.06% |  0.00%
no      | 16.17% | 13.58% | 11.64% | 16.37%
timeout | 71.16% | 66.67% | 70.63% | 61.21%
yes     |  4.04% |  8.15% |  5.82% | 10.58%
unsure  |  8.36% | 10.12% | 10.85% | 11.84%

Which is better is of course unclear, and will require more data and some statistical analysis, but it's interesting that question d gets more responses. More responses might not be better, though, if the responses have more noise.
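The percentages in the table are just each choice's share of that question's impressions. A sketch of the computation, using illustrative counts that roughly match column a (371 impressions):

```python
def response_shares(counts_by_choice):
    """Convert raw per-choice counts into percentage shares of the total."""
    total = sum(counts_by_choice.values())
    return {choice: round(100 * n / total, 2)
            for choice, n in counts_by_choice.items()}

# Illustrative counts consistent with ~371 impressions
shares = response_shares({
    "dismiss": 1,
    "no": 60,
    "timeout": 264,
    "yes": 15,
    "unsure": 31,
})
```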

Change 370531 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@master] Update relevance survey to 60s, and bump schema rev id

https://gerrit.wikimedia.org/r/370531

Change 370531 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Update relevance survey to 60s, and bump schema rev id

https://gerrit.wikimedia.org/r/370531

debt claimed this task.

This test was turned on Aug 8, 2017.