This ticket is to be used to track the launch of the MVP test for search relevance that will be graded by humans (users of pre-defined search queries).
|Invalid||None||T174064 [FY 2017-18 Objective] Implement advanced search methodologies|
|Resolved||Gehel||T171740 [Epic] Search Relevance: graded by humans|
|Resolved||debt||T171741 Search Relevance: MVP test (turned on)|
This deployed today, and I'm seeing some events in the raw eventlogging input (eventlogging-client-side Kafka topic). Unfortunately I'm not seeing the data make it past that stage into the specific eventlogging_HumanSearchRelevance topic. That means it won't end up in MySQL for analysis either. Still investigating what's wrong there.
Going to be a little bit before we can make any judgement about the quality, but in terms of some very basic numbers these are the responses we have seen so far:
So at least some people are giving different answers, and as expected the majority are letting it time out.
Hmmm interesting first results. I must admit, it's more than I had really expected so soon after the test was launched, since we're using canned queries. But...at least it's 'on and working' and now we just need more info on which topics they were responding to. :)
I know some of the articles that got the "no" answers!
since we're using canned queries
Hopefully the queries being artificially constructed won't be too much of a problem. The articles they are attached to are the real articles search returns, so they aren't terrible.
I can't help but peek to see if things are going well. Not really judging responses per se, but looking at actual response levels. We are running 4 variations of the question, and there have been 1550 impressions so far (371 to 405 per question):
a: Would you click on this page when searching for '$1'?
b: If you searched for '$1', would this article be a good result?
c: If you searched for '$1', would this article be relevant?
d: If someone searched for '$1', would they want to read this article?
Unfortunately, due to a bug we aren't collecting the 'unsure' answers, but that will be fixed Monday. The unsure results below I manually extracted from the raw logs, rather than from the MySQL table that represents the results.
Which is better is of course unclear, and will require more data and some statistical analysis, but it's interesting that question d gets more responses. More responses might not be better, though, if the responses have more noise.
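For when there is more data, here is a minimal sketch of one way to check whether the response rate really differs between the four question variants: a chi-square test of homogeneity on responded vs. not-responded counts. The function name and the counts in the example are placeholders made up for illustration, not the numbers from this test, and this is only one possible analysis, not necessarily the one that will be used.

```python
def chi_square_response_rate(counts):
    """Chi-square test of homogeneity for response rates.

    counts maps a variant label to (responses, impressions).
    Returns (statistic, degrees_of_freedom).
    """
    total_responses = sum(r for r, _ in counts.values())
    total_impressions = sum(n for _, n in counts.values())
    pooled_rate = total_responses / total_impressions
    if pooled_rate in (0.0, 1.0):
        raise ValueError("need both responses and non-responses to run the test")

    statistic = 0.0
    for responses, impressions in counts.values():
        # Expected counts if every variant shared the pooled response rate.
        expected_yes = impressions * pooled_rate
        expected_no = impressions * (1 - pooled_rate)
        statistic += (responses - expected_yes) ** 2 / expected_yes
        statistic += ((impressions - responses) - expected_no) ** 2 / expected_no
    return statistic, len(counts) - 1


# PLACEHOLDER counts, invented for illustration only (not the real results).
example = {
    "a": (40, 371),
    "b": (42, 385),
    "c": (45, 389),
    "d": (70, 405),
}
statistic, dof = chi_square_response_rate(example)
# With 4 variants there are 3 degrees of freedom; the 5% critical value is 7.815.
print(f"chi2={statistic:.2f}, dof={dof}, differs at 5%: {statistic > 7.815}")
```

This only says whether the response rates differ, not whether the extra responses are any good, so it doesn't address the noise question.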