
Track clicking on suggestion in "Did you mean" results
Closed, Resolved, Public


This is a follow-on to T105201, separated out in part so I can get some experience working on this.

We can't currently track whether someone got to a particular results page because their query didn't warrant suggestions, or because they clicked on a suggestion. Some people (like me) will click on the suggested query even though it gives the same results in order to "clear" the error from the screen. (It's distracting and reminds me how poorly I type.)

In the limit, this could skew our statistic. Imagine 100 queries, 50 with suggestions: that's a 50% suggestion rate. If all 50 users clicked on the suggested query (and the suggested queries didn't produce suggestions of their own), then we'd have 150 queries, 50 with suggestions, for a 33% suggestion rate. The effect in real life won't be that big, but we can measure it instead of guessing.
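A minimal Python sketch of the arithmetic above, assuming each logged results page carries a hypothetical `from_suggestion_click` flag (the information this task wants to log) and that suggested queries never produce suggestions of their own. Without such a flag the two rates below cannot be told apart in the logs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageLoad:
    offered_suggestion: bool     # did this results page show a "Did you mean"?
    from_suggestion_click: bool  # hypothetical flag: reached by clicking a suggestion


def suggestion_rate(loads, exclude_suggestion_clicks=False):
    """Share of results pages that offered a suggestion."""
    pool = [
        load for load in loads
        if not (exclude_suggestion_clicks and load.from_suggestion_click)
    ]
    return sum(load.offered_suggestion for load in pool) / len(pool)


# 100 fresh queries, 50 with suggestions, and all 50 users click the suggestion.
loads = (
    [PageLoad(True, False)] * 50
    + [PageLoad(False, False)] * 50
    + [PageLoad(False, True)] * 50
)

print(suggestion_rate(loads))                                  # 50 / 150
print(suggestion_rate(loads, exclude_suggestion_clicks=True))  # 50 / 100
```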

We should also be able to find queries whose suggestions in turn generate additional suggestions (if someone clicks on them), which is interesting in itself but might also give insight into how to improve suggestions.

Event Timeline

TJones claimed this task.
TJones raised the priority of this task to Needs Triage.
TJones updated the task description.
TJones added a subscriber: TJones.
TJones triaged this task as High priority. (Aug 28 2015, 11:34 PM)

After seeing what Erik did and talking to Max to make sure I don't do anything that's obviously bad for caching, I'm planning to add a new wprov value (wprov=cirrusDYM) to the links in div.searchdidyoumean (I don't see any other good way to distinguish them from other links on the page). The presence of that query parameter will be logged in the DidYouMean schema. The values of wprov are mutually exclusive, since they are all added in the same function.

As a general rule I find it best not to commit code on a Friday afternoon, so I'll look this over Monday and submit for review then.
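The tagging plan described above boils down to rewriting each link's query string. The Python sketch below shows only that URL manipulation; the real change lives in the search page code, and the function name `add_wprov` and the example URLs are hypothetical. Replacing any existing `wprov` value, rather than appending a second one, mirrors the point that the values are mutually exclusive:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_wprov(href: str, value: str = "cirrusDYM") -> str:
    """Return href with its wprov parameter set to value.

    Any existing wprov is dropped first, so a link carries at most one
    provenance value.
    """
    parts = urlsplit(href)
    params = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != "wprov"
    ]
    params.append(("wprov", value))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(add_wprov("/w/index.php?search=george+clooney&fulltext=1"))
# /w/index.php?search=george+clooney&fulltext=1&wprov=cirrusDYM
```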

Change 235049 had a related patch set uploaded (by Tjones):
Track clicking on suggestion in "Did you mean" results

Over in gerrit, @EBernhardson said:

I am left wondering about something. There are two possible DYM messages:

Did you mean: <a>suggestion</a>
Showing results for <a>suggestion</a>. Search instead for <a>query</a>

All of those anchors are captured with the ... selector, and unfortunately there is currently no way to explicitly distinguish the three. Each of these anchors covers a completely different user intent. Re-reading the ticket, we might want to just not track the second message and only track clicks on the first. On the other hand, we might want to keep track of all the links... but that feels like redefining the ticket, so maybe just investigate that later.

Do we need to distinguish the two kinds of suggestion links ("Did you mean X?" and "Showing results for X")? If you click on the original query, runsuggestion is false (T105201), and that distinguishes it from the others. Right now we're trying to distinguish clicking on suggestions from running a fresh query, so I think we're okay.

Opinions to the contrary welcome.
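The reasoning above can be written as a small classifier over the landing page's query string. This Python sketch rests on two assumptions: that the "false" from T105201 shows up in the URL as `runsuggestion=0`, and that suggestion links carry `wprov=cirrusDYM` as planned earlier. The category names are made up for illustration:

```python
from urllib.parse import parse_qs, urlsplit


def classify_arrival(href: str) -> str:
    """Guess how a user arrived at a search results page (illustrative only)."""
    params = parse_qs(urlsplit(href).query)
    # Checked first: the original-query link sits in the same div, so it
    # would also pick up the wprov tag.
    if params.get("runsuggestion") == ["0"]:
        return "clicked_original_query"  # "Search instead for <query>"
    if params.get("wprov") == ["cirrusDYM"]:
        return "clicked_suggestion"      # either kind of suggestion link
    return "fresh_query"
```

Note that both kinds of suggestion link collapse into one category here, which is exactly the open question in this thread.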

@Ironholds @mpopov, any opinion on the above? This patch is about ready to ship but is blocked on the above distinction.

I'm inclined to agree with Erik, but I need someone to clarify something first. When the user clicks the [suggestion] link in the "Showing results for…" message, does the client hit the search engine again or does the user get cached results?

@EBernhardson, I can't definitively answer @mpopov's question, but it sure looks like we hit the search engine again.

When the user clicks the [suggestion] link in the "Showing results for…" message, does the client hit the search engine again or does the user get cached results?

If we're leaning toward separating these things out, I want to figure out the right way to do it.

@mpopov, on the logging side, how do you want the event log to look? Do you want one key with three possible values (indicating offered a suggestion, ran a suggestion, and offered the original), or some other mix of attributes?

Also, I've noticed in my tests that, as things stand now, a click results in two log events: one for the click and one for the page load. The two events are different. Is that a problem for anyone?

@EBernhardson, is it acceptable practice to muck about with the HTML being generated on the search page, say, to add some additional classes to the output so we can easily identify things? My previous plan was aimed at minimal intervention, now it looks like maximal distinction is probably better. Is that the way to do it (rather than counting anchor links and the like)?
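To illustrate why explicit ids beat counting anchors, here is a Python sketch that pulls the links out of div.searchdidyoumean and labels each one by its id rather than its position. The id values in `ROLE_BY_ID` are hypothetical placeholders, and a real implementation would run in the page's JavaScript rather than parse HTML in Python:

```python
from html.parser import HTMLParser

# Hypothetical ids; the real names are whatever the search page emits.
ROLE_BY_ID = {
    "mw-search-DYM-suggestion": "suggestion",
    "mw-search-DYM-rewritten": "rewritten",
    "mw-search-DYM-original": "original",
}


class DymLinkFinder(HTMLParser):
    """Collect (role, href) for links inside div.searchdidyoumean."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while inside the DYM div (counts nested divs)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.div_depth or "searchdidyoumean" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            # Unlabelled anchors are reported rather than guessed by position.
            role = ROLE_BY_ID.get(attrs.get("id"), "unknown")
            self.links.append((role, attrs.get("href")))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
```

If the markup ever reorders the message, the labels stay correct, which is the advantage over counting anchor links.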

@TJones @EBernhardson if we're hitting the search engine twice (once for "Showing results for …" and once for showing those same results), then that's potentially a lot of duplicated data if we track both. I suppose we can't make it a soft link that (1) changes the URL (and page title) to the suggested query and (2) hides the message? That way it updates the page to use the suggested query but doesn't hit the server again.

We could remove that link (Showing results for <a>...</a>) and make it a plain message. I only put it there because it seems (without doing any research) to be industry standard. I have actually clicked that link on Google before :). We could certainly throw in some JavaScript magic so it just changes the query in the search box.

I'm all for having the link there (I have done that on Google too), but my concern is possible duplicated logs that result from the user hitting the servers twice (the first time because our system did it for them and the second time because they chose to do it themselves). Hence the "what if we just made it seem like it's a new page without actually serving them a new page" JS illusion hack :)

BTW, at the moment, we're logging the click in addition to the page load, so when I said we're logging twice, I meant just for clicking on the suggestion. There's separate logging for the original search and results.

Guys, I don't have strong opinions about what we do, we just have to decide. I am looking to Erik to make sure we don't try to implement anything ridiculous, though (which he's been helpful with so far).

One other use case to consider: rarely, clicking on a correction/suggestion can lead to an additional correction/suggestion. Searching for "who is afrad of virginai wolf" shows results for "who is afraid of virginia wolf", but clicking on that gives another Did you mean suggestion of "who is afraid of virginia woolf". In this case, the top result is always the same. I don't have lots of examples of this handy.
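The chained case in the example above can be sketched as following suggestions until none is offered. In this Python sketch, `TOY_SUGGESTIONS` is a hypothetical stand-in for the search engine built from that one example, and it treats the rewrite and the later "Did you mean" suggestion alike; the loop and hop guards are there because nothing guarantees a chain terminates:

```python
from typing import Callable, List, Optional

# Toy stand-in for the search engine, built from the example above.
TOY_SUGGESTIONS = {
    "who is afrad of virginai wolf": "who is afraid of virginia wolf",
    "who is afraid of virginia wolf": "who is afraid of virginia woolf",
}


def suggestion_chain(
    query: str,
    suggest: Callable[[str], Optional[str]],
    max_hops: int = 5,
) -> List[str]:
    """Follow suggestions from query until none is offered, a loop
    appears, or max_hops is reached."""
    chain = [query]
    for _ in range(max_hops):
        nxt = suggest(chain[-1])
        if nxt is None or nxt in chain:
            break
        chain.append(nxt)
    return chain


print(suggestion_chain("who is afrad of virginai wolf", TOY_SUGGESTIONS.get))
```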

If we care about that case, we have to have suggestion links. If not, then we can either:

  • Not put a link on the "showing results for XX" message (super easy), or
  • Make the link perform the JS illusion hack that hides the message, modifies the search box, and logs nothing.

It sounds like we like the illusion hack, though my primary concern with it is cross-browser compatibility, especially on non-smart mobile phones and such. (We could make the JS smarter, so the link runs the current re-query unless the JS runs, and accept some slop there. I'm not 100% sure how that would work on a non-smart phone, though, or how to test it.)

We don't really have to worry about mobile here, the mobile frontend has its own implementation of these suggestions and basically never shows the user Special:Search.

I'd vote for the illusion.

Uh... bad news, guys: the JS illusion hack would have to be extra super hacky, and I think it's too hacky.

In addition to the search box, the original query (the one that got rewritten) is in the page title and in all sorts of links on the page: the printable version, next 20, show 50, etc.; the Multimedia, Everything, and Advanced links; the create-the-page link; the Special Page link. Heck, it's even in the Create Account and Login links, so that you can get back to the results.

Search for "geogre clooney", then search the page source for "geogre" to see everywhere it pops up.

Also, if you click on the suggestion, you end up on the George Clooney page, not back at search results!

The illusion is shattered.

So the options are: don't show a link (which seems rude, now that I see everything it can do for you), or log something.

I think we're back to log something.

Do you want one key with three possible values (indicating offered a suggestion, ran a suggestion, and offered the original), or some other mix of attributes?

Sorry, somehow I missed this update. I can see how that would be a problem. I don't have any great solutions here; I'm leaning towards not tracking the special case, as it's making things more difficult than necessary.

@mpopov, abandon, or three values, or something else?

There is a lot of apparent redundancy here, so I'm going to reiterate the explanation, if only for my own sanity. The schema records:

  • didYouMean—what kind of didYouMean results were served/offered: a rewritten query (with original), or a suggestion, or none of the above
  • didYouMeanSource—this is the new one; how did we get to this page: clicking an original query, a rewritten query, a suggested query, or none of the above.
  • action—how did we leave this page: clicking an original query, a rewritten query, a suggested query, or none of the above.
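For reference, the three fields can be modelled as enums, as in this Python sketch. The member values and class names are hypothetical; the actual DidYouMean schema may spell them differently. It also makes the redundancy explicit: the link recorded as one page's `action` should be the `didYouMeanSource` of the next page, and which actions are possible depends on which message the page showed:

```python
from dataclasses import dataclass
from enum import Enum


class DidYouMean(Enum):
    """What kind of "Did you mean" content this page served."""
    REWRITTEN = "rewritten"    # "Showing results for X. Search instead for Y"
    SUGGESTION = "suggestion"  # "Did you mean: X"
    NONE = "none"


class DymLink(Enum):
    """Which DYM link was clicked (to arrive at or to leave a page)."""
    ORIGINAL = "original"
    REWRITTEN = "rewritten"
    SUGGESTION = "suggestion"
    NONE = "none"


@dataclass(frozen=True)
class DidYouMeanEvent:
    did_you_mean: DidYouMean
    did_you_mean_source: DymLink  # how we got to this page
    action: DymLink               # how we left this page


# Links each kind of message actually offers (NONE = left some other way).
POSSIBLE_ACTIONS = {
    DidYouMean.REWRITTEN: {DymLink.ORIGINAL, DymLink.REWRITTEN, DymLink.NONE},
    DidYouMean.SUGGESTION: {DymLink.SUGGESTION, DymLink.NONE},
    DidYouMean.NONE: {DymLink.NONE},
}


def action_is_possible(event: DidYouMeanEvent) -> bool:
    return event.action in POSSIBLE_ACTIONS[event.did_you_mean]


def links_consistent(leaving: DidYouMeanEvent, arriving: DidYouMeanEvent) -> bool:
    """The link clicked on one page should be the source recorded on the next."""
    return leaving.action == arriving.did_you_mean_source
```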

Change 241098 had a related patch set uploaded (by Tjones):
Add ids to "Did you mean" links so they can be distinguished

Change 241098 merged by jenkins-bot:
Add ids to "Did you mean" links so they can be distinguished

Change 235049 merged by jenkins-bot:
Track clicks on suggestions, etc., in "Did you mean" results