
Figure out some hypothetical formula for measuring the user perceived accuracy of full text search and create a plan to implement that including phabricator tasks
Closed, Resolved · Public · 4 Story Points

Description

I think this is genuinely hard - there is research to do and we need to adapt whatever best practices we can find to what works for us.

Event Timeline

Manybubbles assigned this task to EBernhardson.
Manybubbles raised the priority of this task from to Normal.
Manybubbles updated the task description. (Show Details)
Manybubbles set Security to None.

Based on our conversation today, it sounds like we should consider a "per-language" rating. For example, it sounds like our rating for "any search for Chinese characters without spaces" probably sucks.

Step 1 is to read all of the papers. That's going to be my initial work on this.

@Ironholds You should do your initial research as a spike: time box it and report back. How long do you think is appropriate? Two days?

Sounds totally workable.

@Ironholds Cool. Let's check on Wednesday about this. I'll send you an invite.

Deskana renamed this task from Figure out some hypothetical formula for measuring the user perceived accuracy of full text search and create a plan to implement that including phabricator tasks to [Spike, 2 days] Figure out some hypothetical formula for measuring the user perceived accuracy of full text search and create a plan to implement that including phabricator tasks.Jun 1 2015, 4:13 PM

+1 for initial research to be done in two-ish days. I suspect this'll be an ongoing thing though. But having a plan soon would be sweet.

Deskana moved this task from Needs triage to Analysis on the Discovery board.Jun 2 2015, 2:32 PM
Ironholds edited a custom field.Jun 2 2015, 4:19 PM

Dan, Wes and Oliver met today to discuss this. Oliver presented a metric that we can use to evaluate whether searches are successful or not.

Next steps:

  1. Oliver (@Ironholds) makes a presentation summarising the discussion we had and the metric he presented, and links it here
  2. Draft a schema to try to measure search success using this metric
  3. Take this schema to the engineers and get them to implement it

Very short notes from talking to Oliver about what we want to do here:

Did they click on a thing? Track the next page view. Did they go back (i.e. return to the SERP)?

So, track a search for a thing:

  • clicked on a thing
  • went to article A, then B, then C (how far to track?)
  • were they on the page long enough to get something of value?

In an ideal universe, capture search hits:

  • click through to pages
  • clicks out of site
  • goal: time on page linked from search result page
  • needs a uuid for search session

Need to create a schema for this; a rough example record follows the list:

  • search session uuid
  • timestamp
  • event type (search page, result page, click out)
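
For instance (field names and values here are placeholders, not a final schema):

  // Hypothetical shape of one logged event; every event in a search
  // session shares the same searchSessionId.
  var event = {
      searchSessionId: 'f81d4fae-7dec-11d0-a765-00a0c91e6bf6', // per-session uuid
      timestamp: Date.now(),                                   // ms since epoch
      eventType: 'resultPage'                                  // or 'searchPage', 'clickOut'
  };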

Random extra ideas I think we might want to consider (maybe in later iterations of this, I dunno):

  • track the number of search results that were provided
  • track which page of search results the user is clicking from
  • track the position (e.g. the 4th link on the page) of the search link that was clicked

Absolutely! This is just a preliminary "can we get signal from this data" kinda thing :)

EBernhardson added a comment.EditedJun 8 2015, 10:14 PM

Initial schema draft at https://meta.wikimedia.org/wiki/Schema:TestSearchSatisfaction

For implementation the plan is as follows:

  • Generating a SERP on the backend creates a UUID and stores it in the user's session. The searchEngineResultPage event is fired from the backend, and the UUID is added to the page's mw.config via ResourceLoader.
  • JavaScript in the SERP will use a click handler on the main anchors in the search results to create visitPageLinkedFromSERP events before allowing the link to trigger (see the sketch after this list). Users with newer browsers will have navigator.sendBeacon functionality, under which this will work "as expected", but for older browsers and Internet Explorer this won't work correctly. Ideally we could stuff these events that trigger when unloading the page into window.localStorage and have them fire from the next page, but IIRC that was previously proposed and is blocked on T66721.
  • My least favorite part: we need custom JavaScript running on all article pages to trigger the leavePageLinkedFromSERP event. The JavaScript needs to detect the search session UUID, along with whether we came to this page directly from the SERP. We can check document.referrer to see if we just came from the SERP, but we have to store the search session id somewhere. This cannot come from the backend user session, so we must store it with something in the browser. window.localStorage (actually jStorage, a wrapper library in core around localStorage) is the obvious choice, and it has a few levels of fallback so should work for most users. We will need to keep this as lightweight as possible. EventLogging and jStorage are already included on all article pages, so we should only be adding a small amount of JavaScript and no extra libraries.
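
A sketch of what the click handler in the second bullet could look like. The config variable name (wgSearchSessionId) and the result-link selector are my inventions, and the schema fields are placeholders; mw.eventLog.logEvent is EventLogging's normal entry point:

  // Sketch only: fire visitPageLinkedFromSERP from a click handler on the
  // SERP's result links, before the browser navigates away.
  mw.loader.using( 'ext.eventLogging' ).then( function () {
      var sessionId = mw.config.get( 'wgSearchSessionId' ); // assumed variable name

      $( '.mw-search-result-heading a' ).on( 'click', function () {
          mw.eventLog.logEvent( 'TestSearchSatisfaction', {
              searchSessionId: sessionId,
              eventType: 'visitPageLinkedFromSERP',
              // position of the clicked result: one of the "extra ideas" above
              resultPosition: $( '.mw-search-result-heading a' ).index( this ) + 1
          } );
          // With navigator.sendBeacon the request survives the navigation;
          // without it, it races the page unload and may be dropped.
      } );
  } );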

My main worry here is that we can lose both visitPageLinkedFromSERP and leavePageLinkedFromSERP for browsers that do not have navigator.sendBeacon functionality, which is likely quite common. I'm going to poke a few people and see what they think about allowing unload events to be pushed into jStorage now; it seems some progress has been made on T66721 since I last looked at it, and this might now be possible. We would still lose events for anyone with a full localStorage, though. Does anyone else have ideas?
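
For reference, the jStorage/localStorage fallback I have in mind is roughly the following. queueEvent and flushQueue are made-up names, and a real version would presumably live inside EventLogging itself per T66721:

  // Queue events in localStorage at unload time and flush them from the
  // next page load, for browsers without navigator.sendBeacon.
  function queueEvent( event ) {
      var queue = JSON.parse( localStorage.getItem( 'pendingEvents' ) || '[]' );
      queue.push( event );
      try {
          localStorage.setItem( 'pendingEvents', JSON.stringify( queue ) );
      } catch ( e ) {
          // localStorage is full: the event is lost, as noted above
      }
  }

  function flushQueue() {
      var queue = JSON.parse( localStorage.getItem( 'pendingEvents' ) || '[]' );
      queue.forEach( function ( event ) {
          mw.eventLog.logEvent( 'TestSearchSatisfaction', event );
      } );
      localStorage.removeItem( 'pendingEvents' );
  }

  // On every page load, flush whatever the previous page left behind.
  $( flushQueue );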

We also need to figure out what our sampling ratio should be here; we receive around 150M searches a day, but my guess is that many of those are autocomplete searches rather than actual SERP views. Do we have any information that can guide this decision?
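
Whatever ratio we settle on, the in/out decision should probably be made once per search session, keyed off the session UUID, so that a sampled session contributes all of its events or none. Something like the sketch below, where the 1-in-1000 rate is purely a placeholder:

  // Deterministic per-session sampling: hash the tail of the session uuid
  // into a bucket so the decision is stable across all events in the
  // session. SAMPLE_ONE_IN is a placeholder, not a recommendation.
  var SAMPLE_ONE_IN = 1000;

  function inSample( sessionId ) {
      var bucket = parseInt( sessionId.slice( -6 ), 16 ); // last 6 hex chars
      return bucket % SAMPLE_ONE_IN === 0;
  }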

A couple more concerns:

If we use localStorage/jStorage, we have to start using client-side timestamps. In my experience these only loosely correlate with server-side timestamps, with lots of variation. Basically it means we would want to generate all events client-side (including the searchEngineResultPage event). Additionally, we likely want to implement the jStorage handling inside EventLogging directly, which means coming up with a viable solution not just for our limited use case but for everyone. Due to the issues with timestamps, I'm no longer sure this is the right way to go.
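
If we do end up stuck with client clocks, one mitigation is to send the client reading with every event and estimate per-session skew at analysis time against the server-side receive time that (IIRC) the EventLogging capsule already records:

  // Attach the (possibly skewed) client clock to every event; per-session
  // skew can then be estimated and subtracted during analysis by comparing
  // it against the server-side receive time in the EventLogging capsule.
  function makeEvent( sessionId, eventType ) {
      return {
          searchSessionId: sessionId,
          clientTimestamp: Date.now(), // loosely correlated with server time
          eventType: eventType
      };
  }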

sendBeacon, as mentioned above, is the preferred way to handle these unload events. An experiment was run from late November 2014 through early January 2015 to measure the reliability of sendBeacon. I've tried to run some stats against this data to see if it is reliable enough to collect our KPI, but it just seems odd to me. The test was to send an event with a unique logId via both regular logging and via sendBeacon. This collected (using a 1 in 10,000 sample) 1.35M events via regular logging (logEvent) and 613k events via sendBeacon (logPersistentEvent). There are some odd things in the data, though:

 select event_method,
        count(distinct event_logId),
        1 - (count(distinct event_logId) / count(*)) as percent_duplicate
   from SendBeaconReliability_10735916
  group by event_method;
+--------------------+-----------------------------+-------------------+
| event_method       | count(distinct event_logId) | percent_duplicate |
+--------------------+-----------------------------+-------------------+
| logEvent           |                     1058034 |            0.2177 |
| logPersistentEvent |                      428941 |            0.3005 |
+--------------------+-----------------------------+-------------------+

Each event_logId should have been sent once for each event_method type, but instead there are lots of duplicates. Perhaps it's normal, but I didn't realize that >20% of events recorded from JavaScript are duplicates. Breaking it down further:

select event_method, events_per_logId, count(*)
FROM (
select count(*) as events_per_logId, event_method
  from SendBeaconReliability_10735916
 group by event_logId
) x
group by event_method, events_per_logId

+--------------------+------------------+----------+
| event_method       | events_per_logId | count(*) |
+--------------------+------------------+----------+
| logEvent           |                1 |   570740 |
| logEvent           |                2 |   213224 |
| logEvent           |                3 |    19432 |
| logEvent           |                4 |    31519 |
| logEvent           |                5 |     7501 |
| logEvent           |                6 |    14264 |
| logEvent           |                7 |     4583 |
| logEvent           |                8 |     6976 |
| logEvent           |                9 |     2473 |
| logEvent           |               10 |     3067 |
| logEvent           |               11 |     1120 |
| logEvent           |               12 |     1203 |
| logEvent           |               13 |      411 |
| logEvent           |               14 |      409 |
| logEvent           |               15 |      142 |
| logEvent           |               16 |      115 |
| logEvent           |               17 |       57 |
| logEvent           |               18 |       37 |
| logEvent           |               19 |       12 |
| logEvent           |               20 |        8 |
| logEvent           |               21 |        3 |
| logEvent           |               22 |        1 |
| logEvent           |               23 |        2 |
| logEvent           |               30 |        1 |
| logEvent           |               34 |        1 |
| logEvent           |               37 |        1 |
| logEvent           |               40 |        1 |
| logEvent           |               46 |        1 |
| logEvent           |               91 |        1 |
| logEvent           |              205 |        1 |
| logEvent           |              386 |        1 |
| logPersistentEvent |                1 |     3541 |
| logPersistentEvent |                2 |   150001 |
| logPersistentEvent |                3 |     2741 |
| logPersistentEvent |                4 |    12771 |
| logPersistentEvent |                5 |     2118 |
| logPersistentEvent |                6 |     5512 |
| logPersistentEvent |                7 |     1449 |
| logPersistentEvent |                8 |     2748 |
| logPersistentEvent |                9 |      824 |
| logPersistentEvent |               10 |     1239 |
| logPersistentEvent |               11 |      384 |
| logPersistentEvent |               12 |      479 |
| logPersistentEvent |               13 |      149 |
| logPersistentEvent |               14 |      163 |
| logPersistentEvent |               15 |       58 |
| logPersistentEvent |               16 |       64 |
| logPersistentEvent |               17 |       15 |
| logPersistentEvent |               18 |       16 |
| logPersistentEvent |               19 |        5 |
| logPersistentEvent |               20 |        7 |
| logPersistentEvent |               22 |        3 |
| logPersistentEvent |               34 |        2 |
+--------------------+------------------+----------+
53 rows in set (39.39 sec)

But I feel I must be doing something wrong there, because that suggests almost all events collected via sendBeacon had at least one duplicate event (yet the first query showed only 30% duplicates, so I'm a bit confused). We can work around this in our data by adding an equivalent of logId to our events and deduplicating at query time. If this is normal, though, we should probably file a bug against EventLogging to deduplicate these itself on insert, rather than at query time.
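
The client-side half of that workaround is cheap: attach a unique id to every event so duplicates collapse at query time with count(distinct ...), as in the queries above. A sketch, using core's mw.user.generateRandomSessionId() as a convenient random-token source:

  // Give every event its own logId so duplicates can be removed at query
  // time. mw.user.generateRandomSessionId() is a random-token helper in
  // MediaWiki core; any other source of randomness would do.
  function withLogId( event ) {
      event.logId = mw.user.generateRandomSessionId();
      return event;
  }

  // e.g. mw.eventLog.logEvent( 'TestSearchSatisfaction', withLogId( event ) );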

Back to the original question I was trying to answer, though: if we rely on events sent just before leaving the page (via click handlers for visitPageLinkedFromSERP and the unload event for leavePageLinkedFromSERP), will we get reliable, usable information? TBH I don't know; @Ironholds, I would appreciate it if you could offer some advice here :)

EBernhardson closed this task as Resolved.Jun 11 2015, 10:47 PM

I'm declaring this one complete, as a spike is about figuring things out, not actually doing it. The implementation of this is T100907.

Ironholds reopened this task as Open.Jun 11 2015, 11:18 PM

Nope!

  1. Only Dan gets to close things, so he has a chance to review them and make sure that the output matches the input
  2. This isn't actually the complete task; the complete task is this schema, a big-ass document and some experimental testing.

That's not a spike then :P

Indeedy! Will modify the header

Ironholds renamed this task from [Spike, 2 days] Figure out some hypothetical formula for measuring the user perceived accuracy of full text search and create a plan to implement that including phabricator tasks to Figure out some hypothetical formula for measuring the user perceived accuracy of full text search and create a plan to implement that including phabricator tasks.Jun 11 2015, 11:31 PM

@Ironholds: To clarify, you're saying that the "create a plan" part of this issue has prerequisites of defining a schema, documenting (something), and running tests. And then the actual result (the "plan") would point to all of that?

Does all of that still fit within a roughly "2 days" scope as was originally specified?

(I just want to make sure that keeping this open makes sense, as opposed to having additional/separate tasks.)

Well, you'll note [spike, 2 days] is no longer in the header ;p

Yes. However, when a task dramatically changes in scope or nature, simply renaming it may or may not be sufficient. If it grows, the PO might lower its priority, or might want to split it such that the high value/low cost parts can be done sooner, and the low value/high cost parts can be done later.

If it changes in nature (which I don't think happened here), then the comments tend to become confusing, because they now refer to stuff that appears irrelevant, given the new title.

If this story is still going to take "about 2 days", then all is well. If it has grown substantially, we (you/me/Dan) should discuss it.

And either way, the description should become more clear about what "the plan" means.

It's grown substantially but I don't think we need a meeting. What we're talking about, here, is:

  1. Take what we know;
  2. Take what we're doing;
  3. Document both on Meta

That's a new task but I don't think it needs a full-fledged chat.

Deskana closed this task as Resolved.Jun 18 2015, 8:23 PM

This task, as originally scoped, is complete. Follow-up work to document and summarise the plan is in T101277.