I think this is genuinely hard; there is research to do, and we need to adapt whatever best practices we can find to what works for us.
Based on our conversation today, it sounds like we should consider a "per-language" rating. For example, our rating for "any search for Chinese characters without spaces" probably sucks.
Dan, Wes and Oliver met today to discuss this. Oliver presented a metric that we can use to evaluate whether searches are successful or not.
- Oliver (@Ironholds) will make a presentation summarising the discussion we had and the metric he presented, and link it here
- Draft a schema to try to measure search success using this metric
- Take this schema to the engineers and get them to implement it
Very short notes from talking to Oliver about what we want to do here:
Did they click on a thing? Track the next page view. Did they go back (return to the SERP)?
So, track a search for a thing:
- clicked on a thing
- went to article A, then B, then C (how far do we track?)
- were they on the page long enough to get something of value?
In an ideal universe, capture search hits:
- click through to pages
- clicks out of site
- goal: time on page linked from search result page
- needs a uuid for search session
need to create a schema for this
- search session uuid
- event type (search page, result page, click out)
Random extra ideas I think we might want to consider (maybe in later iterations of this, I dunno):
- track the number of search results that were provided
- track which page of search results the user is clicking from
- track the position (i.e. 4th link in page) of the search link that was clicked
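Pulling the core fields and the optional extras together, one logged event might look something like this. This is just an illustration; all field names here are hypothetical, not the actual schema:

```javascript
// Hypothetical sketch of a single logged event, combining the core fields
// with the optional extras listed above. Not the real schema.
var exampleEvent = {
  searchSessionId: '9f8c7e1a-1111-2222-3333-444455556666', // uuid shared by every event in one search session
  eventType: 'resultPage',  // 'searchPage' | 'resultPage' | 'clickOut'
  // Optional extras for later iterations:
  numberOfResults: 20,      // how many results the SERP returned
  resultPageNumber: 1,      // which page of results the click came from
  resultPosition: 4         // e.g. the 4th link on the page
};
```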
Initial schema draft at https://meta.wikimedia.org/wiki/Schema:TestSearchSatisfaction
For implementation the plan is as follows:
- Generating a SERP on the backend creates a UUID and stores it in the user's session. The searchEngineResultPage event is fired from the backend, and the UUID is added to the page's mw.config via ResourceLoader.
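A sketch of the frontend side of this: the `mw` stub below stands in for the real ResourceLoader environment, and `searchSessionId` is a hypothetical config key (not necessarily what the implementation will call it):

```javascript
// Stub of MediaWiki's mw.config for illustration only; in a real page the
// backend would have injected the UUID into mw.config via ResourceLoader.
var mw = {
  config: {
    values: { searchSessionId: 'a1b2c3d4-0000-0000-0000-000000000000' },
    get: function (key) { return this.values[key]; }
  }
};

function buildEvent(eventType) {
  // Every event in the funnel carries the same session UUID, so the SERP
  // view, click-throughs, and page visits can be joined at analysis time.
  return {
    searchSessionId: mw.config.get('searchSessionId'),
    eventType: eventType, // e.g. 'visitPageLinkedFromSERP', 'leavePageLinkedFromSERP'
    clientTimestamp: Date.now()
  };
}
```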
My main worry here is that we seem to have the ability to lose both visitPageLinkedFromSERP and leavePageLinkedFromSERP for browsers that lack navigator.sendBeacon, which is likely quite common. I'm going to poke a few people and see what they think about allowing unload events to be pushed into jStorage now; it seems some progress has been made on T66721 since I last looked at it, and this might now be possible. We would still lose events for anyone with a full localStorage, though. Does anyone else have ideas?
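For reference, the fallback being discussed might be sketched like this. The function name and storage key are hypothetical, and `nav`/`storage` are injected as parameters so the sketch runs outside a browser; a real implementation would use `navigator` and `localStorage` (or jStorage) directly:

```javascript
// Sketch: prefer navigator.sendBeacon for unload-time events, falling back
// to queueing the event in localStorage (the T66721 idea) for flushing on
// the next page view. Hypothetical names throughout.
function logUnloadEvent(event, nav, storage) {
  var payload = JSON.stringify(event);
  if (nav && typeof nav.sendBeacon === 'function') {
    // '/beacon/event' is a placeholder endpoint.
    return nav.sendBeacon('/beacon/event', payload);
  }
  // Fallback: append to a queue in storage. This can still lose events
  // if localStorage is full, as noted above.
  try {
    var queue = JSON.parse(storage.getItem('elQueue') || '[]');
    queue.push(payload);
    storage.setItem('elQueue', JSON.stringify(queue));
    return true;
  } catch (e) {
    return false; // storage full or unavailable
  }
}
```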
Also, we need to figure out what our sampling ratio should be here. We receive around 150M searches a day, but my guess is that many of those are autocomplete searches rather than actual SERP views. Do we have any information that can guide this decision?
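Whatever ratio we pick, we probably want to sample per session rather than per event, so a sampled session keeps its whole funnel (SERP view, click-through, page visit). A sketch of one way to do that deterministically; the hash and rate are illustrative, not a proposal for the actual implementation:

```javascript
// Sketch: decide sampling once per search session by hashing the session
// UUID into a bucket. Every event in a kept session is then logged, so
// funnels stay intact. Hash choice here is arbitrary (simple 31x rolling).
function inSample(sessionId, rate) {
  var hash = 0;
  for (var i = 0; i < sessionId.length; i++) {
    hash = (hash * 31 + sessionId.charCodeAt(i)) >>> 0; // keep as uint32
  }
  return hash % rate === 0; // e.g. rate = 1000 keeps roughly 1 in 1000 sessions
}
```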
A couple more concerns:
If we use localStorage/jStorage, we have to start using client-side timestamps. In my experience these only loosely correlate with server-side timestamps, with lots of variation. Basically, it means we would want to generate all events client-side (including the searchEngineResultPage event). Additionally, we would likely want to implement the jStorage support inside EventLogging directly, which means coming up with a viable solution not just for our limited use case but for everyone. Due to the issues with timestamps, I'm no longer sure this is the right way to go.
sendBeacon, as mentioned above, is the preferred way to handle these unload events. An experiment was run from late November 2014 through the beginning of January 2015 to measure the reliability of sendBeacon. I've tried to run some stats against this data to see if it is reliable enough to collect our KPI, but the results seem odd to me. The test was to send an event with a unique logId via both regular logging and via sendBeacon. This collected (using a 1 in 10,0000 sample) 1.35M events via regular logging (logEvent) and 613k events via sendBeacon (logPersistentEvent). There are some odd things in the data, though:
select event_method, count(distinct event_logId),
       1 - (count(distinct event_logId) / count(*)) as percent_duplicate
from SendBeaconReliability_10735916
group by event_method;

+--------------------+-----------------------------+-------------------+
| event_method       | count(distinct event_logId) | percent_duplicate |
+--------------------+-----------------------------+-------------------+
| logEvent           |                     1058034 |            0.2177 |
| logPersistentEvent |                      428941 |            0.3005 |
+--------------------+-----------------------------+-------------------+
select event_method, events_per_logId, count(*)
from (
  select count(*) as events_per_logId, event_method
  from SendBeaconReliability_10735916
  group by event_logId
) x
group by event_method, events_per_logId;

+--------------------+------------------+----------+
| event_method       | events_per_logId | count(*) |
+--------------------+------------------+----------+
| logEvent           |                1 |   570740 |
| logEvent           |                2 |   213224 |
| logEvent           |                3 |    19432 |
| logEvent           |                4 |    31519 |
| logEvent           |                5 |     7501 |
| logEvent           |                6 |    14264 |
| logEvent           |                7 |     4583 |
| logEvent           |                8 |     6976 |
| logEvent           |                9 |     2473 |
| logEvent           |               10 |     3067 |
| logEvent           |               11 |     1120 |
| logEvent           |               12 |     1203 |
| logEvent           |               13 |      411 |
| logEvent           |               14 |      409 |
| logEvent           |               15 |      142 |
| logEvent           |               16 |      115 |
| logEvent           |               17 |       57 |
| logEvent           |               18 |       37 |
| logEvent           |               19 |       12 |
| logEvent           |               20 |        8 |
| logEvent           |               21 |        3 |
| logEvent           |               22 |        1 |
| logEvent           |               23 |        2 |
| logEvent           |               30 |        1 |
| logEvent           |               34 |        1 |
| logEvent           |               37 |        1 |
| logEvent           |               40 |        1 |
| logEvent           |               46 |        1 |
| logEvent           |               91 |        1 |
| logEvent           |              205 |        1 |
| logEvent           |              386 |        1 |
| logPersistentEvent |                1 |     3541 |
| logPersistentEvent |                2 |   150001 |
| logPersistentEvent |                3 |     2741 |
| logPersistentEvent |                4 |    12771 |
| logPersistentEvent |                5 |     2118 |
| logPersistentEvent |                6 |     5512 |
| logPersistentEvent |                7 |     1449 |
| logPersistentEvent |                8 |     2748 |
| logPersistentEvent |                9 |      824 |
| logPersistentEvent |               10 |     1239 |
| logPersistentEvent |               11 |      384 |
| logPersistentEvent |               12 |      479 |
| logPersistentEvent |               13 |      149 |
| logPersistentEvent |               14 |      163 |
| logPersistentEvent |               15 |       58 |
| logPersistentEvent |               16 |       64 |
| logPersistentEvent |               17 |       15 |
| logPersistentEvent |               18 |       16 |
| logPersistentEvent |               19 |        5 |
| logPersistentEvent |               20 |        7 |
| logPersistentEvent |               22 |        3 |
| logPersistentEvent |               34 |        2 |
+--------------------+------------------+----------+
53 rows in set (39.39 sec)
But I feel I must be doing something wrong there, because that suggests almost all events collected via sendBeacon had at least one duplicate (while the first query showed only 30% duplicates, so I'm a bit confused). We can work around this in our data by adding an equivalent of logId to our events and deduplicating at query time. If this is normal, though, we should probably file a bug against EventLogging to deduplicate on insert rather than at query time.
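As a sketch of the "deduplicate at query time" idea: keep the first event seen for each logId. In practice this would be a `count(distinct ...)`-style SQL query like the ones above; it's shown in JavaScript here only for illustration, with hypothetical field names:

```javascript
// Sketch: deduplicate a list of events by their logId field, keeping the
// first occurrence of each. Illustrates the query-time dedup idea only.
function dedupeByLogId(events) {
  var seen = {};
  return events.filter(function (ev) {
    if (seen[ev.logId]) { return false; } // already kept one copy
    seen[ev.logId] = true;
    return true;
  });
}
```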
Back to the original question I was trying to answer with this, though: if we rely on events sent just before leaving the page (via click handlers for visitPageLinkedFromSERP and the unload event for leavePageLinkedFromSERP), will we get reliable, usable information? TBH I don't know. @Ironholds, I would appreciate it if you could offer some advice here :)
- Only Dan gets to close things, so he has a chance to review them and make sure that the output matches the input
- This isn't actually the complete task; the complete task is this schema, a big-ass document and some experimental testing.
@Ironholds: To clarify, you're saying that the "create a plan" part of this issue has prerequisites of defining a schema, documenting (something), and running tests. And then the actual result (the "plan") would point to all of that?
Does all of that still fit within a roughly "2 days" scope as was originally specified?
(I just want to make sure that keeping this open makes sense, as opposed to having additional/separate tasks.)
Yes. However, when a task dramatically changes in scope or nature, simply renaming it may or may not be sufficient. If it grows, the PO might lower its priority, or might want to split it such that the high value/low cost parts can be done sooner, and the low value/high cost parts can be done later.
If it changes in nature (which I don't think happened here), then the comments tend to become confusing, because they now refer to stuff that appears irrelevant, given the new title.
If this story is still going to take "about 2 days", then all is well. If it has grown substantially, we (you/me/Dan) should discuss it.
And either way, the description should become more clear about what "the plan" means.
It's grown substantially but I don't think we need a meeting. What we're talking about, here, is:
- Take what we know;
- Take what we're doing;
- Document both on Meta
That's a new task but I don't think it needs a full-fledged chat.