
Investigate mobile search dashboard data
Closed, ResolvedPublic

Description

Dashboard is currently down for maintenance so apologies for not having screenshots:

Looking at the mobile search events on the discovery dashboard (http://discovery.wmflabs.org/metrics/#mobile_events), it is almost certain that the data is inaccurate in some important way.

If you look at the desktop searches, they follow a common weekly pattern that matches what we see in pageviews (drops on the weekends):

If you look at mobile web pageviews you see bumps on the weekends:

But the mobile web search dashboard does not show this pattern, which is very strange. We have also recently found a discrepancy between search event logging and pageview API counts, which suggests that at least one of our methods has an issue.
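One way to make the "weekend bump" comparison concrete is to compute the ratio of average weekend volume to average weekday volume for each series. The following is a minimal sketch, not the dashboard's actual code: the function name and the input shape (a mapping from date to daily count) are assumptions made for illustration.

```python
from statistics import mean


def weekend_weekday_ratio(daily_counts):
    """Return the mean weekend count divided by the mean weekday count.

    daily_counts maps datetime.date -> daily count (hypothetical input shape).
    A ratio above 1 means weekend bumps; below 1 means weekend drops.
    """
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    weekend = [n for d, n in daily_counts.items() if d.weekday() >= 5]
    weekday = [n for d, n in daily_counts.items() if d.weekday() < 5]
    if not weekend or not weekday:
        raise ValueError("need at least one weekend day and one weekday")
    return mean(weekend) / mean(weekday)
```

If mobile web pageviews give a ratio above 1 while mobile web search events give a ratio near or below 1 over the same period, that would quantify the mismatch described above.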

Event Timeline

JKatzWMF created this task. Aug 28 2017, 8:56 PM
Restricted Application added a subscriber: Aklapper. Aug 28 2017, 8:56 PM
debt triaged this task as Normal priority. Aug 29 2017, 8:17 PM
chelsyx added a subscriber: mpopov.

Write-up: https://people.wikimedia.org/~chelsyx/reports/T174396.html

We did not find any error on the dashboard side. In fact, we can see the weekend bumps on the dashboard from time to time (e.g. Jan - Feb 2017). That being said, there are several things we can do to improve this dashboard:

  • Besides counting the number of events, we can count the number of user session tokens, which identify users’ full interactions with the search field.
  • We can filter out high-volume user sessions.
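The two improvements listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input shape (pairs of day and session token), the function name, and the `max_events_per_session` threshold are hypothetical and are not taken from the dashboard code or from Schema:MobileWebSearch.

```python
from collections import Counter, defaultdict


def daily_events_and_sessions(events, max_events_per_session=None):
    """Count events and distinct session tokens per day.

    events: iterable of (day, session_token) pairs (hypothetical shape).
    max_events_per_session: if set, sessions with more events than this
        in total are treated as high volume and dropped before counting.
    Returns {day: (event_count, distinct_session_count)}.
    """
    events = list(events)
    # Total number of events logged under each session token.
    per_session = Counter(token for _, token in events)
    if max_events_per_session is not None:
        events = [
            (day, token)
            for day, token in events
            if per_session[token] <= max_events_per_session
        ]
    event_counts = Counter(day for day, _ in events)
    sessions = defaultdict(set)
    for day, token in events:
        sessions[day].add(token)
    return {day: (event_counts[day], len(sessions[day])) for day in event_counts}
```

Counting sessions rather than raw events keeps a single very active session from dominating a day's total, which is one way a weekly pattern could be masked.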

Currently the mobile web team only tracks prefix search with event logging, so full-text search events may be a source of the difference in pattern. But the full-text search result pageviews we pulled from webrequest didn't confirm this hypothesis (see the write-up): their pattern is the same as desktop search, low on the weekends. The Hive queries I used to pull the webrequest data are P5973 and P5974. @mpopov and I couldn't find any problems in these queries. @Tbayer, can you take a look at them when you have a moment? Thank you!

@chelsyx and I talked a bit about this today and she gave me some additional explanations; I will try to check the queries next week.

chelsyx added a subscriber: phuedx.

Update: I got an email from @phuedx and realized that I may have misunderstood how Schema:MobileWebSearch works. I will have a meeting with an RW engineer to go through the implementation of the MobileWebSearch instrumentation and probe for any other issues before proceeding.

(For the record, we decided afterwards that checking the queries was secondary to investigating the aspects mentioned in T174396#3598749; I'll still be happy to give the Hive side a look later if needed.)

On September 21st, 2017, we had a meeting with @phuedx and discussed some issues related to mobile web search. The link to the etherpad is https://etherpad.wikimedia.org/p/MobileWebSearch_Sync.

Based on our discussion and a follow-up email from @phuedx, we believe that Schema:MobileWebSearch doesn't have a sampling problem like T167236, so our mobile search pattern was not affected by a sampling issue.

Therefore, as mentioned in T174396#3589647, we did not find any error on the dashboard side (see this write-up for more details). To make the data easier to interpret, we will add a new graph on the dashboard that counts the number of user session tokens and filters sessions by volume (see T176811).

We also noticed that our data has a clearer pattern before March 29th, so it’s possible that our pattern issue is related to the drop in the number of search events on March 29th. See T176464 for the investigation of this issue.

There is also a problem with the full-text search pattern on mobile web (as mentioned in T174396#3589647). To keep this ticket clean, we have spun it off to T176815.

debt closed this task as Resolved. Oct 2 2017, 2:19 PM
debt added a subscriber: debt.

Thanks for the investigation and writeup, @chelsyx