
Review current search metrics for accuracy and documentation
Closed, Resolved · Public

Description

This task encompasses investigation into:

  • Whether search metrics we are tracking are getting the correct data.
  • Whether we are getting the proper data from Desktop and Mobile Web, and Android and iOS Apps

Issues and anomalies

These are the issues and anomalies with our dashboard that have been found so far. Some can be fixed immediately; for those that need further investigation, we will create new tickets.

Infrastructure

  • All instances in the shiny-r project need to be upgraded as soon as possible (T204688, T204505)
  • This one depends on T168967

Bugs in dashboard

  • On SRP visit time, the results for “How long Wikipedia searchers stay on the search result pages” look broken because the results for the three groups (English, French and Catalan, and Other) are all identical
  • Dwell time on visited pages stopped updating on 2017-08-27
  • On the metrics summary, adding an explanation of the user engagement calculation could be helpful. Same on KPI: User engagement
  • On KPI: User engagement, make sure the calculation is correct. Adding a user engagement graph by platform may also be helpful, since CTR on the various platforms seems very different
  • On desktop events, adding a CTR graph may be helpful
  • On PaulScore Approximations, the PaulScore for autocomplete searches is 0
  • On mobile app events, the Android event count has dropped since 4/20 and the iOS event count since 6/19. Their load times also increased around the same time. These are likely related to app EventLogging schema changes and need to be fixed and documented
  • Invoke Source on Mobile App stopped updating on 2017-10-04
  • On API calls by referrer class, we need to add an annotation about the UDF change on 2017-06-29
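For readers unfamiliar with the metric: PaulScore rewards clicks on highly ranked results. A minimal sketch of one common formulation (this is an illustration, not the dashboard's actual implementation; the scoring factor and 1-indexed positions are assumptions) also shows how missing click-position data would pin the metric at 0, as in the autocomplete bug above:

```python
# Hypothetical sketch of a PaulScore-style approximation. Assumes a
# scoring factor f in (0, 1) and 1-indexed click positions; both
# choices are assumptions, not the dashboard's confirmed definition.

def paulscore(click_positions_per_search, f=0.7):
    """Average, over searches, of sum(f**k) for each clicked position k."""
    if not click_positions_per_search:
        return 0.0
    scores = [
        sum(f ** k for k in positions)
        for positions in click_positions_per_search
    ]
    return sum(scores) / len(scores)

# Searches whose clicks land high in the ranking score well...
engaged = paulscore([[1], [1, 2], [3]], f=0.5)

# ...but if click positions are never recorded (as with the autocomplete
# searches above), every per-search score is an empty sum and the
# metric degenerates to 0.
broken = paulscore([[], [], []], f=0.5)
```

The second call mirrors the reported symptom: a constant 0 usually means the inputs are empty, not that users never click.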

Anomalies in metrics (may need further investigation)

Event Timeline

Vvjjkkii renamed this task from Review current search metrics for accuracy and documentation to j3aaaaaaaa. Jul 1 2018, 1:04 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from j3aaaaaaaa to Review current search metrics for accuracy and documentation. Jul 2 2018, 1:52 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added a subscriber: Aklapper.
chelsyx triaged this task as Medium priority.
chelsyx moved this task from Backlog to Doing on the Product-Analytics board.

Change 462032 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Fix bugs in survival analysis

https://gerrit.wikimedia.org/r/462032

Change 462606 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Fix paulScore for autocomplete searches

https://gerrit.wikimedia.org/r/462606

Change 462606 merged by Bearloga:
[wikimedia/discovery/golden@master] Fix paulScore for autocomplete searches

https://gerrit.wikimedia.org/r/462606

For the anomalies, if there isn't time (yet, or at all) to investigate them individually, would it be possible to create a sampling tool that would let us look for obvious skews in the usual usage stats?

I'm imagining a tool where you specify a day, probably a wiki, and possibly other parameters to narrow the scope, and you get back: a frequency list of the top 100 pages visited; a histogram of page visit times; histograms of search session length and number of queries, plus a list of the top 100 most extreme; a count of bot vs. non-bot searches; and the top 100 IPs issuing queries, user agents, queries, referrers, etc.
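The frequency-list half of such a tool is cheap to prototype. A minimal sketch, assuming request-log records arrive as dicts (the field names "page", "ip", and "query" and the per-day pre-filtered sample are hypothetical, not a real schema):

```python
from collections import Counter

# Hypothetical sketch of the frequency-list part of the proposed tool.
# Record fields ("page", "ip", "query") are made up for illustration;
# a real version would read a sampled day of logs for one wiki.

def top_n(records, field, n=100):
    """Frequency list of the n most common values of one field."""
    return Counter(r[field] for r in records if field in r).most_common(n)

sample = [
    {"page": "Main_Page", "ip": "203.0.113.9",  "query": "cats"},
    {"page": "Main_Page", "ip": "203.0.113.9",  "query": "cats"},
    {"page": "Cat",       "ip": "198.51.100.7", "query": "cat"},
]

pages = top_n(sample, "page")
ips = top_n(sample, "ip")
```

Running the same summaries for a "normal" day and an anomalous day, then diffing the two lists, is the comparison described below.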

Comparing two or three days during "normal" times right before an anomaly, and two or three days during the anomaly could reveal likely culprits. The page of a celebrity who recently died, and related pages, suddenly got a lot of traffic. The proportion of identified bots doubled. One IP address issued ten times as many queries as the next two or three combined. A single search session lasted the entire 24 hours and had 200,000 queries. Traffic from Reddit spiked. A specific query, or a bunch of related queries, jumped to the top of the list—the latter indicating that people are interested in a topic, or the former indicating that people may be following a link.

These kinds of stats don't give specific answers, but they do point to external causes. So at least we'd know that something happened during a given anomaly, even if we didn't have all the details.

This tool (or at least part of it) would have to be internal-only since IP addresses, user agents, and queries are all potential PII. It might also be fairly expensive to run, so we might not want to make it widely available.

OTOH, if building such a tool is more work than investigating 100 anomalies, then it probably isn't worth it in the short term.

Does anyone else think this would help, or should we just let mysteries be mysteries?

Change 462032 merged by Chelsyx:
[wikimedia/discovery/golden@master] Fix bugs in survival analysis

https://gerrit.wikimedia.org/r/462032

Change 463517 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[operations/puppet@production] Add chelsyx to analytics-search-users group

https://gerrit.wikimedia.org/r/463517

Change 463543 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Change SQL queries using MobileWikiAppSearch table to Hive queries

https://gerrit.wikimedia.org/r/463543

Change 463575 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Add magrittr package in sample_page_visit_ld.R

https://gerrit.wikimedia.org/r/463575

Change 463517 merged by Ottomata:
[operations/puppet@production] Add chelsyx to analytics-search-users group

https://gerrit.wikimedia.org/r/463517

Change 463543 merged by Chelsyx:
[wikimedia/discovery/golden@master] Change SQL queries using MobileWikiAppSearch table to Hive queries

https://gerrit.wikimedia.org/r/463543

Change 463575 merged by Chelsyx:
[wikimedia/discovery/golden@master] Add magrittr package in sample_page_visit_ld.R

https://gerrit.wikimedia.org/r/463575

Change 467866 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Refactor load_times.R to avoid using beeline to query

https://gerrit.wikimedia.org/r/467866

Change 467866 merged by Chelsyx:
[wikimedia/discovery/golden@master] Refactor load_times.R to avoid using beeline to query

https://gerrit.wikimedia.org/r/467866

Change 468089 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Search dashboard audit

https://gerrit.wikimedia.org/r/468089

Change 468089 merged by Chelsyx:
[wikimedia/discovery/rainbow@develop] Search dashboard audit

https://gerrit.wikimedia.org/r/468089

All the changes have been merged. Please check https://discovery.wmflabs.org/metrics/ and let me know if there are any questions.


@TJones Thanks for the suggestions! The tool you suggested sounds helpful, but as you mentioned, it may be fairly expensive to run and would contain PII, which requires some form of authentication. Druid and Superset/Turnilo may be solutions for this kind of tool, but I'm still learning them and am not sure whether they can help with our problem.

Meanwhile, some of the dashboards we already have can help unveil the cause of some anomalies. For example, we saw direct usage of full-text search via the API increase starting 3/22, and we also saw the full-text ZRR including bots increase around the same time (it did not increase when we excluded bots on the dashboard). This suggests that the spike we saw in full-text search via the API is likely due to bot behavior.
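That with/without-bots comparison can be stated concretely. ZRR (zero results rate) is the share of searches that return no results; a minimal sketch, where each search is a `(num_results, is_bot)` pair and both the flag and the events are made up for illustration:

```python
# Minimal sketch of the ZRR comparison described above. The is_bot
# flag and the example events are hypothetical, not real data.

def zero_results_rate(searches, include_bots=True):
    """Fraction of searches returning zero results, optionally excluding bots."""
    kept = [s for s in searches if include_bots or not s[1]]
    if not kept:
        return 0.0
    return sum(1 for n, _ in kept if n == 0) / len(kept)

events = [(0, True), (0, True), (5, False), (0, False), (12, False)]

zrr_all = zero_results_rate(events)
zrr_human = zero_results_rate(events, include_bots=False)

# When zrr_all rises but zrr_human stays flat, bot traffic is the
# likely driver of the spike, as in the API example above.
```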

Additionally, most of the mysteries I've seen so far are the result of internal causes: bugs in our data retrieval scripts, changes in the Analytics Engineering team's parsing scripts (e.g. the weird pattern in morelike and prefix search since Apr 1st seems to be the result of direct referred traffic being re-categorized as internal traffic), or changes made by the front-end teams (mobile web, iOS/Android apps). If we can build a better communication channel with the teams that use our search services, so that they notify us when related changes occur and we keep a log of those changes, I think that would be more helpful in understanding the mysteries.

Sounds good, @chelsyx —thanks for the updates and all the fixes!

I think we can close this ticket. The general consensus seems to be that the "anomalies" are mostly not errors and are just unexplained variation in usage patterns, which we don't necessarily need to track down. (There's one that still bothers me, though.)

I've found one more concern on the dashboard pages, so I will open a new ticket covering the one outstanding anomaly/bug and this new possible bug.

Thanks so much for all the hard work, @chelsyx!