
Search Metrics - Number of Searches
Closed, Resolved (Public)

Description

See parent task for context.

We would like to have an idea of how much traffic search is getting. In particular, we would like to be able to compare this traffic to the page view traffic.

This metric in particular needs to be split by use case (full text search, go bar, ...), and we need to be able to filter out bot traffic so that it can be compared with the overall number of pageviews.
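A minimal sketch of the metric described above. It assumes each search request has already been labelled upstream with a use case and an `is_user` flag (false for bot traffic). The `SearchRequest` record and `searches_per_pageview` helper are hypothetical, and the use-case labels are borrowed from the column names in the breakdown table later in this task. They are not the actual CirrusSearch or webrequest schema. The sketch counts human searches per use case and expresses each count as a percentage of human pageviews.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRequest:
    # Hypothetical record: one search request, already labelled upstream.
    use_case: str  # e.g. "serp", "go_to_page", "autocomplete"
    is_user: bool  # False for requests attributed to bots / automated agents


def searches_per_pageview(requests, user_pageviews):
    """Return {use_case: searches as a percent of user pageviews}, bots excluded."""
    if user_pageviews <= 0:
        raise ValueError("user_pageviews must be positive")
    # Drop bot traffic, then count the remaining requests per use case.
    counts = Counter(r.use_case for r in requests if r.is_user)
    return {case: 100.0 * n / user_pageviews for case, n in counts.items()}


if __name__ == "__main__":
    sample = [
        SearchRequest("serp", True),
        SearchRequest("serp", False),  # bot request, excluded
        SearchRequest("go_to_page", True),
        SearchRequest("go_to_page", True),
    ]
    print(searches_per_pageview(sample, user_pageviews=200))
    # {'serp': 0.5, 'go_to_page': 1.0}
```

Filtering before counting keeps bot traffic out of both the numerator and the per-use-case split. The denominator is passed in as a human-only pageview count so the two sides are comparable.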

Details

Title                        Reference                          Author        Source Branch   Dest Branch
Cirrus metrics calculations  repos/search-platform/notebooks!4  ebernhardson  search-metrics  main

Event Timeline

@EBernhardson, should we close this as a duplicate and move "(full text search, go bar, ...)" into a dimension of T358352: Search Metrics - Number of user sessions using search?

One surprising finding is that calls to related articles amount to ~20% of page views on mobile web. We expected related articles to be lazy-loaded when users reach the end of the page, so they should represent only a fraction of page views. This might require further investigation with the Web-Team-Backlog.

access_method  is_user  go_to_page   serp  autocomplete  related_articles  other_api  num_pageviews
mobile web     True          0.38%  0.33%     696072186            21.30%      0.08%    18013667217
desktop        True          3.95%  1.21%    1986619422             0.04%      2.44%     9662285131
mobile app     True          0.01%  0.01%     788790887             6.70%    184.72%      427781694


It looks like it's possible to combine all three of T358349, T358351, and T358352 into a single computation. While presenting this data yesterday, I also realized that I could dump the intermediate daily aggregates into a table and query them from Superset instead of pandas, which is easier to share. Since everything now reduces to a single dataset, I've done that: a job is running that writes daily aggregated stats for the month of March, with both per-actor and overall numbers, into ebernhardson.T358345. We may still need to iterate on the aggregations.
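A minimal sketch of the kind of daily rollup described above. It assumes input rows of `(day, actor_id, n_searches)`, where `actor_id` is a hypothetical per-actor identifier. For each day it emits one row with an overall total plus simple per-actor figures, which is the sort of output one might write to a table and chart in Superset. The function and column names are illustrative only, not the notebook's actual code or the schema of ebernhardson.T358345.

```python
from collections import Counter, defaultdict


def daily_rollup(events):
    """Aggregate (day, actor_id, n_searches) rows into one row per day.

    Each output row carries an overall total and per-actor figures.
    All field names here are hypothetical.
    """
    totals = defaultdict(int)         # day -> total searches across all actors
    per_actor = defaultdict(Counter)  # day -> {actor_id: searches}
    for day, actor_id, n_searches in events:
        totals[day] += n_searches
        per_actor[day][actor_id] += n_searches

    rows = []
    for day in sorted(totals):
        actors = per_actor[day]  # never empty: filled alongside totals
        rows.append({
            "day": day,
            "total_searches": totals[day],
            "distinct_actors": len(actors),
            "searches_per_actor": totals[day] / len(actors),
        })
    return rows


if __name__ == "__main__":
    events = [
        ("03-01", "a", 3),
        ("03-01", "b", 1),
        ("03-01", "a", 2),  # same actor, same day: summed with the first row
        ("03-02", "c", 4),
    ]
    for row in daily_rollup(events):
        print(row)
```

Keeping per-day rows, rather than a single monthly figure, is what lets a dashboard recompute monthly or weekly aggregates later without rerunning the expensive pass over raw requests.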

I also made some adjustments to the dimensions. We now group on normalized_host instead of uri_host. It's the same data, but structured so that we can easily see, for example, all Wiktionaries or all Bengali wikis. I also dropped the device_family dimension and replaced it with os_family, which I hope will be a more useful breakdown.
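To illustrate why grouping on a normalized host helps, here is a simplified sketch that splits a raw host into a project (language) part and a project family, so counts can be rolled up either way. `normalize_host` and `count_by` are hypothetical helpers. They are a rough stand-in for the normalization done upstream, not its actual implementation: the sketch ignores qualifiers such as `m`, and hosts like `www.wikidata.org` simply yield `www` as the project.

```python
from collections import Counter


def normalize_host(uri_host):
    """Split e.g. 'bn.m.wiktionary.org' into ('bn', 'wiktionary').

    Hypothetical simplification: first label = project (usually a language
    code), second-to-last label = project family; TLD and any middle
    qualifiers (such as 'm' for mobile) are dropped.
    """
    labels = uri_host.lower().split(".")
    if len(labels) < 3:
        raise ValueError(f"unexpected host: {uri_host!r}")
    return labels[0], labels[-2]


def count_by(hosts_and_counts, key):
    """Sum counts grouped by 'project' or 'family' (other keys raise KeyError)."""
    index = {"project": 0, "family": 1}[key]
    totals = Counter()
    for host, n in hosts_and_counts:
        totals[normalize_host(host)[index]] += n
    return totals


if __name__ == "__main__":
    data = [
        ("en.wiktionary.org", 5),
        ("bn.m.wiktionary.org", 2),
        ("bn.wikipedia.org", 7),
    ]
    print(dict(count_by(data, "family")))   # {'wiktionary': 7, 'wikipedia': 7}
    print(dict(count_by(data, "project")))  # {'en': 5, 'bn': 9}
```

With raw hosts, desktop and mobile variants of the same wiki land in different groups. After normalization, "all Wiktionaries" or "all Bengali wikis" is a single group-by.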

This chart should eventually contain the same data gehel posted above. So far only 5 days have been computed, but the aggregate percentages have already stabilized. I spent only a couple of minutes making the chart, so it probably isn't the best way to present the data, but here is an example: https://superset.wikimedia.org/explore/?slice_id=3368

Four tickets were combined into this single ticket and two calculations, which can be found in the patch above:

  • T358349 - number of searches
  • T358350 - successful searches
  • T358351 - read traffic generated by search
  • T358352 - number of user sessions using search