Investigate the full-text search pattern on mobile web
Closed, Resolved, Public

Description

Normally, the number of searches follows the same weekly pattern as the number of pageviews, and that pattern varies by platform. For example, on desktop, both the number of searches and the number of pageviews are higher on weekdays and drop on weekends.

For mobile web, the pageview pattern is the opposite (higher on weekends), but we didn't see the same pattern for full-text search:

[Image: fulltext_mobile.png]

The query I used is P5973.

This is a spin-off of T174396#3589647.

Event Timeline

debt triaged this task as Medium priority. (Sep 28 2017, 8:12 PM)

We examined the query P5973 carefully and didn't find anything in it that would distort the full-text search usage pattern on mobile web. More interestingly, when we focus on users who went through the "prefix -> full-text" funnel, we can see that while the number of users and the number of prefix searches are higher on weekends, this same group of users opens more full-text search result pages on weekdays:

[Image: Rplot.png]

Query:

ADD JAR hdfs:///wmf/refinery/current/artifacts/refinery-hive.jar;
CREATE TEMPORARY FUNCTION search_classify AS 'org.wikimedia.analytics.refinery.hive.GetSearchRequestTypeUDF';
USE wmf;
WITH users AS (
SELECT
  CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
  client_ip, user_agent,
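  -- Prefix search requests, as classified by GetSearchRequestTypeUDF
  SUM(IF(search_classify(uri_path, uri_query) = 'prefix', 1, 0)) AS n_prefix,
  -- Full-text search result pages: internally referred /w/index.php pageviews with
  -- no page_id and a non-empty 'search' or 'searchToken' query parameter
  SUM(IF(is_pageview AND referer_class = 'internal' AND page_id IS NULL AND uri_path = '/w/index.php'
         AND (LENGTH(PARSE_URL(CONCAT('http://', uri_host, uri_path, uri_query), 'QUERY', 'search')) > 0
         OR LENGTH(PARSE_URL(CONCAT('http://', uri_host, uri_path, uri_query), 'QUERY', 'searchToken')) > 0),
         1, 0)) AS n_fulltext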
FROM webrequest
WHERE year = 2017 AND month = 9 AND day >= 1 AND day <= 15
  AND webrequest_source = 'text'
  AND access_method = 'mobile web'
  AND agent_type = 'user'
  AND http_status = '200'
GROUP BY CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')), client_ip, user_agent
-- Filter users(client_ip + user_agent) with prefix search
HAVING array_contains(collect_set(search_classify(uri_path, uri_query)), 'prefix')
)
SELECT 
  date,
  COUNT(*) AS n_users,
  SUM(IF(n_fulltext > 0 , 1, 0)) AS user_has_fulltext,
  SUM(n_prefix) AS n_prefix,
  SUM(n_fulltext) AS n_fulltext
FROM users
GROUP BY date;
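
For readers who want to sanity-check the n_fulltext condition outside Hive, here is a minimal Python sketch of the same per-request test. It is an illustration only, not the code that was run for this task: the function name `is_fulltext_serp` and the placeholder host `example.org` are made up, and `uri_query` is assumed to include the leading `?`, as the CONCAT in the query implies.

```python
from urllib.parse import urlsplit, parse_qs


def is_fulltext_serp(uri_path, uri_query, is_pageview, referer_class, page_id):
    """Approximate the n_fulltext condition from the Hive query for one request.

    Hypothetical helper for illustration; argument names follow the
    webrequest columns used in the query.
    """
    # Must be an internally referred pageview that is not an article page.
    if not (is_pageview and referer_class == "internal" and page_id is None):
        return False
    if uri_path != "/w/index.php":
        return False
    # Assumption: uri_query starts with '?'. The host is a placeholder and
    # does not affect the parsed query string.
    params = parse_qs(urlsplit("http://example.org" + uri_path + uri_query).query)
    # parse_qs drops blank values by default, so a key that is present
    # always carries a non-empty value (the LENGTH(...) > 0 check).
    return any(params.get(key) for key in ("search", "searchToken"))
```

A request such as `/w/index.php?search=cats&title=Special:Search` with no page_id would count as a full-text search result page under this sketch, while the same URL with an empty `search=` value would not.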

Therefore, we think that people turn to full-text search rather than prefix search results when they have an in-depth information need, and that this kind of need is often driven by work or school projects. That would explain why we see more full-text searches on weekdays than on weekends on both desktop and mobile web.
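
The weekday versus weekend comparison can be made explicit by averaging the per-day query output over the two kinds of days. The following is a rough Python sketch, assuming rows shaped like the query output (a `date` string plus count columns); the function name `weekday_weekend_means` is hypothetical, and the sample rows are made-up placeholder numbers, not results from this task.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(rows, column):
    """Average `column` separately over weekdays and weekend days.

    rows: iterable of dicts with a 'date' key in 'YYYY-MM-DD' form,
    mirroring the shape of the query output (assumed, not verified).
    """
    groups = {"weekday": [], "weekend": []}
    for row in rows:
        day = date.fromisoformat(row["date"])
        # date.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        key = "weekend" if day.weekday() >= 5 else "weekday"
        groups[key].append(row[column])
    # Skip a group entirely if no days fell into it.
    return {key: mean(values) for key, values in groups.items() if values}


# Placeholder rows with made-up counts, only to show the call shape.
rows = [
    {"date": "2017-09-01", "n_prefix": 100, "n_fulltext": 30},  # Friday
    {"date": "2017-09-02", "n_prefix": 120, "n_fulltext": 20},  # Saturday
    {"date": "2017-09-03", "n_prefix": 130, "n_fulltext": 22},  # Sunday
    {"date": "2017-09-04", "n_prefix": 90, "n_fulltext": 32},   # Monday
]
fulltext_means = weekday_weekend_means(rows, "n_fulltext")
prefix_means = weekday_weekend_means(rows, "n_prefix")
```

Running the same function on n_prefix and n_fulltext side by side is one way to see whether the two series move in opposite directions across the week.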

As pointed out by @Tbayer, this reader behavior study supports our conclusion. It shows that Wikipedia readers' motivations triggered by media and conversations are more common on weekends, while those triggered by work or school are more common on weekdays. Also, users motivated by conversations are often on their mobile devices.

debt subscribed.

Great job, Chelsy!