
Understanding search around Hindi video campaign
Closed, Resolved · Public

Description

We'd like to understand what happens when new readers come to the Hindi Wikipedia in connection with the April 2018 video campaign (T190730), and how/if they're able to browse further through the wiki.
Research questions:
...
[before & during] Do people coming in through the campaign search? Do they find content?

We'd need to know the campaign parameter so that we can sift through the web request logs and search logs. Then we can look for:

  • Search volume from new readers
  • Their searches' zero results rate
  • Their engagement with the results (via clickthrough rate)
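As a rough sketch of how the three metrics listed above could be computed from per-search records: the data frame below, its column names (`zero_result`, `clicked`) and its rows are all made up for illustration and are not the schema of the real logs. The clickthrough rate is taken here over searches that returned results, which is one possible definition and an assumption on my part.

```r
# Toy data: one row per search. Column names and values are illustrative
# assumptions, not the schema or contents of the real search logs.
searches <- data.frame(
  zero_result = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  clicked     = c(TRUE, FALSE, FALSE, TRUE, FALSE)
)

search_metrics <- function(df) {
  # Only searches that returned something can be clicked through
  with_results <- df[!df$zero_result, , drop = FALSE]
  list(
    # Number of searches
    search_volume = nrow(df),
    # Share of searches that returned no results at all
    zero_results_rate = mean(df$zero_result),
    # Share of searches with results where a result was clicked
    clickthrough_rate = if (nrow(with_results) > 0) {
      mean(with_results$clicked)
    } else {
      NA_real_
    }
  )
}

metrics <- search_metrics(searches)
print(metrics)
```

Restricting `searches` to rows attributed to the campaign before calling `search_metrics()` would give the new-reader version of each metric.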

Before the campaign, we can use the Search Metrics dashboard to get a baseline for these metrics, which we can then compare against once the campaign launches. The exception is search volume, which we have to query manually because it has not been added to the dashboard yet.

Screen Shot 2018-04-03 at 2.13.47 PM.png (568×1 px, 217 KB)

Screen Shot 2018-04-03 at 2.13.58 PM.png (568×1 px, 201 KB)

Event Timeline

March 2018 search volume and ZRR by platform for baseline:

hiwiki-search.png (600×1 px, 98 KB)

-- Hive query: daily search counts on hiwiki, split by platform and zero-result status
ADD JAR hdfs:///wmf/refinery/current/artifacts/refinery-hive.jar;
CREATE TEMPORARY FUNCTION array_sum AS 'org.wikimedia.analytics.refinery.hive.ArraySumUDF';
CREATE TEMPORARY FUNCTION is_spider AS 'org.wikimedia.analytics.refinery.hive.IsSpiderUDF';

SELECT
  year, month, day,
  IF(source = 'web', 'desktop or mobile web', 'mobile app') AS platform,
  -- A search is zero-result when the hits across all of its requests sum to 0
  array_sum(requests.hitstotal, -1) = 0 AS zero_result,
  COUNT(1) AS searches
FROM cirrussearchrequestset
WHERE wikiid = 'hiwiki'
  AND year = 2018 AND month = 3
  -- Keep web searches and API searches made by the Wikipedia mobile apps
  AND (
    (source = 'api' AND useragent RLIKE '^WikipediaApp')
    OR source = 'web'
  )
  -- Restrict to these query types, based on the last request in the set
  AND requests[SIZE(requests)-1].querytype IN('comp_suggest', 'full_text', 'GeoData_spatial_search', 'prefix', 'more_like', 'regex')
  -- Drop request sets where any request has hitstotal = -1
  AND NOT ARRAY_CONTAINS(requests.hitstotal, -1)
  -- Exclude known spiders
  AND NOT is_spider(useragent)
GROUP BY
  year, month, day,
  IF(source = 'web', 'desktop or mobile web', 'mobile app'),
  array_sum(requests.hitstotal, -1) = 0;

# R: reshape the query results and plot them
library(tidyverse)

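x <- read_csv("~/Downloads/query-hive-2595.csv") %>%
  # Build a proper date from the year/month/day partition columns
  mutate(date = as.Date(paste(year, month, day, sep = "-"), "%Y-%m-%d")) %>%
  # One column per zero_result value (`True` and `False`, as exported in the CSV)
  spread(zero_result, searches) %>%
  group_by(date, platform) %>%
  summarize(
    `Total Searches` = True + False,
    `Zero Results Rate` = True / (True + False)
  ) %>%
  ungroup() %>%
  # Back to long format so each metric gets its own facet
  gather(metric, value, -c(date, platform))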

ggplot(x, aes(x = date, y = value, color = platform)) +
  geom_line() +
  geom_smooth(method = "gam", se = FALSE, formula = y ~ s(x, k = 7), linetype = "dashed") +
  scale_color_brewer(palette = "Set1") +
  facet_wrap(~ metric, scales = "free_y") +
  wmf::theme_facet(14, "Source Sans Pro") +
  labs(
    x = "Date", color = "Platform", y = NULL,
    title = "Searches on Hindi Wikipedia",
    subtitle = "From March 2018 Cirrus search logs"
  )

@atgo uhhhhhhh…possibly? I thought T191132 is to look at differences in search as a result of design changes leading up to the campaign, but if there's a misunderstanding then yeah.

Ok, yeah those are different, but both important :)

One additional note, @mpopov: we're targeting the campaign to Hindi speakers in Madhya Pradesh.

In April 2018, there were 26,039 pageviews (10,429 from Facebook, 15,609 from YouTube, 1 from Twitter) of the Hindi Wikipedia main page directed by the video campaign (that is, with the campaign parameter in their URL).

pv_video_campaign.png (1×3 px, 283 KB)

There were only 77 searches (17 from Facebook, 60 from YouTube) from these pageviews, which is only 0.3% of the pageviews. Please note that we can detect searches from the campaign only when users search from the main page. If users who came from the campaign navigate to other pages and then search, we cannot tell that those searches came from the campaign.

Of these 77 searches, 48 (62.34%) yielded at least one result, which means those users actually got some results from our search engine.
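For anyone who wants to re-check the percentages, here is a small sketch that reproduces them from the counts reported in this comment. The variable names are mine, and the per-source search rate at the end is an extra breakdown derived from the same counts.

```r
# Counts reported in the comment above
pageviews <- c(Facebook = 10429, YouTube = 15609, Twitter = 1)
searches  <- c(Facebook = 17, YouTube = 60)
nonzero_searches <- 48

total_pageviews <- sum(pageviews)  # 26039
total_searches  <- sum(searches)   # 77

# Searches as a percentage of campaign pageviews
search_share <- 100 * total_searches / total_pageviews
# Percentage of those searches that returned at least one result
nonzero_share <- 100 * nonzero_searches / total_searches

print(round(search_share, 1))   # 0.3
print(round(nonzero_share, 2))  # 62.34

# Searches per 100 pageviews, by referring source (Twitter had no searches)
per_source <- 100 * searches / pageviews[names(searches)]
print(per_source)
```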

Unfortunately, we are not able to tell how many clickthroughs we have out of these searches.

Since some people who watch the video will likely google the site or enter the URL (hi.wikipedia.org) directly instead, we also pulled the numbers for all searches and compared them before and during the campaign.

We don't see any spike or dip in the total number of searches, the zero results rate, or the clickthrough rate, either when the Hindi mobile main page changed (March 26th) or when the video campaign launched (April 2nd):

overall_search.png (1×3 px, 442 KB)

Screen Shot 2018-04-03 at 2.13.47 PM.png (568×1 px, 217 KB)

Screen Shot 2018-05-02 at 3.56.33 PM.png (603×1 px, 219 KB)
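A minimal sketch of how the before/during comparison could be tabulated rather than read off the plots. Only the two dates come from this task. The daily counts below are made up purely so the code runs, and `label_period` is a hypothetical helper, not part of the actual analysis.

```r
# Key dates mentioned above
main_page_change <- as.Date("2018-03-26")
campaign_launch  <- as.Date("2018-04-02")

# Hypothetical helper: assign each date to one of three periods
label_period <- function(date) {
  ifelse(date < main_page_change, "before",
    ifelse(date < campaign_launch, "after main page change", "during campaign"))
}

# Made-up daily counts, NOT real data; they only make the sketch runnable
daily <- data.frame(
  date = seq(as.Date("2018-03-24"), as.Date("2018-04-04"), by = "day"),
  searches = c(100, 110, 105, 95, 100, 102, 98, 101, 99, 103, 97, 100),
  zero_results = c(30, 33, 31, 29, 30, 31, 29, 30, 30, 31, 29, 30)
)
daily$period <- label_period(daily$date)

# Total searches and zero-result searches per period, then the ZRR
summary_by_period <- aggregate(
  cbind(searches, zero_results) ~ period,
  data = daily, FUN = sum
)
summary_by_period$zrr <- summary_by_period$zero_results / summary_by_period$searches
print(summary_by_period)
```

With real daily counts in `daily`, similar rates across the three periods would be consistent with the flat lines in the plots.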

In sum, we don't see any impact on search from the mobile main page change or the video campaign on Hindi Wikipedia.

@atgo Please let me know if you have any questions.

Marking this task as resolved. Feel free to reopen it if you have any questions.