
Investigate mobile/desktop disparity on sister search numbers
Closed, Resolved (Public)

Description

There seems to be something funny or at least counter-intuitive going on with the initial data from sister search. See:

http://discovery-beta.wmflabs.org/metrics/#sister_search_traffic

From the desktop numbers, it seems like desktop search has 10x as many 'results' page opens as mobile search. This makes sense because, as @debt mentioned, sister search is below the fold on mobile. However, the traffic being referred from mobile is almost double. On other projects, I have found that when intuition and the data disagree, it is often because there is something glitchy going on.

Can you take a look?

Event Timeline

@JKatzWMF would @Tbayer be able to take a look at the Hive query that is generating the dataset and confirm it is correctly counting sister search-referred pageviews by platform, wiki, etc.? Just in case Chelsy or I missed some particular detail when writing/reviewing it. The query is at https://github.com/wikimedia/wikimedia-discovery-golden/blob/master/modules/metrics/search/sister_search_traffic

On the dashboard side, @chelsyx can confirm there's no funky stuff going on (e.g. no accidental label switching) when reading the data in or visualizing it:

Yes, on the dashboard side, I didn't see anything that could possibly flip desktop and mobile around.

"Sister search is below the fold on mobile" - is that true in all languages? On the French Wikipedia, I'm actually seeing a huge box with sister project search links right on top:

Search screenshot fr.m.wikipedia.org 20170711.jpg (2×1 px, 494 KB)

@Tbayer French is special in that they have a separate box, not related to WMF's "sister search" (which, if you scroll down on FR, remains at the bottom). It shows up on desktop too.

@JKatzWMF Yes, saw that, but even though it was not created by WMF, it might generate traffic too ;) To a SERP on the sister project though, not a content page. I happened to look at frWP last night because it seemed to be overrepresented (50 out of 100) when I idly checked a sample output of the sister_search_pvs subquery from the code @mpopov had linked, restricted to SERPS (is_serp = true).[1] I haven't grokked / vetted the whole thing though, that would indeed require setting aside more time.

[1]

ADD JAR hdfs:///wmf/refinery/current/artifacts/refinery-hive.jar;
CREATE TEMPORARY FUNCTION normalize_host AS 'org.wikimedia.analytics.refinery.hive.GetHostPropertiesUDF';
SELECT
    '2017-07-01' AS date, access_method,
    CASE normalized_host.project
         WHEN 'commons' THEN 'wikimedia commons'
         WHEN 'simple' THEN CONCAT('simple ', normalized_host.project_class)
         WHEN 'species' THEN 'wikispecies'
         ELSE normalized_host.project_class
    END AS project,
    IF(normalized_host.project IN('commons', 'meta', 'simple', 'incubator', 'species'), '',
       IF(normalized_host.project = 'en', 'English', 'Other languages')) AS language,
    uri_host,
    referer
  FROM wmf.webrequest
  WHERE
    webrequest_source = 'text'
    AND CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) = '2017-07-01'
    AND is_pageview
    AND referer_class = 'internal'
    AND (
      INSTR(referer, '/w/index.php?search=') > 0
      OR INSTR(referer, '/wiki/Special:Search?search=') > 0
    )
    -- warning: comparing uri_host = PARSE_URL(referer, 'HOST') would mark 'en.m.wikipedia.org' as a sister of 'en.wikipedia.org'
    AND normalize_host(PARSE_URL(referer, 'HOST')).project_class = 'wikipedia'
    AND normalize_host(PARSE_URL(referer, 'HOST')).project_class != normalized_host.project_class
    AND NOT normalized_host.project_class IN('mediawiki', 'wikimediafoundation', 'wikidata')
    AND NOT normalized_host.project IN('meta', 'incubator')
    -- keep commons.wikimedia.org and species.wikimedia.org:
    AND NOT (normalized_host.project_class = 'wikimedia' AND NOT (normalized_host.project IN('commons', 'species')))
    -- flag for pageviews that are search results pages (e.g. if user clicked to see more results from a sister project):
    AND (
      page_id IS NULL
      AND (
        uri_path = '/wiki/Special:Search'
        OR (
          uri_path = '/w/index.php'
          AND (
            uri_query RLIKE '^\?search\='
            OR INSTR(uri_query, '?title=Special:Search&search=') > 0
          )
        )
      )
    ) -- = is_serp 
  LIMIT 100;
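
For checking the string-matching conditions locally against individual sample URLs, here is a minimal Python sketch that mirrors the referer check and the is_serp condition from the query above. It is only an illustration: the function names are hypothetical and not part of the golden repository, and it assumes that uri_query includes the leading '?', as the query's RLIKE pattern implies. The last helper illustrates the pitfall named in the query's warning comment and is not something the query does.

```python
import re
from urllib.parse import urlsplit

# Mirrors the two INSTR(referer, ...) checks in the query above.
SEARCH_REFERER_MARKERS = (
    "/w/index.php?search=",
    "/wiki/Special:Search?search=",
)


def is_search_referer(referer: str) -> bool:
    """True if the referer URL contains one of the search-page markers."""
    return any(marker in referer for marker in SEARCH_REFERER_MARKERS)


def is_serp(uri_path: str, uri_query: str, page_id=None) -> bool:
    """Mirror of the query's is_serp condition for a single request.

    Assumes uri_query starts with '?' (hypothetical helper, not WMF code).
    """
    if page_id is not None:
        return False
    if uri_path == "/wiki/Special:Search":
        return True
    if uri_path == "/w/index.php":
        return (
            re.match(r"\?search=", uri_query) is not None
            or "?title=Special:Search&search=" in uri_query
        )
    return False


def naive_is_cross_host(uri_host: str, referer: str) -> bool:
    """The comparison the query's warning comment advises against.

    A plain host comparison treats the mobile and desktop hosts of the
    same wiki as different sites.
    """
    return uri_host != urlsplit(referer).hostname
```

For example, naive_is_cross_host("en.m.wikipedia.org", "https://en.wikipedia.org/w/index.php?search=cats") returns True even though both hosts belong to English Wikipedia, which is why the query compares project_class values instead of raw hosts.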

Back to the article clickthroughs: It's strange that both the desktop and mobile referral traffic decrease on weekends in the dashboard. General traffic (pageviews) declines on desktop but rises on mobile every weekend (as mentioned earlier at T167850).
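
One way to quantify the weekend effect is to compare mean weekend and mean weekday counts for each access method. The sketch below assumes the daily counts are already available as a mapping from date to pageview count; the function name and input shape are hypothetical and do not describe how the dashboard computes anything.

```python
from datetime import date
from statistics import mean
from typing import Mapping


def weekend_to_weekday_ratio(daily_counts: Mapping[date, int]) -> float:
    """Ratio of mean weekend count to mean weekday count.

    daily_counts maps a calendar date to the pageview count for one
    access method (hypothetical input shape).
    """
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    weekend = [n for d, n in daily_counts.items() if d.weekday() >= 5]
    weekday = [n for d, n in daily_counts.items() if d.weekday() < 5]
    if not weekend or not weekday:
        raise ValueError("need at least one weekend day and one weekday")
    return mean(weekend) / mean(weekday)
```

A ratio below 1 means traffic drops on weekends. Running this separately on referral counts and on general pageviews for each platform would show whether the two series really move in opposite directions on mobile.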

Traffic to sister projects from Wikipedia SERPS - dashboard screenshot from 2017-07-11.png (469×816 px, 57 KB)

Change 365190 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/golden@master] Fix sister search traffic query

https://gerrit.wikimedia.org/r/365190

Change 365190 merged by Chelsyx:
[wikimedia/discovery/golden@master] Fix sister search traffic query

https://gerrit.wikimedia.org/r/365190

Update: French and Catalan were the only languages that use a community-developed sister search sidebar in addition to ours. I've separated out those two languages into their own category, but that wasn't it:

Screen Shot 2017-07-14 at 12.00.23 PM.png (571×1 px, 167 KB)

Weird.

I've looked at this from the event logging side to see if we're accidentally logging clicks that come from the community developed sidebars, but given that the CSS selectors we use to trigger the events are different than those used in the other sidebars (and testing this myself to be sure), this doesn't seem to be the case either.

Just my 2 cents... This doesn't seem counterintuitive or funny to me...

On mobile, sister search links are very hard to distinguish from real results. The heading is easily missed and the pagination links appear underneath them. To me at least they look like search results (maybe even featured search results). I wouldn't be surprised if people (myself included) are mistaking them for search results and clicking them. Thus this data doesn't really surprise me... Note that on desktop they appear on the right, off to the side of the results, so the workflow there is completely different, in the same way that the sidebar is invisible to most users on Vector.

Screen Shot 2017-07-17 at 12.29.02 PM.png (543×481 px, 103 KB)

We've checked everything that we're logging and collecting for the stats on the sister project snippets. There isn't anything that is 'off' or needs to be fixed in the logging or display of the results.

debt claimed this task.