mobile-safari has very few internally-referred pageviews
Closed, Resolved · Public

Description

Below you see the ratio of internally referred pageviews to all other pageviews. Roughly speaking, this ratio is directly proportional to session length. A ratio of 1:5 means there were 5 entries to Wikipedia and only one of those sessions included a search or a click on a link. A ratio of 5:1, on the other hand, would mean that 1 visit had 5 clicks or searches.
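To make the metric concrete, here is a minimal sketch of the ratio computed from pageview counts grouped by referer class. The class names ("internal", "external", "none") and the counts are assumptions for illustration only, not real data.

```python
def internal_to_other_ratio(pageviews_by_referer_class):
    """Return internal pageviews divided by all other pageviews."""
    internal = pageviews_by_referer_class.get("internal", 0)
    other = sum(
        count
        for referer_class, count in pageviews_by_referer_class.items()
        if referer_class != "internal"
    )
    if other == 0:
        raise ValueError("no non-internal pageviews to compare against")
    return internal / other


# Made-up counts, for illustration only.
short_sessions = {"internal": 100, "external": 300, "none": 200}
long_sessions = {"internal": 500, "external": 60, "none": 40}

print(internal_to_other_ratio(short_sessions))  # 0.2, i.e. 1:5
print(internal_to_other_ratio(long_sessions))   # 5.0, i.e. 5:1
```

Note that any pageview whose referer is lost moves from the numerator to the denominator, so the ratio drops even if real session length is unchanged.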

Using pivot.wmflabs.org, a GUI analysis tool for our weblogs, I see that the "session length" is much lower on Mobile Safari than on Mobile Chrome.

Mobile Chrome:

Mobile Safari:

When you compare weekend and weekday pageviews on Mobile Chrome and Mobile Safari, Safari seems to have far fewer pageviews than Chrome during the week, but roughly equal numbers on the weekends. This might correspond to the increased use of Wikipedia for work/school on weekdays, which leads to longer sessions. The difference also looks fairly new.

This suggests that either:

  • safari sessions are dramatically shorter, OR
  • there is an error in how we identify referrals on Safari, OR
  • there is a bug in pivot.wikimedia.org
JKatzWMF created this task.Oct 20 2016, 7:17 PM
Restricted Application added a subscriber: Aklapper.Oct 20 2016, 7:17 PM
Nuria added a subscriber: Nuria.Oct 24 2016, 3:39 PM

@JKatzWMF: if we were to guess, it is your #2 suggestion. Pivot is just showing data from pageview_hourly, so the tool doesn't aggregate data any differently.

Nuria moved this task from Incoming to Q3 (January 2017) on the Analytics board.Oct 24 2016, 3:41 PM
ovasileva added a subscriber: ovasileva.
Nuria moved this task from Q3 (January 2017) to To Task on the Analytics board.Dec 12 2016, 5:04 PM
Nuria assigned this task to mforns.Dec 15 2016, 5:11 PM
Nuria edited projects, added Analytics-Kanban; removed Analytics.
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board.Dec 19 2016, 3:15 PM

I'm looking into this.
I think the difference between Chrome Mobile and Mobile Safari in the third chart is legit. If we look at long term trends, Mobile Safari is decreasing and Chrome Mobile is increasing, so I guess this is the continuation of that trend. See chart (colors are inverted):


Now, the next chart proves that something is happening with the proportion between internal and external referrers on the Mobile Safari since Feb 22:

In yellow, we see the Mobile Safari page views referred from internal links for roughly the last year. At some point around Feb 22, there's an abrupt change that dropped the line to a much lower scale.
Will continue looking...

This issue regarding webrequest processing happened at exactly the same time. It doesn't seem related to the Mobile Safari thing, but I'm listing it here for reference.
https://phabricator.wikimedia.org/T128295

Here are some observations and conclusions:

  • The problem doesn't just affect Mobile Safari; a lot of browsers and operating systems are affected, including Chrome Mobile (to a lesser extent). See the referer_class breakdown around Feb 22 for Firefox Mobile: http://tinyurl.com/h2cy44v. There's a clear drop in internal-class referrers and a corresponding rise in none-class referrers. See the same behaviour in Opera Mobile: http://tinyurl.com/j3cpgza.
  • All the affected OSs and browsers see the drop in internal referers at exactly the same time: Feb 22, 2016. This suggests that the issue is not related to a browser/OS-specific version update or bug, and points rather to an internal parsing problem at some point in the 'firehose' pipeline.
  • The issue seems limited to mobile browsers. I couldn't find any examples of desktop browsers affected by the internal drop. The desktop site did indeed see a slight drop (http://tinyurl.com/zo8bedf), but I'm sure that it was caused by the mobile browsers that access the desktop site.
  • I also couldn't find any other split that narrows the issue to a smaller population. It affects all continents, all wikis, etc. However, the proportion is very different depending on the browser: Mobile Safari seems the most affected, together with Chrome Mobile iOS. Other browsers affected in a less obvious way include Chrome Mobile, Firefox Mobile, and Opera Mobile. Curiously enough, IE Mobile is not affected by the issue. Note that it is not an iOS-only problem; you can see how Android OS is affected: http://tinyurl.com/zfgvq5o.
  • There was a CDH upgrade to 5.5 on Feb 22, 2016. However, I don't think this is the cause of the issue.
  • I thought migration to HTTPS could be a potential reason, but could not find any related event matching Feb 22, 2016.
  • There were no other related events, like refinery changes or mediawiki deployments that I could find in gerrit, SAL or Phabricator.
  • The code that populates the referer_class field is here: http://tinyurl.com/jaxbtfj. It only returns NONE when the referer header equals '-'. This suggests the problem is not a parsing problem, but rather that the referer header is already missing or corrupted when it reaches the cluster. Sadly, I could not check that, because we do not have the raw webrequests from around Feb 22.
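For readers unfamiliar with the field, the classification described in the last bullet can be sketched roughly as follows. This is not the refinery code: the domain list, the "unknown" class, and the function name are assumptions made for illustration. The only behaviour taken from the thread is that NONE is returned solely when the header equals '-'.

```python
from urllib.parse import urlsplit

# Hypothetical, incomplete list; the real pipeline knows many more domains.
INTERNAL_DOMAINS = ("wikipedia.org", "wikimedia.org")


def classify_referer(referer):
    """Classify a logged referer header as none/internal/external/unknown."""
    # Per the thread, '-' is the only value that maps to the none class.
    if referer == "-":
        return "none"
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        host = None
    if host is None:
        return "unknown"  # assumed fallback for unparseable values
    if any(host == d or host.endswith("." + d) for d in INTERNAL_DOMAINS):
        return "internal"
    return "external"


print(classify_referer("-"))
print(classify_referer("https://en.m.wikipedia.org/wiki/Safari"))
print(classify_referer("https://www.google.com/"))
```

Under such a scheme, a browser that sends no referer at all is counted as none-class no matter where the click came from, which matches the shift from internal to none seen in the charts.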
mforns added a subscriber: BBlack.Dec 20 2016, 9:17 PM

We found a potential cause of the issue:
copying @BBlack

On Feb 22, 2016 the WMF enabled the HTML referrer meta tag via this change[1] in the wmf-config repository. The meta tag was set to "origin-when-cross-origin", which means that Wikimedia wikis populate the referer header with the full referrer URL only when the destination belongs to the same domain, that is, for clicks on internal links[2]. For cross-origin destinations, only the origin is sent.

The referrer meta tag was added to browsers around 2014 (depending on the browser), but there was a typo in the specification: "origin-when-crossorigin" (without a hyphen between "cross" and "origin"). Browsers implemented the option following the typo'd spec. In mid 2015, the spec was corrected and newer browsers were adapted to support the correct option "origin-when-cross-origin" (with the hyphen).

So, when the WMF added the referrer meta tag with the "origin-when-cross-origin" option, the following happened:

  • Browsers older than 2014 ignored the tag because they didn't implement it, and the referrer continued to be sent normally.
  • Browsers (2014-2015) implementing the typo'd spec of the referrer meta tag failed to recognize the option and defaulted to 'None: Never pass referral data'.
  • Newer browsers (2015+) implement the corrected spec, so they work properly and set the referrer header.
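The three cases above can be modelled with a small sketch. It follows the behaviour described in this comment rather than any browser's source code; the `Era` names, the function names, and the example URLs are hypothetical. Whether 2015+ browsers also accept the legacy spelling is disputed later in this thread, so treating it as unrecognized here is an assumption.

```python
from enum import Enum
from urllib.parse import urlsplit

CORRECT_TOKEN = "origin-when-cross-origin"
TYPO_TOKEN = "origin-when-crossorigin"


class Era(Enum):
    PRE_TAG = "before 2014"       # meta tag not implemented
    TYPO_SPEC = "2014-2015"       # implements the typo'd spec
    CORRECTED_SPEC = "2015+"      # implements the corrected spec


# Tokens each era is assumed to recognize (simplified).
RECOGNIZED = {
    Era.TYPO_SPEC: {TYPO_TOKEN},
    Era.CORRECTED_SPEC: {CORRECT_TOKEN},
}


def origin_of(url):
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def referer_sent(era, policy_token, page_url, target_url):
    """Return the referer header value sent, or None if it is omitted."""
    if era is Era.PRE_TAG:
        return page_url  # tag ignored, referer sent as before
    if policy_token not in RECOGNIZED[era]:
        return None  # unrecognized option: no referer, as described above
    if origin_of(page_url) == origin_of(target_url):
        return page_url  # same origin: full URL
    return origin_of(page_url) + "/"  # cross origin: origin only


page = "https://en.m.wikipedia.org/wiki/Safari"
internal_link = "https://en.m.wikipedia.org/wiki/WebKit"
for era in Era:
    print(era.value, referer_sent(era, CORRECT_TOKEN, page, internal_link))
```

With the corrected token, only the middle group drops the referer on an internal click, which is what turns those pageviews into none-class.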

To this day, certain browser versions still do not populate the referrer header:

  • Chrome and Chrome Mobile v34 -> v42
  • Firefox and Firefox Mobile v38 -> v40
  • Safari and Mobile Safari v8+ (still has not implemented the correction, see[3])
  • Opera and Opera Mobile v21 -> v29

In conclusion:

  • This issue does NOT indicate that Safari sessions are dramatically shorter than other sessions.
  • The NONE referrers will decrease over time because older browsers are used less and less.
  • The exception is Safari and Mobile Safari, which still need to be corrected.

[1] https://gerrit.wikimedia.org/r/#/c/255408/2/wmf-config/InitialiseSettings.php
[2] https://moz.com/blog/meta-referrer-tag (see: 4.Origin When Cross-Origin)
[3] https://bugs.webkit.org/show_bug.cgi?id=154588

mforns moved this task from In Progress to Done on the Analytics-Kanban board.Dec 20 2016, 9:17 PM

Thank you for solving this mystery, @mforns. Is there anything we can do on our end to remedy this?

mforns added a comment.EditedDec 21 2016, 2:04 PM

@JKatzWMF

Mmmm, the only thing I can think of is changing to the typo'd version, "origin-when-crossorigin" (without the hyphen). In my limited understanding of the subject, this wording is still supported by all browsers, but I may be wrong. In any case, this would look like an ugly patch to fix a problem that belongs to Safari, and it's likely that @BBlack would not approve it?

Restricted Application added a project: Operations.Dec 21 2016, 4:23 PM

@JKatzWMF Besides documenting this as a known issue on the dataset (super thanks for reporting!), I do not think there is anything else for us to do. Browser updates will make the problem fade away, as this bug will eventually be resolved. Assigning to the traffic team just in case they can think of a mitigation strategy.

Nuria removed mforns as the assignee of this task.Dec 22 2016, 12:09 AM
Nuria edited projects, added Analytics; removed Analytics-Kanban.
Nuria added a subscriber: mforns.
ema moved this task from Triage to General on the Traffic board.Dec 22 2016, 3:00 PM

@Nuria @mforns I think having an alternative with the typo'd version makes a lot of sense. These metrics are used as a proxy for session depth and for our evaluation of traffic drivers to Wikipedia (source/external = % of visits), to name a few. It has already been 10 months of poisoned metrics, and I don't like the idea of waiting until the browsers slowly and unpredictably change this. Aside from the delay, waiting makes it impossible to know when we can trust the numbers again or to isolate the problem timeframe.

mforns assigned this task to BBlack.Dec 22 2016, 6:36 PM

Assigned the task to @BBlack , so that he can give his opinion on this.

Nuria added a comment.Dec 22 2016, 7:30 PM

@JKatzWMF Do ping @BBlack about the impact of the change on your metrics. On our end there are no code changes needed to process the header either way, so the code change just needs to happen on the mediawiki end: https://gerrit.wikimedia.org/r/#/c/255408/2/wmf-config/InitialiseSettings.php

Nuria moved this task from To Task to Radar on the Analytics board.Jan 5 2017, 7:44 PM
BBlack added a comment.Jan 5 2017, 7:45 PM

It's not really my feature, I just happened to write the very short config patch to turn it on, because nobody else had at the time. For the history on this, see also:
T149858, especially from T87276#2055761 onwards, where some issues with Safari's support of both spellings were raised. Also the original discussion in https://meta.wikimedia.org/wiki/Research_talk:Wikimedia_referrer_policy , and the original code change for it here: https://gerrit.wikimedia.org/r/#/c/186104/2 . I'm not opposed to any particular path, but I'd be careful that someone fully researches all the implications of any change to the previous misspelled variant, and possibly looks at using the header instead of the meta tag.

Nuria added a comment.EditedJan 5 2017, 8:19 PM

@JKatzWMF
Looks like we already discussed whether to support the misspelled version (ahem... one of them), and the consensus was that we would go with the official value: origin-when-cross-origin. The discussion can be read here: https://phabricator.wikimedia.org/T87276

As you found out, this doesn't work in Safari, and a bug was filed in this regard: https://bugs.webkit.org/show_bug.cgi?id=154588
Now, the misspelled version (or one of them) doesn't work in many other browsers and throws warnings in the console.

Edit:
Safari (counting mobile) accounts for about 20% of our pageviews.

Either way there is no perfect solution, but we would rather not revisit a decision already taken, and thus far it looks like no modifications will be made to the meta referrer tag by the traffic team, per @BBlack's comment above. I shall be closing this ticket.

Nuria closed this task as "Resolved".Jan 5 2017, 8:20 PM
JKatzWMF added a comment.EditedJan 5 2017, 9:04 PM

Either way there is no perfect solution, but we would rather not revisit a decision already taken, and thus far it looks like no modifications will be made to the meta referrer tag by the traffic team, per @BBlack's comment above. I shall be closing this ticket.

@Nuria in light of the fact that it is miscategorizing one of our key traffic metrics by as much as 20% of all traffic, I think there is ample reason to revisit a decision that was made without knowledge of this impact. @BBlack did not say he was against making a modification, just that we should research the implications. The bug you refer to was filed for WebKit almost a year ago and has not been resolved. While in February the decision to wait made sense, I think it is clear that it is not getting resolved anytime soon. I will file a separate ticket to explore remedying this.

Nuria added a comment.Jan 5 2017, 9:15 PM

@JKatzWMF: sounds good. As I said, on our end there are no changes needed to process the header either way. I just closed the ticket because it did not seem like any action was going to be taken. If you want to own changing the meta tag (and possibly adding 2, plus testing issues with errors triggered by the misspelling), I am all for it.