Investigate nonCompliant errors for Navigation Timing schema (5% is ignored)
Closed, Resolved, Public

Description

Per T112593 and 5b5621c3199d we now measure the rejections from the isCompliant() check.

https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?var-schema=NavigationTiming

It turns out that about 5-10% of Navigation Timing events are ignored client-side (~20 out of 300 every minute).

Here's a 1-minute sample of User-Agent values from the Kafka stream for statsv with metric_name eventlogging.client_errors.NavigationTiming.nonCompliant.

"user_agent":"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
"user_agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows Phone OS 7.5; Trident/5.0; IEMobile/9.0; NOKIA; Lumia 800)"
"user_agent":"Mozilla/5.0 (iPad; CPU OS 9_0_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13A404 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Mobile/13A452 [FBAN/FBIOS***]"
"user_agent":"Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 9_0_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Mobile/13A405 YJApp-IOS jp.co.yahoo.ipn.appli/4.3.1"
"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 9_0_2 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"
"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"
"user_agent":"Mozilla/5.0 (Linux; U; Android 4.4.4; en-us; Lenovo A6000 Build/KTU84P) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 UCBrowser/9.7.5.418 U3/0.8.0 Mobile Safari/533.1"
"user_agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"

Mostly:

  • IE 10 (Windows 7; NT 6.1)
  • IE 9 (Windows 7; NT 6.1)
  • Mobile Safari 9 (iOS 9)
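
The grouping above can be reproduced with a small tally over the sampled User-Agent strings. This is a minimal sketch: `classifyUserAgent`, `tallyFamilies`, the family labels, and the regular expressions are all hypothetical and written for this sample only; they are not part of the NavigationTiming code.

```javascript
// Hypothetical classifier for the sample above. The patterns are rough
// assumptions that cover these strings, not a general User-Agent parser.
function classifyUserAgent(ua) {
  // Check UC Browser first: its string also contains "Safari".
  if (/UCBrowser/.test(ua)) {
    return 'UC Browser';
  }
  const ie = /MSIE (\d+)\.\d+/.exec(ua);
  if (ie) {
    return 'IE ' + ie[1];
  }
  // Matches Mobile Safari and in-app webviews on iOS alike.
  const ios = /\((?:iPhone|iPad|iPod)[^)]*? OS (\d+)_/.exec(ua);
  if (ios) {
    return 'iOS ' + ios[1];
  }
  if (/Chrome\//.test(ua)) {
    return 'Chrome';
  }
  return 'Other';
}

// Count how many User-Agent strings fall into each family.
function tallyFamilies(userAgents) {
  const counts = {};
  userAgents.forEach((ua) => {
    const family = classifyUserAgent(ua);
    counts[family] = (counts[family] || 0) + 1;
  });
  return counts;
}
```

Run over the 25 sample lines above, this grouping gives 5 for IE 10, 5 for IE 9, 13 for iOS 9, 1 for UC Browser and 1 for Chrome. The iOS 9 bucket includes in-app webviews (the FBIOS and YJApp entries), which a User-Agent check alone does not separate from Safari proper in every case.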

The majority came from Mobile Safari on iOS 9, which was released last month and (re-)introduced Navigation Timing support. Perhaps their implementation wasn't standards compliant?

The presence of IE9 and IE10 values is also noteworthy. We should look into our traffic stats and see how this total of 5-10% relates to their total traffic. Perhaps this is all IE9/10/iOS 9 clients? Or just some? Note that Navigation Timing is sampled. In WMF production it is sampled 1:1000.
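
The arithmetic behind that comparison can be sketched as follows. The function names are hypothetical, and the sketch assumes that the ~300 events per minute are counted after the 1:1000 sampling has been applied. No real traffic shares are included here; `groupTrafficShare` has to come from the traffic stats.

```javascript
// Fraction of received events that were rejected as non-compliant.
function rejectionRate(rejected, total) {
  return rejected / total;
}

// Scale a sampled count back to an estimated number of page views,
// given a 1:N sampling factor (N = 1000 in WMF production).
function estimateUnsampled(sampledCount, samplingFactor) {
  return sampledCount * samplingFactor;
}

// If all rejections came from one group of browsers, the rejection rate
// inside that group is the overall rate divided by the group's share of
// total traffic. A result near 1 would mean nearly every event from the
// group is rejected; a small result would mean only some are.
function rateWithinGroup(overallRate, groupTrafficShare) {
  return overallRate / groupTrafficShare;
}
```

With the figures above, `rejectionRate(20, 300)` is about 0.067, which sits inside the observed 5-10% range, and `estimateUnsampled(20, 1000)` gives 20000 affected page views per minute under the stated sampling assumption.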

Event Timeline

Krinkle created this task. Oct 27 2015, 4:40 AM
Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description.
Krinkle added a project: Performance-Team.
Krinkle added subscribers: Peter, Krinkle, Aklapper, ori.
Peter added a comment. Oct 27 2015, 9:33 AM

About iOS 9: note that one entry identifies itself as the UC browser. When I tried it last week, I saw that UC by default has its User Agent set to Safari (at least in the settings), though I didn't check what actually got sent. Also remember that the UC browser has claimed to implement some features and then just didn't.

I'll set up a test page later just to check. Otherwise, the way forward would be to propagate which values are failing so we can see them in the log?

About iOS 9: note that one entry identifies itself as the UC browser. [..]

The entry for UCBrowser appears to be from Android, not iOS.

I'll set up a test page later just to check. Otherwise, the way forward would be to propagate which values are failing so we can see them in the log?

Cool. I'm curious whether the non-compliance is deterministic for those browsers or whether it's an anomalous edge case.

If we can't reproduce it, we can add instrumentation to log some of the non-compliant values somewhere for investigation.
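
A minimal sketch of what such instrumentation could collect is shown below. `TIMING_FIELDS` and `snapshotTiming` are hypothetical names; the field names are the standard `performance.timing` attributes. How the snapshot would be sent (statsv, EventLogging or something else) is deliberately left out, since nothing in this task specifies it.

```javascript
// Standard PerformanceTiming attributes to capture (a subset, chosen here
// as an assumption about what is useful for debugging).
const TIMING_FIELDS = [
  'navigationStart', 'fetchStart', 'domainLookupStart', 'domainLookupEnd',
  'connectStart', 'connectEnd', 'requestStart', 'responseStart',
  'responseEnd', 'domInteractive', 'domComplete', 'loadEventStart',
  'loadEventEnd'
];

// Build a loggable snapshot of a rejected sample. Values are stored as
// offsets from navigationStart (assumed to be set) so they are easy to
// compare by eye; markers that are missing or 0 (unset) become null.
function snapshotTiming(timing, userAgent) {
  const values = {};
  TIMING_FIELDS.forEach((name) => {
    const value = timing[name];
    values[name] = (typeof value === 'number' && value > 0) ?
      value - timing.navigationStart :
      null;
  });
  return { userAgent: userAgent, values: values };
}
```

In a browser this would be called as `snapshotTiming(window.performance.timing, navigator.userAgent)` at the point where the compliance check fails, and the result serialised with `JSON.stringify` for the log.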

ori assigned this task to Krinkle. Nov 9 2015, 7:51 PM
ori triaged this task as Normal priority.
ori set Security to None.
ori moved this task from Inbox to Backlog: Small & Maintenance on the Performance-Team board.
Peter added a comment. Edited Nov 17 2015, 6:20 AM

Yep, you are right, sorry. In the UC app you can choose iPhone as the user agent though, whatever that means. UC seems to have many, many variants of user agents.

I've tried a jsbin, just to run our isCompliant method on an iPhone with the latest iOS, and it works fine.
https://jsbin.com/rareredila/edit?js,console

BUT when I open it in Gmail on my phone, in a webview, I get nonCompliant.

The problem is that responseEnd is somehow smaller than responseStart, so it's a bug in the browser.

But I guess there could be more of these issues. Let's see if we can find a bug report and if I can get it to happen with just pure Safari.
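
To illustrate the kind of ordering check that catches this, here is a minimal sketch. It is not the actual isCompliant() implementation: `ORDERED_MARKERS` and `findOrderViolations` are hypothetical, and the marker list is an assumed subset of the `performance.timing` attributes that are expected to be non-decreasing.

```javascript
// Markers assumed to occur in this order; each value should be greater
// than or equal to the previous one that is set.
const ORDERED_MARKERS = [
  'navigationStart', 'fetchStart', 'domainLookupStart', 'domainLookupEnd',
  'connectStart', 'connectEnd', 'requestStart', 'responseStart',
  'responseEnd', 'loadEventStart', 'loadEventEnd'
];

// Return every adjacent pair of set markers that is out of order,
// so the caller can see which values failed rather than just a boolean.
function findOrderViolations(timing) {
  const violations = [];
  let prevName = null;
  for (const name of ORDERED_MARKERS) {
    const value = timing[name];
    // Skip markers that are missing or 0 (not set by the browser).
    if (typeof value !== 'number' || value === 0) {
      continue;
    }
    if (prevName !== null && value < timing[prevName]) {
      violations.push({ earlier: prevName, later: name });
    }
    prevName = name;
  }
  return violations;
}
```

For a sample where responseEnd is smaller than responseStart, the returned list contains the pair `responseStart` / `responseEnd`; an empty list means the markers checked are in order.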

Peter added a comment. Nov 17 2015, 9:47 AM

I tried it in other webviews that use Safari, to see if we could match the user agents in the logs. Facebook doesn't let us paste jsbins, but LinkedIn does.

So it seems that in-app browsers fire loadEventEnd but report wrong values for some of the other metrics. However, it doesn't fail every time; I've seen it work in some of my tests.

Krinkle updated the task description. May 13 2016, 1:54 PM
Krinkle closed this task as Resolved. May 23 2016, 8:23 PM

Closing for now. We did the investigation and it seems to be primarily caused by WebView on iOS. Hopefully it'll be resolved in the next iOS update, but either way there's nothing we can do about it from our side. We're already discarding the invalid data correctly. It's just unfortunate that it's that much.