
Review Navigation Timing metrics
Closed, Duplicate · Public

Description

T148018: Use fetchStart instead of navigationStart for Navigation Timing to measure the actual page performance touched on this before, but wrongly assumed that we currently use navigationStart. We don't for all browsers; for Chrome we use startLoadTime.

The investigation in T100999: Make the logo's loading priority higher suggests that fetchStart should indeed be taken into account.

Moreover, the use of startLoadTime doesn't always work, which probably explains some odd values we were getting for firstPaint before (huge negative integers), as demonstrated here: https://jsfiddle.net/9fufvLyt/1/ It might be due to the iframe, but it's noteworthy. Direct link to destination: https://dl.dropboxusercontent.com/u/109867/test/index.html

We need to review our calculation of firstPaint in Chrome, as it might be wrong and lump things into that metric that don't belong there. Fixing it might mean a breaking change with regard to our historical data, but so far it seems like Chrome was inconsistent with other browsers, which means our existing data is already polluted when aggregating multiple browsers.
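
For context, this is roughly the kind of Chrome-specific calculation in question (a hedged sketch, not our actual module): chrome.loadTimes() reports times as seconds since epoch, and a misbehaving startLoadTime yields exactly the huge negative firstPaint values mentioned above.

```
// Sketch of the legacy Chrome-only path (assumed shape, not the actual code).
// chrome.loadTimes() values are floating-point seconds since epoch.
var lt = window.chrome && chrome.loadTimes && chrome.loadTimes();
if ( lt ) {
    var firstPaint = Math.round( ( lt.firstPaintTime - lt.startLoadTime ) * 1000 );
    // If firstPaintTime is 0 or startLoadTime misbehaves, this produces
    // huge negative integers like the ones observed in the fiddle.
}
```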

Event Timeline

Something else that's peculiar: if you open the link in a new tab (e.g. middle click) from a different origin or no origin, fetchStart is "high". If you open it in a new tab from the same origin, it's lower, but not as low as with a direct click. It seems like fetchStart captures some overhead that comes from opening a new tab.

@Peter, regarding your comment about redirects on the other tasks: I don't think we do redirects for any major feature. Even our article redirects seem to use JS to rewrite the URL, and with no JS they just display the page without rewriting the URL; no HTTP redirect.

As far as I can tell, unload doesn't affect fetchStart on same-origin navigations. It's always 0 no matter what I do, even when doing something really stupid during onbeforeunload.
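
For reference, the "really stupid" test was along these lines (hypothetical handler, just to stress the unload path):

```
// Block the unload path for ~500 ms on purpose, then check whether the
// next same-origin page's fetchStart moves away from 0.
window.addEventListener( 'beforeunload', function () {
    var end = Date.now() + 500;
    while ( Date.now() < end ) {
        // busy-wait
    }
} );
```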

Speaking of 0 fetchStart: I've noticed that in NavigationTiming we don't report 0 values for any metric, which in the case of fetchStart on Edge is the overwhelming majority when I test things. This probably explains the very low report rate for fetchStart in Edge compared to its report rate for firstPaint (about 25% of firstPaint reports).
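
The behaviour in question is roughly this (assumed shape, not the actual NavigationTiming code):

```
// Falsy values are skipped entirely, so a legitimate fetchStart of 0
// never produces a datapoint, which deflates the report rate.
function emitMetric( name, value ) {
    if ( !value ) {
        return;
    }
    console.log( name, value ); // stand-in for the actual beacon
}
```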

On Chrome, fetchStart is reported about half as often as firstPaint. However, I noticed that the report rate for both drops significantly around April 27th. While fetchStart could easily disappear if it hit 0 more often, a 0 firstPaint is highly unlikely.

Maybe this is the explanation for the fetchStart/firstPaint drop: somehow we're getting fewer NavTiming reports from Chrome than before, and the missing reports are mostly slow ones?

Looking at more metrics, we seem to have lost around 10 reports per minute for Chrome on most metrics, except sslNegotiation, redirecting, connecting and dnsLookup (the last two are always very low, though).

This phenomenon coincides with when the Chrome 58 rollout really ramped up (according to NavTiming reports), which might suggest that Chrome 58 sends NavTiming reports less often.

I need to look at our traffic stats for Chrome 57 and Chrome 58 to figure out whether the ratio between the two versions is the same for NavTiming reports.

I've checked in Pivot, and on May 2nd the combination of 57 and 58 accounts for exactly the same number of pageviews as 57 alone did on April 19th, before 58 started rolling out. That means 57+58 combined should report the same amount of NavTiming records.

However, if Chrome 58 were under-reporting NavTiming (and not 57), it shouldn't overtake 57 in NavTiming reports at the same time as it does in pageviews. And apparently it does, which would suggest that both 57 and 58 started under-reporting to NavTiming. It's probably just poor timing that the 58 rollout ramped up right when we released a lot of changes to production.

I've been staring at Pivot for a while and I've noticed something interesting. The report rate for fetchStart seems to follow the pageview trend for Chrome. Right now we're experiencing a dip in traffic similar to the Christmas period, and that pattern is found in both datasets. What's most striking is the huge spike we had over last summer on Chrome specifically, which appears in both:

pivot-vs-navtiming.JPG (198 KB)

I explored the data further, and that summer spike seems to have come from India and Russia? Quite odd.

Anyway, fetchStart following the general traffic trend explains the dip in report rates, which itself may or may not explain the drop in fetchStart values (it might just be unfortunate timing as well). Speaking of drops, there is one discrepancy in the NavTiming data: a big drop in the fetchStart median on February 13th, recovering on February 16th, that doesn't come with a dip in Chrome traffic/report rate. I think we'll have an answer when the traffic recovers to its usual levels.

Now, something weird in the NavTiming data: while the fetchStart metric follows Chrome traffic quite closely, including the summer spike, that spike is completely absent from firstPaint, which otherwise follows the traffic trend. Chrome reporting fetchStart but not firstPaint? That's odd to say the least, especially at the scale of millions of pageviews. Maybe the summer thing was a bot?

I think we need to decide whether we're going to change the firstPaint calculation in Chrome to be based on fetchStart. Unfortunately we only have redirecting data, not unloading data; it would have been nice to know whether the fetchStart drop was due to unloading going down, but it's too late to collect that now.

Based on my POC, I think it's safe to calculate firstPaint for Chrome based on fetchStart instead of startLoadTime. And while we're at it, we can start tracking unload time. That way, if unload time ever increases, and a decrease in it was responsible for the fetchStart/firstPaint drop, we'll see it the next time something happens.
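
Concretely, the change would look something like this (a sketch under the assumptions of my POC; the unit handling is the tricky part, since chrome.loadTimes() uses seconds since epoch while Navigation Timing uses milliseconds):

```
var t = window.performance.timing;
var lt = window.chrome && chrome.loadTimes && chrome.loadTimes();
if ( lt && lt.firstPaintTime > 0 ) {
    // Rebase onto fetchStart instead of startLoadTime.
    var firstPaint = Math.round( lt.firstPaintTime * 1000 ) - t.fetchStart;
}
// Track unload as its own diff while we're at it.
var unload = t.unloadEventEnd - t.unloadEventStart;
```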

I seem to have found confirmation of what I've seen in practice here:

https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/Q3fYD9VZIdo

Basically, the spec is misleading, and in practice the time difference between navigationStart and fetchStart is the time it takes for the browser to get ready, particularly when you open a new tab. It's out of our control, and it's subject to change whenever the browser, the OS, or a popular browser extension upgrades. I think it's good to keep an eye on it, the way we do, but it shouldn't be lumped into firstPaint, which should be about our own content.

Now, if this is true for Chrome, it's true for other browsers as well. Unfortunately, since fetchStart isn't reported when it's 0, I'm not sure it's safe to combine the existing metrics in Grafana to reconstruct the "proper" firstPaint. A breaking change with regard to our historical data seems like the way to go. Furthermore, what should we do about the other NavTiming metrics, like domInteractive? Should these also start from fetchStart? Should we stop tracking things like loadEventStart/loadEventEnd or responseStart/responseEnd separately and track their difference instead?

@Gilles we do a 302 if you search and get a "go lucky hit", for example search for Obama and hit return. All our internal searches do a 302, right?

window.performance.timing.fetchStart - window.performance.timing.navigationStart = 164 (or whatever you get)

But I agree it is better to measure that in other ways and make our firstPaint only dependent on actual response and render.

@Krinkle you have thought a lot about tracking diffs instead of start/end, what's your take?

Chrome 57 shipped with Navigation Timing v2: https://www.chromestatus.com/features/5409443269312512. Probably not related, but it was the only thing I found.

Qse24h closed this task as a duplicate of T164723: New git repository: <repo name>.

Reading the Navigation Timing v2 spec, I think the argument for using fetchStart as the origin is confirmed by the breakdown of what happens in case of a redirect. When a redirect happens, after recording the redirect duration, the browser goes back to updating fetchStart for the new page it's been redirected to and starts again from there ("step 9").

If we want to study the performance of pages that can be redirected to, but are not always redirected to, lumping the redirect time into the big metrics like firstPaint makes it impossible to study the performance of those pages properly. The portion of pageviews with redirects will inflate all the metrics like firstPaint, and we won't be able to tell whether something is wrong with the page itself. If we suddenly get more redirects, organically or through a mistake in code, all metrics will look like they're worsening, when really only the redirect or fetchStart metrics should be getting worse. Furthermore, the v2 spec, which is clearer, does suggest that external redirects and unloads, which we have no control over, should impact fetchStart.

Recording redirects and redirect time separately is necessary (and we already do that). Same for unload, which, like redirects, is about the previous page. Both can reveal problems, but problems that the current page (and firstPaint, domComplete, etc.) isn't responsible for.

To summarize, I think we need to:

  • track unload and redirect on their own, as a diff (we already track redirect that way)
  • track all metrics that happen after fetchStart with fetchStart as the origin (see the sketch below)
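
As a sketch of what that scheme would look like (assumed metric names, just to make the proposal concrete):

```
var t = window.performance.timing;
var metrics = {
    // previous-page costs, tracked as standalone diffs
    unload: t.unloadEventEnd - t.unloadEventStart,
    redirecting: t.redirectEnd - t.redirectStart,
    // pre-fetch overhead we don't control (new tab, unload, redirects)
    fetchStart: t.fetchStart - t.navigationStart,
    // everything below uses fetchStart as the time origin
    responseStart: t.responseStart - t.fetchStart,
    domInteractive: t.domInteractive - t.fetchStart,
    domComplete: t.domComplete - t.fetchStart
};
```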

I think that diffs instead of start/ends is a separate debate, but if we're going to make a breaking change to our NavTiming data, we might want to make that decision now and change it all at once. IMHO it would be cleaner to track as diffs what can be, because they have logical separations (app cache, dns, tcp connecting, ssl, requesting, receiving the response, processing the DOM), and only have a handful of metrics that represent total user waiting time relative to fetchStart (firstPaint, maybe domComplete and domInteractive).

The user can't sense the DNS request as its own thing, but he/she can experience the effect of firstPaint quite clearly.

I'm not sure that domInteractive and domComplete can be felt by the user? They have a guaranteed order and could also be expressed as diffs; I think we need to discuss it.

My opinion is that the metrics that can be expressed as diffs (most of them, really) are good technical indicators for us of something that might go wrong. Do we have an issue with DNS? With the app cache? With DOM parsing? And that the only metrics that span a lot of things, like firstPaint, are only worth tracking if they reveal something the user feels clearly. When the time comes we'll want to replace those with better ones, like first meaningful paint, etc.

The issues we've encountered here are:

  • our diff metrics aren't exhaustive, and we're missing unload, so we can't verify whether it's responsible for the fetchStart drop
  • unload, redirect and fetchStart are tricky because they're 0 a lot of the time and we don't send 0 values. If, for example, we got rid of a needless internal redirect, we'd suddenly get more 0s and the impact wouldn't be visible on the main graph, only on the report rate graph. This is something we'll have to remember.
  • firstPaint, domComplete and domInteractive include the unload and redirect times, which are about previous pages or even external pages we don't control. This creates noise, and we end up investigating what might be "fake" performance improvements and regressions.

My biggest regret here is that we didn't track unload. It would have let us know if this "fake" performance improvement was internal or external.

Gilles renamed this task from Review firstPaint calculation for Chrome to Review Navigation Timing metrics. May 9 2017, 10:44 AM

Some more food for thought.

Because we don't record 0 values, there's no good way to combine the existing firstPaint, domComplete, domInteractive, etc. with the fetchStart metric in Grafana: the medians are medians of the values that were reported, ignoring zeros, which is not the same as the real median we'd get if the 0s were reported and the fetchStart report rate matched firstPaint et al.
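
A toy example of the bias, with made-up numbers:

```
// Five page loads, two of which had fetchStart 0 and were never reported.
var trueValues = [ 0, 0, 120, 250, 400 ]; // real median: 120
var reported = [ 120, 250, 400 ];         // median of reported values: 250
```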

However, I believe we still have the EventLogging records for NavigationTiming, where the fetchStart field is simply missing if it was 0. That means we could in theory reconstruct the history of firstPaint and the others as if they used fetchStart as their time origin, by re-processing the EventLogging data.
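
The reprocessing step would be something like this (hypothetical field names, assuming they match the NavigationTiming schema and that stored values are offsets from navigationStart, as in the non-Chrome case):

```
function rebaseFirstPaint( event ) {
    // A missing fetchStart field means the value was 0 and was never sent.
    var fetchStart = event.fetchStart || 0;
    return event.firstPaint - fetchStart;
}
```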

And the same is probably true for metrics where we tracked start and end separately and we're considering merging them into a single diff metric.

It might be a big undertaking (@Krinkle might know more about how feasible that is), but if we change the NavTiming metrics and like the new version, it's nice to know that we might have a way to reconstruct historical data.

The question is whether it's worth all that effort. The fiscal year is ending, with its annual plan performance goals. We haven't set such goals for the next annual plan. I'm going to write a blog post about performance changes over the past year. For the next update like this, we can use the point where we changed metrics as the new reference. So I'm not sure it's that important to fix historical data.

When we do this, I think it will be a good opportunity to also collect the number of requests and how many of them are cached. Let me do a small test that we can discuss tomorrow.

You mean when the page itself is a cache hit? Not worrying about the resources on the page? I'm not sure that even that is possible without the usual guesswork about which times represent a cache hit vs. a network call:

In case where the user agent already has the domain information in cache, domainLookupStart and domainLookupEnd represent the times when the user agent starts and ends the domain data retrieval from the cache.

requestStart has the same property:

This attribute must return the time immediately before the user agent starts requesting the current document from the server, or from relevant application caches or from local resources.

That's all from the v2 spec, which is more explicit in its definitions; I assume the same is true for the v1 implementation.

The problem with all of that is that there might be overlap between a cache hit on a very slow computer with a spinning disk and a cache miss on a very fast computer with a very fast internet connection close to our DCs.

No, I mean cache hits for resources, but I'm not sure how we should handle that with ResourceLoader. First I thought of a simple solution just using Nav Timing v2 and Resource Timing v2, but that won't do, since we'll miss requests that never get fired. What I'm really after is getting numbers to make our synthetic testing more "real world", meaning how much of the page is cached.
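
One possible approach with Resource Timing v2 (a sketch with the caveat above: ResourceLoader modules served from localStorage never fire a request, so they won't appear here at all):

```
// transferSize === 0 with a non-empty body suggests a local cache hit
// (for same-origin or Timing-Allow-Origin resources).
var entries = performance.getEntriesByType( 'resource' );
var cached = entries.filter( function ( r ) {
    return r.transferSize === 0 && r.decodedBodySize > 0;
} );
console.log( cached.length + ' of ' + entries.length + ' likely cached' );
```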

We could have RL report how much it found in localStorage and treat that the same as cache hits in Resource Timing. But we won't get the gzipped size in that case. I'm not sure what you want to count, though: request count or actual size.