
Refactor Navigation Timing gathering to produce reliable stackable measures (aka "frontend.navtiming2")
Closed, ResolvedPublic

Description

Most of the required parts for this are already in place but @Peter and I found it somewhat untrustworthy.

Some metrics are missing, some are transformed to relative offsets between two points, some are aggregated, some zero values are ignored in questionable ways. It's hard to get a complete picture.

  • Move to deltas and "start" values for our metrics.
  • Have the server fragment our frontend properties by browser_family using the ua-parser library server-side (a sketch follows below this list). We can wildcard over this for general graphs, but it allows us to draw a more realistic picture for individual browsers (and helps pinpoint problems with specific browsers), since a median of everything is not very representative of real users.
  • Don't ignore non-compliant browsers that report properties out of order. Even quite recently, Firefox and Chrome have both had compliance bugs (e.g. https://bugzilla.mozilla.org/1123920). So if the browser is non-compliant, we'd emit some kind of light-weight error increment beacon, perhaps with statsv. This will help us keep track of how much data we're excluding.
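
A minimal sketch of the browser_family fragmentation mentioned above, using the Python ua-parser library; the function name and metric prefix are illustrative, not the actual navtiming code:

```python
# Sketch only: derive a browser_family segment for the Graphite metric key
# with the ua-parser library. make_stat_prefix and the prefix string are
# hypothetical names, not the real navtiming code.
from ua_parser import user_agent_parser

def make_stat_prefix(user_agent):
    parsed = user_agent_parser.ParseUserAgent(user_agent)
    # Graphite treats dots as path separators, so normalise the family name.
    family = parsed['family'].replace('.', '_').replace(' ', '_')
    return 'frontend.navtiming2.by_browser.%s' % family
```

Wildcarding over the by_browser segment would then recover the general graphs.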
In T99060#1299940, @ori wrote:

An idea from GitHub's engineering blog: use a stacked graph to represent navigation timing metrics.

http://githubengineering.com/browser-monitoring-for-github-com/

Being able to render a graph like this would be quite valuable as well.

(Screenshot: stacked Navigation Timing graph from the GitHub engineering blog post.)

And to have the dashboard feature sub-views for individual browsers because the overall picture is somewhat skewed as it is based on medians of individual properties. It does not represent the experience of any individual user (per se).

High-level plan

From T104902#3253227, and T104902#3746563:

  • Server will not filter out zero values.
  • Server will not exclude values based on an upper bound. – T104902#3746563
  • Server will apply remaining filters to entire events, not individual values. If one of the values is invalid, discard the event as a whole.
  • Server will record values to Graphite as relative to fetchStart, instead of navigationStart (see the sketch after this list).
  • Server will introduce new metrics for deltas, to make stacking more accurate. (e.g. "dns", "request", "response", etc. in addition to dnsStart/End, requestStart/End, etc.)
  • (Last step) Clean up client-side code to only send what's needed for navtiming2.
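
A minimal sketch of the first few rules above, assuming `event` is the decoded EventLogging event dict with offsets relative to navigationStart (the function and the set of fields are illustrative, not the actual navtiming code):

```python
def rebase_and_validate(event):
    """Sketch: re-base marks from navigationStart to fetchStart,
    discarding the whole event if any single value is invalid."""
    fetch = event['fetchStart']
    out = {}
    for key in ('responseStart', 'domInteractive', 'firstPaint', 'loadEventEnd'):
        value = event[key] - fetch
        if value < 0:
            # One out-of-order value invalidates the entire event.
            return None
        # Zero values are kept, per the first rule above.
        out[key] = value
    return out
```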

frontend.navtiming2 issues

  • Stacked values don't match the total from loadEventEnd – T178479
  • "dns" is usually zero? – T178479#3699302
  • "unload" correctness – T178479
  • "redirecting" correctness – T178479
  • Make sure mediaWikiLoadEnd or mediaWikiLoadComplete is working - T180598

Related Objects

Event Timeline


Synced with Timo last Friday and I will start with this now. A couple of things came up when I started to look into it:

(Final optimisation) Compute deltas client-side instead of server-side to make schema and event packet smaller.

Not sure about that; isn't it better to send the full Navigation Timing data to the backend and then do all the handling there? I'm thinking we want the (correct) deltas, but we will still want some metrics measured from fetchStart rather than as deltas, so we can get the median/p* values for them. We can build a dashboard using the deltas, but to be able to act on changes as we do in the alerts, we want median/p* for domInteractive/domLoaded/fullyLoaded etc.

Server will report absolute values relative to fetchStart instead of navigationStart.

Today we do that in the Navigation Timing extension, so to make everything run in parallel we either need to send everything twice for a while, or first move all the calculations to the backend as a first step (sending absolute values). @Krinkle, what do you prefer, or is there a better way to do it?

I started to change the backend; what about naming the metrics following https://www.w3.org/TR/navigation-timing/#process ?
We would have deltas for:

  • redirect
  • appCache
  • dns
  • tcp
  • request
  • response
  • processing (do we want to split this one into moar?)
  • onLoad

and that means we need to send more data server-side, but that is kind of good I think. (One possible mapping is sketched below.)
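
To make the naming concrete, here is one reading of how those delta names map onto the attributes in the W3C processing model; the appCache and processing boundaries in particular are my interpretation, not necessarily what navtiming.py ended up using:

```python
# Sketch: delta names -> the Navigation Timing attributes that bound them.
DELTAS = {
    'redirect':   ('redirectStart', 'redirectEnd'),
    'appCache':   ('fetchStart', 'domainLookupStart'),
    'dns':        ('domainLookupStart', 'domainLookupEnd'),
    'tcp':        ('connectStart', 'connectEnd'),
    'request':    ('requestStart', 'responseStart'),
    'response':   ('responseStart', 'responseEnd'),
    'processing': ('responseEnd', 'domComplete'),
    'onLoad':     ('loadEventStart', 'loadEventEnd'),
}

def compute_deltas(event):
    # Each delta is simply end minus start of its phase.
    return {name: event[end] - event[start]
            for name, (start, end) in DELTAS.items()}
```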

In the backend code we only have a test case for the user agent; it would be good to have a test case that at least goes through the code path I'm changing. I'll look into it, or else pair up with @Gilles when he is back.

Server will report absolute values relative to fetchStart instead of navigationStart.

Ah, it was simpler than I thought; we send fetchStart today, so let's use that. Let me commit tomorrow and you can have a first look.

In the backend code we only have a test case for the user agent; it would be good to have a test case that at least goes through the code path I'm changing [..]

Aye, my thoughts exactly :) I had actually already started on this the day before:

[operations/puppet@production] webperf: Add unit tests for schema handlers and stat dispatching
https://gerrit.wikimedia.org/r/372577

I actually learned a lot about Python writing this commit. While it appears to work, I'm certainly very open to feedback on whether it makes sense (I'm still not quite sure when to use which data type), and whether there might be any performance issue with this approach due to the introduction of iterators in the production code.
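
For flavour, the tests are roughly of this shape; the handler here is a self-contained stand-in, not the real navtiming code, and the exact statsd strings are assumptions:

```python
import unittest

def handle_navigation_timing(meta):
    """Placeholder generator standing in for the real navtiming handler."""
    event = meta['event']
    fetch = event['fetchStart']
    for key in ('responseStart',):
        yield 'frontend.navtiming2.%s:%d|ms' % (key, event[key] - fetch)

class NavTimingHandlerTest(unittest.TestCase):
    def test_emits_fetchstart_relative_value(self):
        event = {
            'schema': 'NavigationTiming',
            'event': {'fetchStart': 100, 'responseStart': 350},
        }
        # The handler is a generator, hence the iterators mentioned above.
        stats = list(handle_navigation_timing(event))
        self.assertIn('frontend.navtiming2.responseStart:250|ms', stats)

if __name__ == '__main__':
    unittest.main()
```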

Yes, you are always one step ahead :) I have most things ready but will wait until the tests land, so I can add my own and make sure I don't break anything. Thanks for the help @Krinkle

Change 372577 merged by Ottomata:
[operations/puppet@production] webperf: Add unit tests for schema handlers and stat dispatching

https://gerrit.wikimedia.org/r/372577

Change 375345 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[operations/puppet@production] Make values stackable

https://gerrit.wikimedia.org/r/375345

Change 376059 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[mediawiki/extensions/NavigationTiming@master] Add more metrics to produce reliable stackable measures.

https://gerrit.wikimedia.org/r/376059

Change 377454 had a related patch set uploaded (by Krinkle; owner: Phedenskog):
[mediawiki/extensions/NavigationTiming@wmf/1.30.0-wmf.17] Add unload delta to make reliable stackable metrics.

https://gerrit.wikimedia.org/r/377454

Change 376059 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Add unload delta to make reliable stackable metrics.

https://gerrit.wikimedia.org/r/376059

Change 377454 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@wmf/1.30.0-wmf.17] Add unload delta to make reliable stackable metrics.

https://gerrit.wikimedia.org/r/377454

Mentioned in SAL (#wikimedia-operations) [2017-09-14T16:59:42Z] <krinkle@tin> Synchronized php-1.30.0-wmf.17/extensions/NavigationTiming: T104902 (duration: 01m 00s)

Change 375345 merged by Elukey:
[operations/puppet@production] navtiming.py: Make values stackable

https://gerrit.wikimedia.org/r/375345

Started to create the new stackable version, but we are missing processing and response time. We get them in the example data but not in the output of the test case, so there must be a bug there.

Change 382643 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[operations/puppet@production] Fixed typeo for responseEnd property.

https://gerrit.wikimedia.org/r/382643

Change 382643 merged by Herron:
[operations/puppet@production] webperf: Fixed typeo for navtiming2 responseEnd property

https://gerrit.wikimedia.org/r/382643

@Peter I've updated the old stacked graph to merge "domInteractive" (responseEnd>domInteractive) and "domComplete" (domInteractive>domComplete) into one metric "processing" (responseEnd>domComplete) to match your new one.

Also played around with some colour changes:

(Screenshots: three candidates for the stacked-graph color scheme)
  • Previous
  • Candidate 1: no repeats of the same color, less contrasting colors
  • Candidate 2 (current): rainbow?

Not sure what's best or if it matters :)

Also added an extra overlay line: "domInteractive", in addition to "total" and "firstPaint". This makes up for the fact that "processing" was merged into one big stack so that we can still see at which point document-ready happens ($.ready/readyState=interactive/domContentLoaded). And I've made the colors match those of the Chrome DevTools (paint=green, dom-ready=blue, load/total=red).

From today's meeting: redirecting and unload are reported incorrectly; we need to report the zeros from the client too, to make medians etc. correct.

@Peter I added navtiming2 as alternative source on https://grafana.wikimedia.org/dashboard/db/navigation-timing. With that, I noticed the mediaWikiLoadComplete metric is missing for navtiming2.

I believe this is because the navtiming2 code is looking for mediaWikiLoadStart/End instead of mediaWikiLoadComplete. Those Start/End properties do not exist anymore (the property exists in Graphite due to a delta trial we did in 2015, but is not used now), so the mediaWikiLoad case coded in the Python code is never triggered.

For continuity, I think it'd be nice to keep reporting mediaWikiLoadComplete as one of the absolute metrics (like responseStart, firstPaint and loadEventEnd).

Later, as part of T160315, we can then re-introduce the mediaWikiLoadEnd metric in the way you proposed (from fetchStart). But that will need a change on the client side first to report it, which we may want to save for after finishing navtiming2, as otherwise the beacon gets larger again. If we first finish navtiming2 and remove the unused fields, we'll free up some of the 2K quota for the beacon.

Speaking of the quota, I think we're artificially limiting ourselves by using EventLogging, which relies on the URL alone to transmit data. sendBeacon supports a payload in addition to the URL. From what I can gather, Chrome is the only browser that limits its size, at 64k, and that quota seems to be shared by the different sendBeacon calls for a given pageload. That's still plenty of headroom compared to the current limit.

We could either improve EventLogging to look in the payload when the URL is empty, and have the client-side code of EventLogging switch to the payload as soon as the data is greater than 2k; or we could have a separate intake mechanism for navtiming2 that doesn't rely on EventLogging.

@Gilles The limitation is not EventLogging but Varnish. EventLogging consumes and produces via Kafka. Right now, a central piece of that stack is that Varnish has no backend server for /beacon/event. Instead, it blindly responds with HTTP 204 No Content, and drops the request.

In a nutshell: similar to the varnishrls and varnishmedia daemons, there is varnishkafka to collect webrequest stats. The shared memory buffer from Varnish doesn't hold on to entire POST bodies, only the metadata (including request URLs).

Making it handle POST bodies would require creating a new backend service to handle each incoming Varnish beacon request in real-time and store it in Kafka.

Worth considering if we're really stuck with that problem. These requests aren't cached, and with a new service like what you've described, I expect that the bottleneck (if any) will still be Kafka. I get that piggy-backing on the shared memory buffer allowed us to avoid writing a new service, but I don't expect said service to be that different from the existing daemons, which are already services with uptime of their own.

@Peter In https://gerrit.wikimedia.org/r/#/c/390061/4 I realised that our is_sane for navtiming (v1) is actually not being applied to the continent/country breakdown.

In addition, we already know (and it was a conscious decision) that the coal data also doesn't discard any values based on a threshold.

I'm wondering if it makes sense to just remove the upper bound entirely; I'm having trouble finding any reason for it to stay. We have various data-compliance filters already, and we aggregate into medians and percentiles, which already cuts out a lot. Drawing an upper bound on what we include seems counter-intuitive. But I'm open to hearing reasons :-) If we do decide to remove the upper bound, navtiming2 might be a good place to start doing it, so that we have a clean cut in the future (if/when we switch).
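
In sketch form, the proposal amounts to keeping the type and negativity checks while dropping the cutoff; the constant name here is a stand-in, not the actual v1 value:

```python
# Sketch: validity check for navtiming2 without the magic upper bound.
# SOME_UPPER_BOUND stands in for whatever navtiming v1 used; the point is
# that it goes away, while the other checks stay.
def is_sane(value):
    return isinstance(value, int) and value >= 0
    # v1, by contrast, also required: value < SOME_UPPER_BOUND
```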

@Krinkle Yep, totally agree, let's skip that.

Change 391496 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[operations/puppet@production] Skip upper limits for values in navtiming2

https://gerrit.wikimedia.org/r/391496

Make sure mediaWikiLoadEnd or mediaWikiLoadComplete is working

Yep, it is a mismatch in navtiming2 between what is sent from the user (mediaWikiLoadComplete) and what the code expects (mediaWikiLoadEnd and mediaWikiLoadStart, from which it calculates the newly named mediaWikiLoad). Let me fix that.
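
The fix, in sketch form (illustrative structure; the real navtiming.py differs):

```python
# Sketch: handle the field the client actually sends
# (mediaWikiLoadComplete), instead of the never-sent Start/End pair.
def handle_mediawiki_load(event):
    if 'mediaWikiLoadComplete' in event:
        yield ('mediaWikiLoadComplete', event['mediaWikiLoadComplete'])
    # Old code path, never triggered because the client does not send these:
    elif 'mediaWikiLoadEnd' in event and 'mediaWikiLoadStart' in event:
        yield ('mediaWikiLoad',
               event['mediaWikiLoadEnd'] - event['mediaWikiLoadStart'])
```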

Change 391496 merged by Dzahn:
[operations/puppet@production] webperf: Skip upper limits for values in navtiming2

https://gerrit.wikimedia.org/r/391496

TODO: Add misc to navtiming2 for gaps between metrics such as connectEnd-responseStart.

Issue "navtiming2.dns is usually zero" confirmed fixed.

| Metric | navtiming.dns | navtiming2.dns |
|--------|---------------|----------------|
| median | ~24-34ms      | ~0.1-1.2ms     |
| p95    | ~900ms        | ~250ms         |
| p99    | ~4s           | ~4s            |
| rate   | 60-110/min    | 120-200/min    |
  • The lower median/p95 is expected, given we now include values that round down to 0 (see the toy example below).
  • The higher report rate is expected, given we now include zeros.
  • The navtiming2.dns report rate now matches that of navtiming.loadEventEnd, which also indicates that it isn't too high (e.g. not fabricating zeros for non-navtiming clients, as we saw happen earlier by accident).
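
A toy example of the median shift (invented numbers, not real data):

```python
# Keeping the zeros that v1 filtered out pulls the median toward zero,
# which is what the table above shows for dns.
import statistics

v1_values = [24, 30, 48]             # zeros excluded, as in navtiming
v2_values = [0, 0, 0, 24, 30, 48]    # zeros kept, as in navtiming2

print(statistics.median(v1_values))  # 30
print(statistics.median(v2_values))  # 12.0
```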

Change 420813 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[mediawiki/extensions/NavigationTiming@master] Collect Navigation Timing gaps

https://gerrit.wikimedia.org/r/420813

Change 420831 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[operations/puppet@production] Performance: Collect Navigation Timing gaps

https://gerrit.wikimedia.org/r/420831

Change 420813 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Collect Navigation Timing gaps

https://gerrit.wikimedia.org/r/420813

Hmm, something is wrong with navtiming2. @Krinkle and I looked it through last night, and I cannot make the numbers match. The idea with navtiming2 is that we should either send all metrics or no metrics at all.

For example, looking at the code: https://github.com/wikimedia/mediawiki-extensions-NavigationTiming/blob/master/modules/ext.navigationTiming.js#L153-L170 it looks like we should always send redirecting and unload (if we don't have a value, we send a 0).

But when I look at the graphs comparing the sample rate of those values it seems that they are drifting away from each other:

(Screenshot: sample rates of redirecting vs unload drifting apart over time.)

We have days where they are perfectly in line with each other, but then there are times where we have more redirects and sometimes more unloads, so there must be a bug somewhere. Or am I doing something wrong?

Change 424595 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] navtiming: Remove broken 'rendering' and 'pageSpeed' metrics

https://gerrit.wikimedia.org/r/424595

looking at the code: https://github.com/wikimedia/mediawiki-extensions-NavigationTiming/blob/master/modules/ext.navigationTiming.js#L153-L170 it looks like we should always send redirecting and unload (if we don't have a value, we send a 0).

Indeed.

But when I look at the graphs comparing the sample rate of those values it seems that they are drifting away from each other:

(Screenshot: sample rates of redirecting vs unload drifting apart over time.)

The server-side also doesn't provide a clue (src). No idea what the cause could be!

I'd recommend digging through some EventLogging data directly, to see if we can find a beacon that has one of the values missing in the JSON. Because even though our client is not supposed to send one without the other (given we set them unconditionally), the schema and our server-side processor do allow one to be missing (given all fields are optional). So that would be my only explanation, unless something gets corrupted in statsd or Graphite.

I still don't know why they would be missing in the beacon, but maybe it'll explain itself once we find a complete sample.
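
One way to do that digging, assuming a file of raw EventLogging JSON lines; the path and capsule layout here are assumptions:

```python
# Sketch: find NavigationTiming events where redirecting and unload are
# not both present, which the client should never produce.
import json

with open('eventlogging-navtiming.json') as f:  # hypothetical dump file
    for line in f:
        capsule = json.loads(line)
        event = capsule.get('event', {})
        if ('redirecting' in event) != ('unload' in event):
            print(capsule.get('uuid'), sorted(event.keys()))
```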

Change 424595 merged by BBlack:
[operations/puppet@production] navtiming: Remove broken 'rendering' and 'pageSpeed' metrics

https://gerrit.wikimedia.org/r/424595

Change 420831 abandoned by Krinkle:
webperf: Collect Navigation Timing gaps

Reason:
To be resubmitted in performance/coal.git (code has diverged now).

https://gerrit.wikimedia.org/r/420831

I just want to confirm a couple of the assumptions that I've been making about this ticket:

  • Every incoming event is normalized such that fetchStart is 0; we ignore the unload delta and the redirecting delta if they exist
  • It's expected that if you sum these items, the value that you'll end up with is the same as loadEventEnd:
    • dns
    • dns-tcp gap (that is, the difference between domainLookupEnd and connectStart)
    • tcp
    • tcp-request gap (that is, the difference between connectEnd and requestStart)
    • request
    • response
    • processing
    • processing-onload gap (the difference between domComplete and loadEventStart)
    • onload

Am I off on either of those?

Looks right to me!

In my own words:

  • The client collects all offsets relative to navigationStart. This is mostly for compatibility with navtiming1 in Graphite, because it's hard to change in a non-disruptive way, and also for continuity in Kafka/Coal/MySQL/Hadoop.
  • The processor (for the navtiming2 namespace in Graphite) subtracts fetchStart from these offsets.
  • On the performance-metrics dashboard we stack all the navtiming2 deltas, except for those that capture events before fetchStart (unload, redirecting).

[operations/puppet@production] Performance: Collect Navigation Timing gaps
https://gerrit.wikimedia.org/r/420831

We haven't applied this yet. Would be interesting to see if this perhaps closes the "gap" (see the telescoping sketch below).
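
Why collecting the gaps should close the stack, in sketch form: with every boundary included, the phase deltas telescope, so their sum is exactly loadEventEnd minus fetchStart (the phase list is my reading of the checklist above):

```python
# Sketch: phases plus gaps, as (start, end) attribute pairs. Consecutive
# boundaries telescope, so the stacked sum equals loadEventEnd - fetchStart.
PHASES = [
    ('fetchStart', 'domainLookupStart'),       # pre-dns gap (appCache)
    ('domainLookupStart', 'domainLookupEnd'),  # dns
    ('domainLookupEnd', 'connectStart'),       # dns-tcp gap
    ('connectStart', 'connectEnd'),            # tcp
    ('connectEnd', 'requestStart'),            # tcp-request gap
    ('requestStart', 'responseStart'),         # request
    ('responseStart', 'responseEnd'),          # response
    ('responseEnd', 'domComplete'),            # processing
    ('domComplete', 'loadEventStart'),         # processing-onload gap
    ('loadEventStart', 'loadEventEnd'),        # onload
]

def stacked_total(event):
    return sum(event[end] - event[start] for start, end in PHASES)
```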

Looking at https://www.w3.org/TR/navigation-timing-2/ , the spec doesn't really make any recommendation about gaps between these:

https://www.w3.org/TR/navigation-timing-2/#processing-model
https://www.w3.org/TR/navigation-timing-2/#dom-performancenavigationtiming

IMHO if things like connectEnd and requestStart were meant to be strictly equal, the spec would say so. The OS and the browser are constantly multitasking, I don't think it should be surprising that there are small gaps between these blocks.

There are situations defined by the spec where there are no gaps for some values: for instance, if the DNS cache is warm and there's already a connection, domainLookupStart, domainLookupEnd, connectStart and connectEnd will all be equal to fetchStart. But in situations where "stuff happens", there's no guarantee that domainLookupEnd == connectStart (toy illustration below).
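
Toy numbers for that warm-connection case (invented, just to make the distinction concrete):

```python
# dns and tcp collapse to fetchStart, yet a small scheduling gap before
# requestStart remains, which the spec does not forbid.
event = {
    'fetchStart': 0,
    'domainLookupStart': 0, 'domainLookupEnd': 0,
    'connectStart': 0, 'connectEnd': 0,
    'requestStart': 3,
}
dns = event['domainLookupEnd'] - event['domainLookupStart']  # 0
tcp = event['connectEnd'] - event['connectStart']            # 0
gap = event['requestStart'] - event['connectEnd']            # 3ms gap
```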

Yeah, the spec allows for gaps to exist, and that's fine. If adding the collected gaps to the graph fixes the issue, then that would be a good outcome in my opinion.

As part of this task, though, we've questioned many parts of our infrastructure relating to navtiming and found (and fixed) numerous bugs that also contributed to the metrics not adding up in the graph.

The realisation that there are also gaps in the original data from Navigation Timing API has merely been the most recent finding. It's not guaranteed that it will be the last (although I certainly hope so).

Change 434439 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[performance/navtiming@master] Collect Navigation Timing gaps

https://gerrit.wikimedia.org/r/434439

Change 434439 merged by jenkins-bot:
[performance/navtiming@master] Collect Navigation Timing gaps

https://gerrit.wikimedia.org/r/434439

Here's the latest version: https://grafana.wikimedia.org/dashboard/db/performance-metrics?refresh=5m&panelId=30&fullscreen&orgId=1

We now collect gaps and use means for the metrics that we stack. It looks better, but it's not 100% correct.

@Krinkle I tried to validate the example events with https://github.com/soulgalore/wikimedia-tools/blob/master/validate-navtiming/validate.js and I got two events there that don't match in total vs loadEventEnd, but it could be that I just missed something in the validation. I'll dig into it tomorrow again.

@Krinkle I tried to validate the example events with https://github.com/soulgalore/wikimedia-tools/blob/master/validate-navtiming/validate.js and I got two events there that don't match in total vs loadEventEnd, but it could be that I just missed something in the validation. I'll dig into it tomorrow again.

Nice! I spotted a minor difference with navtiming.py (comment). Might be nothing, but just in case :)

  • Use medians
  • Use navtiming2 values

Note that GitHub's visualization also has gaps.

I've changed to medians and changed all navigation timing dashboards to use navtiming2 instead of navtiming.

Krinkle renamed this task from Refactor Navigation Timing gathering to produce reliable stackable measures to Refactor Navigation Timing gathering to produce reliable stackable measures (aka "frontend.navtiming2"). Dec 5 2018, 8:41 PM

Change 477852 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/navtiming@master] Remove creation of frontend.navtiming metrics in favour of navtiming2

https://gerrit.wikimedia.org/r/477852

Change 477852 merged by jenkins-bot:
[performance/navtiming@master] Remove creation of frontend.navtiming metrics in favour of navtiming2

https://gerrit.wikimedia.org/r/477852