Page MenuHomePhabricator

Use Server-Timing to collect Varnish hit/miss information in NavigationTiming
Closed, ResolvedPublic

Description

The Server-Timing API: https://www.w3.org/TR/server-timing/ has been available for some time in several browsers. While it's base use case is to expose server-side times to the client, it can also be used to pass arbitrary values. Which makes, to my knowledge, the only way to actually read custom headers from JS.

This is how we could essentially collect the hit/miss information currently exposed in the x-cache-status header in NavigationTiming, if we were to serve is as a Server-Timing header instead (or in addition to it).

Event Timeline

Gilles renamed this task from User Server-Timing to collect Varnish hit/miss information in NavigationTiming to Use Server-Timing to collect Varnish hit/miss information in NavigationTiming.Oct 24 2018, 4:28 PM
Gilles triaged this task as Medium priority.Nov 8 2018, 10:23 AM

Change 472400 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/vagrant@master] Expose Varnish hit/miss information to JS via Server-Timing

https://gerrit.wikimedia.org/r/472400

Change 472400 merged by jenkins-bot:
[mediawiki/vagrant@master] Expose Varnish hit/miss information to JS via Server-Timing

https://gerrit.wikimedia.org/r/472400

Change 472401 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Expose Varnish X-Cache-Status via Server-Timing

https://gerrit.wikimedia.org/r/472401

Change 472401 merged by Ema:
[operations/puppet@production] Expose Varnish X-Cache-Status via Server-Timing

https://gerrit.wikimedia.org/r/472401

Change 474661 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Record Server-Timing entries

https://gerrit.wikimedia.org/r/474661

Change 474661 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Record Server-Timing entries

https://gerrit.wikimedia.org/r/474661

Doesn't work, I see nothing in hive and this happens a bunch:

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.11.30/eventlogging?id=AWdkbJZQxYA-x_UccQ9E&_g=h@44136fa

I'm guessing the performance entry object might be read-only or something, and adding pageviewToken to it fails?

Change 476834 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Server timing: combine event and pageviewToken properly

https://gerrit.wikimedia.org/r/476834

Change 476841 had a related patch set uploaded (by Gilles; owner: Gilles):
[analytics/refinery@master] Add ServerTiming to EventLogging whitelist

https://gerrit.wikimedia.org/r/476841

Doesn't work, I see nothing in hive and this happens a bunch:

… a bunch being ~180,000 validation errors in the last 24 hours (per https://logstash.wikimedia.org/goto/7edd6138de07ea2167cd7c794ad448b4) 😉

rENTI278176702293: Server timing: combine event and pageviewToken properly should fix the issue.

[mediawiki/extensions/NavigationTiming@master] Server timing: combine event and pageviewToken properly
https://gerrit.wikimedia.org/r/476834

I suspect the reason the previous code didn't work because the entry object is immutable. That would explain the property being missing once received server-side.

However, I'm somewhat surprised and concerned we may run into the same mistake again if we don't follow up a little bit to understand what happened,

In particular:

  • If it is immutable (can be verified in the browser console using Object.isFrozen() and related methods), we should consider updating the tests to match this by using Object.freeze(). If it isn't frozen, it might be "sealed" which has a subset of the behaviours.
  • If the event was sent to the server and found with validation errors in Logstash, then it is unlikely to have been a case of frozen/sealed/read-only or something like that because NavTiming's code is in strict mode, which means an exception is thrown when attempting to set the property like we did. This means the function would have been aborted, and would not have transmitted the token-less object to the server to find a validation error for in Logstash.

Based on this last point alone, I would guess that maybe the token property did get created but was assigned null somehow (which would point at a possible crypto failure to look into), or that maybe strict mode got disabled somehow (which would point to a NavTiming JS failure to look into).

Change 476834 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Server timing: combine event and pageviewToken properly

https://gerrit.wikimedia.org/r/476834

The object isn't sealed, isn't frozen, is extensible, and if I output it after modifying it, it has the new property. But the new property doesn't survive JSON.stringify, because the new property isn't included by the object's toJSON():

Capture d'écran 2018-12-10 11.42.29.png (224×831 px, 49 KB)

Could be a browser bug? What do you think?

The spec states it should use the default toJSON behaviour, but reading through that other spec explaining what the default does, I don't see anything suggesting it should stick to the "original" set of object keys: https://w3c.github.io/server-timing/#dom-performanceservertiming

The data is making it into the ServerTiming schema as expected now, as verified on Hive.

Change 476841 merged by Gilles:
[analytics/refinery@master] Add ServerTiming to EventLogging whitelist

https://gerrit.wikimedia.org/r/476841