Page MenuHomePhabricator

Use Server-Timing to collect Varnish hit/miss information in NavigationTiming
Closed, ResolvedPublic

Description

The Server-Timing API: https://www.w3.org/TR/server-timing/ has been available for some time in several browsers. While it's base use case is to expose server-side times to the client, it can also be used to pass arbitrary values. Which makes, to my knowledge, the only way to actually read custom headers from JS.

This is how we could essentially collect the hit/miss information currently exposed in the x-cache-status header in NavigationTiming, if we were to serve is as a Server-Timing header instead (or in addition to it).

Details

Related Gerrit Patches:
analytics/refinery : masterAdd ServerTiming to EventLogging whitelist
mediawiki/extensions/NavigationTiming : masterServer timing: combine event and pageviewToken properly
mediawiki/extensions/NavigationTiming : masterRecord Server-Timing entries
operations/puppet : productionExpose Varnish X-Cache-Status via Server-Timing
mediawiki/vagrant : masterExpose Varnish hit/miss information to JS via Server-Timing

Event Timeline

Gilles created this task.Oct 24 2018, 4:28 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 24 2018, 4:28 PM
Gilles renamed this task from User Server-Timing to collect Varnish hit/miss information in NavigationTiming to Use Server-Timing to collect Varnish hit/miss information in NavigationTiming.Oct 24 2018, 4:28 PM
Peter added a subscriber: Peter.Oct 29 2018, 4:19 PM
Gilles claimed this task.Oct 29 2018, 9:27 PM
Gilles moved this task from Inbox to Backlog: Small & Maintenance on the Performance-Team board.
Gilles triaged this task as Medium priority.Nov 8 2018, 10:23 AM

Change 472400 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/vagrant@master] Expose Varnish hit/miss information to JS via Server-Timing

https://gerrit.wikimedia.org/r/472400

Change 472400 merged by jenkins-bot:
[mediawiki/vagrant@master] Expose Varnish hit/miss information to JS via Server-Timing

https://gerrit.wikimedia.org/r/472400

Change 472401 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Expose Varnish X-Cache-Status via Server-Timing

https://gerrit.wikimedia.org/r/472401

Change 472401 merged by Ema:
[operations/puppet@production] Expose Varnish X-Cache-Status via Server-Timing

https://gerrit.wikimedia.org/r/472401

Change 474661 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Record Server-Timing entries

https://gerrit.wikimedia.org/r/474661

Change 474661 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Record Server-Timing entries

https://gerrit.wikimedia.org/r/474661

Doesn't work, I see nothing in hive and this happens a bunch:

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.11.30/eventlogging?id=AWdkbJZQxYA-x_UccQ9E&_g=h@44136fa

I'm guessing the performance entry object might be read-only or something, and adding pageviewToken to it fails?

Change 476834 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Server timing: combine event and pageviewToken properly

https://gerrit.wikimedia.org/r/476834

Change 476841 had a related patch set uploaded (by Gilles; owner: Gilles):
[analytics/refinery@master] Add ServerTiming to EventLogging whitelist

https://gerrit.wikimedia.org/r/476841

phuedx added a subscriber: phuedx.Dec 6 2018, 11:58 AM

Doesn't work, I see nothing in hive and this happens a bunch:

… a bunch being ~180,000 validation errors in the last 24 hours (per https://logstash.wikimedia.org/goto/7edd6138de07ea2167cd7c794ad448b4) 😉

rENTI278176702293: Server timing: combine event and pageviewToken properly should fix the issue.

Krinkle added a subscriber: Krinkle.Dec 8 2018, 7:59 PM

[mediawiki/extensions/NavigationTiming@master] Server timing: combine event and pageviewToken properly
https://gerrit.wikimedia.org/r/476834

I suspect the reason the previous code didn't work because the entry object is immutable. That would explain the property being missing once received server-side.

However, I'm somewhat surprised and concerned we may run into the same mistake again if we don't follow up a little bit to understand what happened,

In particular:

  • If it is immutable (can be verified in the browser console using Object.isFrozen() and related methods), we should consider updating the tests to match this by using Object.freeze(). If it isn't frozen, it might be "sealed" which has a subset of the behaviours.
  • If the event was sent to the server and found with validation errors in Logstash, then it is unlikely to have been a case of frozen/sealed/read-only or something like that because NavTiming's code is in strict mode, which means an exception is thrown when attempting to set the property like we did. This means the function would have been aborted, and would not have transmitted the token-less object to the server to find a validation error for in Logstash.

Based on this last point alone, I would guess that maybe the token property did get created but was assigned null somehow (which would point at a possible crypto failure to look into), or that maybe strict mode got disabled somehow (which would point to a NavTiming JS failure to look into).

Change 476834 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Server timing: combine event and pageviewToken properly

https://gerrit.wikimedia.org/r/476834

Gilles added a comment.EditedDec 10 2018, 10:49 AM

The object isn't sealed, isn't frozen, is extensible, and if I output it after modifying it, it has the new property. But the new property doesn't survive JSON.stringify, because the new property isn't included by the object's toJSON():

Could be a browser bug? What do you think?

The spec states it should use the default toJSON behaviour, but reading through that other spec explaining what the default does, I don't see anything suggesting it should stick to the "original" set of object keys: https://w3c.github.io/server-timing/#dom-performanceservertiming

Gilles closed this task as Resolved.Dec 19 2018, 2:59 PM

The data is making it into the ServerTiming schema as expected now, as verified on Hive.

Change 476841 merged by Gilles:
[analytics/refinery@master] Add ServerTiming to EventLogging whitelist

https://gerrit.wikimedia.org/r/476841

phuedx removed a subscriber: phuedx.Jan 23 2019, 12:29 AM