This will likely require merging the ServerTiming EventLogging schema into the https://meta.wikimedia.org/wiki/Schema:NavigationTiming one, so that the navtiming daemon can get this data in the same record it currently collects the responseStart metric from.
Status | Assigned | Task
---|---|---
Resolved | ema | T264398 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1)
Resolved | Gilles | T264987 Add cache response type and response size as new dimensions to navtiming_responsestart_by_host_seconds prometheus metric
Event Timeline
Change 632879 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Fold cache response type data into NavigationTiming
Change 632883 had a related patch set uploaded (by Gilles; owner: Gilles):
[performance/navtiming@master] Add cache response type as dimension to per-host metric
@ema since these new dimensions are labels, for transfersize we're going to need to come up with buckets ourselves. What buckets would you be interested in tracking?
Looking at October traffic, these are the percentiles I'm seeing for transferSize from RUM data (2673027 samples), in bytes:
Percentile | transferSize (bytes)
---|---
p10 | 9478
p50 | 19431
p75 | 33920
p90 | 61715
p95 | 88658
Based on the October percentiles you've mentioned it seems to me that it could be interesting to define the following buckets:
0 - 10k
10k - 20k
20k - 30k
30k - 60k
60k - inf
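To illustrate how these buckets could be turned into a label value, here is a minimal Python sketch. The function name `transfer_size_bucket`, the label strings, the reading of "k" as 1000 bytes, and the choice to treat each lower bound as inclusive are all assumptions made for illustration; this is not the code from the navtiming patch.

```python
from bisect import bisect_right

# Bucket boundaries in bytes. Assumption: "k" means 1000 bytes, not 1024.
BOUNDS = [10_000, 20_000, 30_000, 60_000]
# One label per bucket; there is always one more label than boundaries.
LABELS = ["0-10k", "10k-20k", "20k-30k", "30k-60k", "60k-inf"]


def transfer_size_bucket(transfer_size):
    """Map a transferSize in bytes to a bucket label (hypothetical helper).

    Lower bounds are inclusive and upper bounds exclusive, so exactly
    10000 bytes lands in "10k-20k". Returns None for missing or
    negative values.
    """
    if transfer_size is None or transfer_size < 0:
        return None
    # bisect_right counts how many boundaries are <= transfer_size,
    # which is exactly the index of the bucket the value falls into.
    return LABELS[bisect_right(BOUNDS, transfer_size)]
```

Applied to the October percentiles above, p10 (9478) falls in "0-10k", p50 (19431) in "10k-20k", p75 (33920) in "30k-60k", and both p90 and p95 in "60k-inf".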
Change 632883 merged by jenkins-bot:
[performance/navtiming@master] Add cache response type as dimension to per-host metric
Change 634228 had a related patch set uploaded (by Gilles; owner: Gilles):
[performance/navtiming@master] Add transfer size buckets as new dimension by host
Change 634228 merged by jenkins-bot:
[performance/navtiming@master] Add transfer size buckets as new dimension by host
Change 632879 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Fold cache response type data into NavigationTiming
The new cacheResponseType field is being collected correctly in Hive:
SELECT COUNT(*), event.cacheResponseType
FROM event.navigationtiming
WHERE year = 2020 AND month = 11 AND day = 30
GROUP BY event.cacheResponseType;

_c0 | cacheresponsetype
---|---
104711 | NULL
158735 | hit-front
3508 | hit-local
78574 | miss
20267 | pass
NULL responses are from browsers that don't support Server Timing.
And as expected the ServerTiming schema no longer collects data:
SELECT COUNT(*)
FROM event.servertiming
WHERE year = 2020 AND month = 11 AND day = 30;

_c0
---
10
Those 10 hits are probably stragglers from people with old JS cached (e.g. a frozen browser tab being reawakened).
Change 644201 had a related patch set uploaded (by Gilles; owner: Gilles):
[analytics/refinery@master] ServerTiming has been folded into NavigationTiming
I've added cache response type to the per host dashboard: https://grafana-rw.wikimedia.org/d/M7xQ_BeWk/response-time-by-host
I can't manage to add transfer size to that dashboard. For some reason the time-shifted graphs don't work with it. Maybe a Grafana bug?
Change 644201 merged by Mforns:
[analytics/refinery@master] ServerTiming has been folded into NavigationTiming