Collect First Input Delay
Closed, Resolved, Public

Description

Google seems to be judging us by First Input Delay (FID), as it's found in the Chrome User Experience Report (CrUX) and now in their Search Console Speed Report. They are now pushing the newer Total Blocking Time (TBT), but given how widely FID is already reported, it seems likely that search ranking takes FID into account and not TBT yet.

Event Timeline

If we only care about Chrome, we can use a PerformanceObserver for the first-input entry type. I've used it here. It supports the buffered flag too, so we don't have to add anything to the head.

It was released in Chrome 77. If that's enough for us, we can at least start tracking it there?
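For illustration, here is a minimal sketch of what such an observer could look like. It is not the code from the NavigationTiming patch: `firstInputDelay`, `observeFirstInput` and the `FirstInputEntry` interface are hypothetical names, and the field names are taken from the beacon payload quoted later in this task. FID is computed as `processingStart - startTime`, which agrees with that payload (89774 - 89770 = 4).

```typescript
// Minimal shape of a first-input entry (assumption: only the fields used here).
interface FirstInputEntry {
  name: string;
  startTime: number;
  processingStart: number;
  processingEnd: number;
  duration: number;
}

// FID is the gap between the input event and the moment its handler starts running.
function firstInputDelay(entry: FirstInputEntry): number {
  return entry.processingStart - entry.startTime;
}

// Hypothetical helper: registers a buffered observer where 'first-input' is supported.
// Returns false when the runtime does not support that entry type.
function observeFirstInput(
  report: (fid: number, entry: FirstInputEntry) => void
): boolean {
  // Looked up dynamically so the sketch does not depend on DOM type definitions.
  const PO = (globalThis as any).PerformanceObserver;
  const supported: readonly string[] = (PO && PO.supportedEntryTypes) || [];
  if (supported.indexOf('first-input') === -1) {
    return false;
  }
  const observer = new PO((list: any) => {
    const entry = list.getEntries()[0] as FirstInputEntry | undefined;
    if (entry) {
      report(firstInputDelay(entry), entry);
      // Only the first input matters, so stop observing afterwards.
      observer.disconnect();
    }
  });
  // buffered: true also delivers an entry recorded before this script ran,
  // which is why nothing has to be added to the head.
  observer.observe({ type: 'first-input', buffered: true });
  return true;
}
```

A caller would pass a callback that sends the beacon, for example `observeFirstInput((fid) => console.log(fid))`.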

Oh ok, perfect. Yes, let's do that.

Gilles renamed this task from We should track First Input Delay (FID) in our RUM metrics to Collect First Input Timing. Dec 16 2019, 1:41 PM
Gilles updated the task description.
Gilles triaged this task as High priority.
Gilles renamed this task from Collect First Input Timing to Collect First Input Delay. Feb 25 2020, 2:46 PM
Gilles updated the task description.

Change 574770 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Collect First Input Timing

https://gerrit.wikimedia.org/r/574770

@Krinkle any chance you could take a look at this navtiming patch this week? It's one of my quarterly goals. Pretty straightforward stuff.

Change 574770 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Collect First Input Timing

https://gerrit.wikimedia.org/r/574770

Change 586318 had a related patch set uploaded (by Gilles; owner: Gilles):
[performance/navtiming@master] Surface First Input Delay as a metric

https://gerrit.wikimedia.org/r/586318

Change 586318 merged by jenkins-bot:
[performance/navtiming@master] Surface First Input Delay as a metric

https://gerrit.wikimedia.org/r/586318

Assigning it to you @dpifke so you can run through the steps of deploying a navtiming daemon update:

https://wikitech.wikimedia.org/wiki/Performance/Runbook/Webperf-processor_services

First, test the change on beta. Once it's deployed there, navtiming data from traffic to beta wikis goes to https://grafana-labs.wikimedia.org/?orgId=1

You should be able to create a dashboard there capturing the new metrics. Since it's Graphite, a metric will only exist once there is actually a record of it. Creating such a record manually requires... luck, because the Navigation Timing sampling rate on beta is 10, meaning that only 1 in every 10 requests gets performance metrics collected. There's no manual option to force that.
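To put a number on the "luck" involved, here is a rough sketch of the odds, assuming each page view is sampled independently with probability 1/10. `chanceOfAtLeastOneSample` is a hypothetical helper, not something in navtiming.

```typescript
// Chance that at least one of `attempts` page views is sampled
// when 1 in every `rate` page views is sampled independently.
function chanceOfAtLeastOneSample(rate: number, attempts: number): number {
  const missProbability = 1 - 1 / rate;
  return 1 - Math.pow(missProbability, attempts);
}
```

Under that assumption, a single page view has a 10% chance of being sampled, and it takes about 22 page views to reach a 90% chance of getting at least one sampled view.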

I tried a bunch of times on the beta Simple English wiki and captured a real beacon call that you can trigger manually:

https://simple.wikipedia.beta.wmflabs.org/beacon/event?%7B%22event%22%3A%7B%22pageviewToken%22%3A%22bcde71d47dffe24dc790%22%2C%22processingStart%22%3A89774%2C%22processingEnd%22%3A89774%2C%22name%22%3A%22mousedown%22%2C%22startTime%22%3A89770%2C%22duration%22%3A16%2C%22FID%22%3A4%7D%2C%22revision%22%3A19842486%2C%22schema%22%3A%22FirstInputTiming%22%2C%22webHost%22%3A%22simple.wikipedia.beta.wmflabs.org%22%2C%22wiki%22%3A%22simplewiki%22%7D;

Changing the pageviewToken to some other random string might be necessary for it to work again. Hitting that URL should result in a record in the FirstInputTiming EventLogging schema that the daemon should capture.
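If you want to script the token change, something along these lines should work. It is only a sketch: `withNewPageviewToken` and `randomToken` are hypothetical helpers, and the format assumptions (URL-encoded JSON after the `?`, optional trailing `;`) come purely from the URL above.

```typescript
// Hypothetical helper: returns a copy of a beacon URL with a different pageviewToken.
function withNewPageviewToken(beaconUrl: string, token: string): string {
  const queryStart = beaconUrl.indexOf('?');
  if (queryStart === -1) {
    throw new Error('beacon URL has no query string');
  }
  const base = beaconUrl.slice(0, queryStart + 1);
  let query = beaconUrl.slice(queryStart + 1);
  // The captured URL ends with ';', so keep that terminator if present.
  const hadTerminator = query.charAt(query.length - 1) === ';';
  if (hadTerminator) {
    query = query.slice(0, -1);
  }
  const payload = JSON.parse(decodeURIComponent(query));
  if (!payload || typeof payload.event !== 'object' || payload.event === null) {
    throw new Error('payload has no event object');
  }
  payload.event.pageviewToken = token;
  return base + encodeURIComponent(JSON.stringify(payload)) + (hadTerminator ? ';' : '');
}

// Hypothetical helper: random hex string, 20 characters like the captured token.
// Math.random() is fine here because the token only has to differ, not be secret.
function randomToken(length: number = 20): string {
  let token = '';
  for (let i = 0; i < length; i++) {
    token += Math.floor(Math.random() * 16).toString(16);
  }
  return token;
}
```

Calling `withNewPageviewToken(url, randomToken())` on the captured URL gives a new URL to hit; all other fields in the event stay as they were.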

Once you see that the whole pipeline works on beta with the new version of the daemon, including capturing this new FID metric, you can deploy the update on production, where you can verify on Grafana that metrics aren't interrupted after the update. As for FID, it will only work in prod for wikis on 1.35.0-wmf.26 (as shown above by ReleaseTagBot). enwiki already is, according to https://en.wikipedia.org/wiki/Special:Version, which means that the new FID metric should appear pretty quickly thanks to organic traffic.

Oh and one last thing, when you do restart the production service, please leave a relevant Server Admin Log message about it. You do that with the !log command in the #wikimedia-operations IRC channel.

Deployed to the beta cluster, and confirmed that the new message type is being processed:

dpifke@deployment-webperf11:~$ curl -s http://localhost:9230 | grep FirstInputTiming | grep -v '^#'
webperf_handled_messages{schema="FirstInputTiming"} 2.0
webperf_latest_handled_time_seconds{schema="FirstInputTiming"} 1586374028.348369

...and also that frontend.firstinputtiming.fid.count is non-zero in Grafana (via Graphite).
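The same check as the curl and grep above could be scripted, for example in a smoke test. The sketch below is a deliberately simple parser for lines shaped like the output above (metric name, one label set in braces, numeric value). `parseSamples` and `Sample` are hypothetical names, and escaped quotes in label values are not handled.

```typescript
interface Sample {
  metric: string;
  labels: Record<string, string>;
  value: number;
}

// Returns the samples whose `schema` label equals the given schema,
// skipping comment lines such as "# HELP" and "# TYPE".
function parseSamples(text: string, schema: string): Sample[] {
  const samples: Sample[] = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.charAt(0) === '#') {
      continue;
    }
    const match = /^([A-Za-z_:][A-Za-z0-9_:]*)\{([^}]*)\}\s+(\S+)$/.exec(line);
    if (!match) {
      continue;
    }
    const labels: Record<string, string> = {};
    const labelPattern = /([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"/g;
    let label: RegExpExecArray | null;
    while ((label = labelPattern.exec(match[2])) !== null) {
      labels[label[1]] = label[2];
    }
    if (labels['schema'] === schema) {
      samples.push({ metric: match[1], labels, value: Number(match[3]) });
    }
  }
  return samples;
}
```

Fed the output above, `parseSamples(output, 'FirstInputTiming')` would yield two samples, and the deploy check amounts to asserting that `webperf_handled_messages` is greater than zero.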

Deploying to production next.

Mentioned in SAL (#wikimedia-operations) [2020-04-08T19:44:22Z] <dpifke@deploy1001> Started deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091

Mentioned in SAL (#wikimedia-operations) [2020-04-08T19:44:28Z] <dpifke@deploy1001> Finished deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091 (duration: 00m 05s)

Change 587914 had a related patch set uploaded (by Gilles; owner: Gilles):
[analytics/refinery@master] Retain FirstInputTiming and NavigationTiming hardwareConcurrency

https://gerrit.wikimedia.org/r/587914

Change 587914 merged by Gilles:
[analytics/refinery@master] Retain FirstInputTiming and NavigationTiming hardwareConcurrency

https://gerrit.wikimedia.org/r/587914