Record interaction to next paint
Open, Needs Triage, Public

Assigned To

None

Authored By

	Peter
	Jan 18 2023, 10:12 AM

Description

One of the upcoming Core Web Vitals metrics that we don't collect today is Interaction to Next Paint (INP). It is a better measure of the lag users experience than First Input Delay. We should start measuring it so we can keep track of it.

Instead of storing the data in the navtiming schema as in T264032 and sending it when we fetch Navigation Timing data, we want to push it throughout the page lifecycle to a new schema (like we have done before with some other metrics).

Instrumenting Interaction to Next Paint is a little more complicated than the other metrics we collect. We can find some inspiration on how to do it in https://github.com/GoogleChrome/web-vitals/blob/main/src/onINP.ts
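As a rough illustration of what that instrumentation has to compute, here is a minimal sketch of the estimation step only. It is a simplification of the approach in web-vitals, not a copy of it. It assumes entries shaped like the Event Timing API's PerformanceEventTiming (`interactionId`, `duration`); the names `EventTimingLike`, `latencyPerInteraction` and `estimateINP` are made up for this example.

```typescript
// Minimal subset of PerformanceEventTiming used by this sketch.
interface EventTimingLike {
  interactionId: number; // 0 means the event is not part of an interaction
  duration: number;      // milliseconds from the input to the next paint
}

// Collapse event entries into one latency per interaction.
// Several events (e.g. pointerdown, pointerup, click) can share an
// interactionId; the longest one is taken as the interaction's latency.
function latencyPerInteraction(entries: EventTimingLike[]): Map<number, number> {
  const byInteraction = new Map<number, number>();
  for (const entry of entries) {
    if (!entry.interactionId) {
      continue;
    }
    const previous = byInteraction.get(entry.interactionId) ?? 0;
    byInteraction.set(entry.interactionId, Math.max(previous, entry.duration));
  }
  return byInteraction;
}

// Estimate INP as the worst interaction latency, skipping one outlier
// for every 50 interactions seen (a high-percentile approximation).
function estimateINP(entries: EventTimingLike[]): number | undefined {
  const latencies = Array.from(latencyPerInteraction(entries).values())
    .sort((a, b) => b - a); // longest first
  if (latencies.length === 0) {
    return undefined; // no interactions yet, so there is nothing to report
  }
  const index = Math.min(latencies.length - 1, Math.floor(latencies.length / 50));
  return latencies[index];
}
```

In a browser the entries would come from a PerformanceObserver observing the `event` entry type. The sketch keeps every interaction in memory, which a real implementation would probably want to cap.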

To collect the metric we need to do a couple of things (you can see the full picture at https://wikitech.wikimedia.org/wiki/Performance#/media/File:WMF_Performance_Team_infrastructure_2022.png):

Add collection of the metric to the NavigationTiming extension: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/NavigationTiming/+/refs/heads/master. You need to set up MediaWiki, make the changes in the extension, add tests for them, and run the tests.

Then we need to make sure that the data is stored in a new schema https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/legacy

And then we need to process the data when it arrives and send it to Graphite/Prometheus. That happens in navtiming.py: https://gerrit.wikimedia.org/r/plugins/gitiles/performance/navtiming/

When the data has started to arrive we can make a new dashboard/graph in Grafana where we can look at the new metric.

Related Objects

Status	Assigned	Task
Open	None	T319329 Expand navigation timing metrics to include user experience metrics and modernise navigation timing
Open	None	T327246 Record interaction to next paint
Open	None	T359286 Collect data from the Long Animation Frame API

Event Timeline

Peter created this task. · Jan 18 2023, 10:12 AM

Restricted Application added a subscriber: Aklapper. · Jan 18 2023, 10:12 AM

larissagaulia added a project: good first task. · Jan 20 2023, 8:16 AM

larissagaulia moved this task from Inbox, needs triage to To-do: Goals, prioritized next 4 Quarters on the Performance-Team board.

larissagaulia edited projects, added MediaWiki-Engineering-Group-onboarding; removed good first task. · Jan 26 2023, 2:23 PM

We can follow the pattern of how we implemented First Input Delay in https://phabricator.wikimedia.org/T238091

Peter assigned this task to larissagaulia. · Mar 7 2023, 1:38 PM

Google announced that INP will become a Core Web Vital in 2024:
https://web.dev/inp-cwv/

I propose we start collecting it ASAP so we can create a plan for how to improve the metric.

I've asked on the Performance Slack channel whether there's any work on simplifying how to get INP in Chrome before 2024 (when it becomes a Google Core Web Vital). Otherwise, this is the current implementation that we need to work from: https://github.com/GoogleChrome/web-vitals/blob/main/src/onINP.ts

I've seen the implementation change a little over time, so let's aim for the newest version.

@Krinkle do you see any way we can do this without needing to massage the data when it arrives at the server? What is the latest point at which we can beacon back data today? I'm thinking maybe it's OK to lose some data just to get this going, since looking at CrUX data this is the metric where we have the most need for improvement.

So https://chromestatus.com/feature/5690553554436096 just finished its origin trial. Maybe we should wait for it to reach Chrome and then collect INP only for Chrome. Even though that sucks, it would help us minimise the code needed to collect INP and keep track of it ourselves.

The difficult part here is that browsers don't currently offer a reliable "at most once" callback for computing and beaconing something that is both widely supported and fires relatively late, close to the end of a browser tab's lifetime. This is basically the classic "onBeforeUnload" problem, and the reasoning behind the Page Lifecycle API has moved further away from this classic model. That's good, in that it is honest and accurately represents how mobile browsers work, but it also means this is not going to get easier. The first and only thing I see in Web APIs that would make this possible is the Pending Beacon API, but that's still a draft.

What we have today is the Page Visibility API, which fairly accurately tells you on both mobile and desktop that a pageview is no longer in the foreground (e.g. the user switched or minimised apps, or switched tabs). This doesn't mean that the page is closed, so the user can switch back and the page can continue its life later.

With that, I see two practical options today:

Find a clever way to structure or process the data.

This means we can send the (potentially incomplete) data we have whenever the tab gets hidden. This is how Analytics' Session Length metric works today. It periodically indicates that a pageview existed that lasted up to N minutes. Because earlier events are naturally a subset of the later ones, there is a way to process this data such that the same tab sending multiple beacons does not cause problems with the data.

For us this could mean we send the INP metric from pagehide, associated with a pageviewID, and then require some periodic processing in the backend to ignore earlier beacons from the same client. This, however, cannot be done in real time, because once increments are aggregated and sent to Prometheus we can't undo that. And unlike session length data, this timing information is bucketed in ways that we can't accumulate while staying accurate.

Measure only up to the first tab hide.

This would mean we don't measure the same way. Rather than building up and tuning the INP percentile in the browser tab until it is officially closed (like Chrome internals do), we would only measure until the first tab hide. This means we may miss some datapoints from people who switch tabs a lot or who, for any reason, did not do the "main" action until after switching back. But maybe it's good enough, and it would support real-time processing and avoid attaching sensitive pageViewIDs to this data.
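To make the two options a bit more concrete, here is a minimal sketch of the core logic of each. Everything in it is hypothetical: the `InpBeacon` shape (including the `sequence` counter), `latestBeaconPerPageview` and `createFirstHideReporter` are made up for illustration and are not part of any existing schema or of navtiming.py.

```typescript
// Option 1: the backend keeps only the latest beacon per pageview.
// Hypothetical beacon shape; none of these fields exist in a schema today.
interface InpBeacon {
  pageviewId: string; // assumed identifier tying beacons to one pageview
  sequence: number;   // assumed counter, incremented for each beacon from a tab
  inp: number;        // INP in milliseconds at the time the beacon was sent
}

function latestBeaconPerPageview(beacons: InpBeacon[]): InpBeacon[] {
  const latest = new Map<string, InpBeacon>();
  for (const beacon of beacons) {
    const seen = latest.get(beacon.pageviewId);
    if (!seen || beacon.sequence > seen.sequence) {
      latest.set(beacon.pageviewId, beacon);
    }
  }
  return Array.from(latest.values());
}

// Option 2: the client reports at most once, the first time the tab is hidden.
// getValue returns the current INP estimate (or undefined if there were no
// interactions); send is whatever delivers the value to the server.
function createFirstHideReporter(
  getValue: () => number | undefined,
  send: (inp: number) => void
): (visibilityState: string) => void {
  let reported = false;
  return (visibilityState) => {
    if (reported || visibilityState !== 'hidden') {
      return;
    }
    // The measurement window ends at the first hide, even if there is
    // no value to send yet.
    reported = true;
    const value = getValue();
    if (value !== undefined) {
      send(value);
    }
  };
}
```

In a browser, the function returned by `createFirstHideReporter` would be called with `document.visibilityState` from a `visibilitychange` listener. Option 1 needs batch processing before aggregation, while option 2 sends a single value per pageview, so it can be processed in real time.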

Peter removed larissagaulia as the assignee of this task. · Jun 19 2023, 6:58 PM

Peter added a subscriber: larissagaulia.

Krinkle removed a project: Performance-Team. · Aug 17 2023, 1:26 PM

Krinkle unsubscribed.

Peter mentioned this in T358380: [3 days] Interaction to Next Paint (INP) Core Web Vital is scored as "Needs Improvement" or "Poor" for Mobile users on Desktop. · Mar 4 2024, 2:38 PM

Jdlrobson added a project: Web-Team-Backlog. · Mar 5 2024, 7:57 PM

Jdlrobson added a project: Web Team Essential Work 2024. · Mar 12 2024, 5:47 PM

Jdlrobson moved this task from Backlog to Q3 (Jan-April) on the Web Team Essential Work 2024 board. · Mar 12 2024, 5:49 PM

Jdlrobson moved this task from Q3 (Jan-April) to Q4 (April-July) on the Web Team Essential Work 2024 board. · Mar 13 2024, 12:56 AM