
Collect first input delay
Open, Needs Triage, Public

Description

Let's start collecting First Input Delay again in the Navigation Timing extension and push the data to Prometheus. We had the data for a while and then removed it, but I think it would add value to have it so we can correlate it with the metric we see in Google Search Console.

  1. Collect the metric in the Navigation Timing extension and send a new event when we have it.
  2. Create a new schema and make sure we collect the same things as the navtiming schema so we can create the same labels in Prometheus.
  3. Update navtiming.py and push the metric to Prometheus.
  4. Create a graph/dashboard in Grafana to keep track of the metric.
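
As a rough illustration of steps 1 and 3, here is a minimal sketch of turning a first-input timing into a labelled, histogram-style count. It is not the code in navtiming.py: the bucket bounds, the `group` label, and the event field names (`startTime`, `processingStart`, borrowed from the browser's Event Timing entries) are assumptions, and a plain `Counter` stands in for a real Prometheus client.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical bucket upper bounds in milliseconds; not taken from navtiming.py.
BUCKETS_MS = (10, 50, 100, 300, 1000)


def first_input_delay_ms(start_time, processing_start):
    """Delay between the first user input and the start of its event handler."""
    delay = processing_start - start_time
    if delay < 0:
        raise ValueError("processing_start must not precede start_time")
    return delay


def bucket_label(delay_ms):
    """Return the 'le' label of the smallest bucket whose bound is >= delay_ms."""
    index = bisect_left(BUCKETS_MS, delay_ms)
    return str(BUCKETS_MS[index]) if index < len(BUCKETS_MS) else "+Inf"


def record(counts, event):
    """Count one event under its (group, le) label pair and return the delay."""
    delay = first_input_delay_ms(event["startTime"], event["processingStart"])
    counts[(event["group"], bucket_label(delay))] += 1
    return delay


# Hypothetical events; a stand-in for what the extension might send.
counts = Counter()
record(counts, {"group": "desktop", "startTime": 1200.0, "processingStart": 1216.5})
record(counts, {"group": "mobile", "startTime": 800.0, "processingStart": 1150.0})
```

Note that the counts here are per bucket rather than cumulative, which keeps the sketch short; Prometheus histograms expose cumulative buckets.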

Event Timeline

Change 902680 had a related patch set uploaded (by Lgaulia; author: Lgaulia):

[mediawiki/extensions/NavigationTiming@master] Add First Input Delay metric

https://gerrit.wikimedia.org/r/902680

Change 902693 had a related patch set uploaded (by Lgaulia; author: Lgaulia):

[schemas/event/secondary@master] Add first input delay schema

https://gerrit.wikimedia.org/r/902693

Change 907871 had a related patch set uploaded (by Lgaulia; author: Lgaulia):

[schemas/event/secondary@master] Add first input delay schema

https://gerrit.wikimedia.org/r/907871

Change 902693 abandoned by Lgaulia:

[schemas/event/secondary@master] Add first input delay schema

Reason:

npm build conflicts

https://gerrit.wikimedia.org/r/902693

Change 907871 merged by jenkins-bot:

[schemas/event/secondary@master] Add first input delay schema

https://gerrit.wikimedia.org/r/907871

Change 902680 merged by jenkins-bot:

[mediawiki/extensions/NavigationTiming@master] Add First Input Delay metric

https://gerrit.wikimedia.org/r/902680

Change 908794 had a related patch set uploaded (by Lgaulia; author: Lgaulia):

[performance/navtiming@master] Collect first input delay and send it to prometheus

https://gerrit.wikimedia.org/r/908794

Change 908794 merged by jenkins-bot:

[performance/navtiming@master] Collect first input delay and send it to prometheus

https://gerrit.wikimedia.org/r/908794

Change 917932 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[mediawiki/extensions/NavigationTiming@master] Fix oversample naming to match schema.

https://gerrit.wikimedia.org/r/917932

Change 917736 had a related patch set uploaded (by Krinkle; author: Phedenskog):

[mediawiki/extensions/NavigationTiming@wmf/1.41.0-wmf.8] Fix oversample naming to match schema.

https://gerrit.wikimedia.org/r/917736

Change 917932 merged by jenkins-bot:

[mediawiki/extensions/NavigationTiming@master] Fix oversample naming to match schema.

https://gerrit.wikimedia.org/r/917932

Change 918348 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[operations/mediawiki-config@master] Enable First Input Delay events.

https://gerrit.wikimedia.org/r/918348

Change 918348 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable First Input Delay events.

https://gerrit.wikimedia.org/r/918348

Mentioned in SAL (#wikimedia-operations) [2023-05-17T22:15:03Z] <krinkle@deploy1002> Synchronized wmf-config/: T332012 (duration: 06m 51s)

Good morning. Events have been arriving for this new schema since last night as expected, but there seems to be some kind of compatibility issue with the schema.

I'm on Ops Week for the Data Engineering team today and I've become aware of two issues that I thought I should share with you:

1: We have a number of eventgate validation errors for this schema. See: https://logstash.wikimedia.org/goto/cd1a0d311bc514f553006e12d1467ec1

The errors all appear to state that '.event.oversampleReason' should be string.
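
To illustrate this class of error, here is a minimal sketch of a type check that yields the same kind of message. The payloads are hypothetical (the task does not show what the client actually sent), and the one-field type table is a stand-in for the real schema, not eventgate's validator.

```python
import json

# Hypothetical subset of the schema: only the property named in the error is modelled.
EXPECTED_TYPES = {"oversampleReason": str}
JSON_TYPE_NAMES = {str: "string"}


def validate_event(event):
    """Return one message per field whose Python type differs from the expected one."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field in event and not isinstance(event[field], expected):
            errors.append(f"'.event.{field}' should be {JSON_TYPE_NAMES[expected]}")
    return errors


reasons = ["geo:XX"]  # hypothetical value, for illustration only
bad_event = {"oversampleReason": reasons}               # a list is not a string
good_event = {"oversampleReason": json.dumps(reasons)}  # serialized to a string first

print(validate_event(bad_event))
print(validate_event(good_event))
```

Under these assumptions, any non-string value for the field is rejected, while the same data serialized to a string passes.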

2: We have received some errors from the refine job that processes the raw data after it has been sent to HDFS. Once again, these seem to be related to the schema in some way, but the error is not specific about this. It just says: Original exception: org.wikimedia.eventutilities.core.json.JsonLoadingException: Failed reading JSON/YAML data from ?action=jsonschema&formatversion=2&format=json&title=FirstInputDelay

Please let me know if I can help with further resolution. I'll tag @Ottomata for good measure.

Hi all,

It looks like you are trying to add a new 'legacy' schema. Legacy schemas are ones that were migrated from the old EventLogging schemas on metawiki. While it should be possible to do what you are doing, it means that we will have to add a special case for this schema to mark it as 'migrated to event platform' so that the proper ingestion jobs ingest it. Right now, the jobs are looking for this stream's schema at https://meta.wikimedia.org/w/api.php?action=jsonschema&formatversion=2&format=json&title=FirstInputDelay.
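
For reference, a minimal sketch of how a lookup URL of that shape can be assembled from a schema title. The function name and the `base` parameter are hypothetical and this is not the ingestion job's actual code; only the query parameters and the metawiki endpoint come from the URL above.

```python
from urllib.parse import urlencode

META_API = "https://meta.wikimedia.org/w/api.php"


def legacy_schema_url(title, base=META_API):
    """Build the jsonschema API URL for a legacy schema title."""
    query = urlencode({
        "action": "jsonschema",
        "formatversion": 2,
        "format": "json",
        "title": title,
    })
    return f"{base}?{query}"


print(legacy_schema_url("FirstInputDelay"))
# An empty base gives a string of the same shape as the one in the refine
# exception; the task does not confirm that this is what happened there.
print(legacy_schema_url("FirstInputDelay", base=""))
```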

Instead of making a new legacy schema, can you make a new Event Platform-based schema in analytics/, perhaps analytics/mediawiki/browser_first_input_delay or something similar?

See: https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To

Mentioned in SAL (#wikimedia-operations) [2023-05-18T12:46:51Z] <otto@deploy1002> Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipepine - T332012 (duration: 07m 00s)

Mentioned in SAL (#wikimedia-operations) [2023-05-18T12:59:18Z] <otto@deploy1002> Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipepine - T332012 (duration: 06m 19s)

Update: First Input Delay is going to be deprecated in favour of Interaction to Next Paint.