Refactor NavigationTiming extension so that it can be used to oversample based on criteria
Closed, Resolved (Public)

Assigned To

Authored By

	• Imarlier
	Nov 27 2017, 4:05 PM

Description

The NavigationTiming extension currently has hardcoded logic to oversample FirstPaint data from Asia. This is being used to collect extra data in preparation for Singapore going live, so that we can more clearly see the effect of that change. The collected data is submitted to statsd under a metric name.

It would be useful to be able to oversample NavTiming data for additional criteria as well, and to configure this without having to write new code each time. A specific example where this would have been useful is the release of Firefox 57 (FF57): we could have collected a fairly large data set on actual end-user performance relatively quickly, without polluting our general sample.

Things that are needed in ext.NavigationTiming.js:

Enhance the Event object so that it includes a boolean value indicating whether it is an oversample
Add a generic method that checks whether oversampling is enabled at all
Add a method that checks whether Geo oversampling (country or region) is enabled
Add a method that checks whether User-Agent oversampling is enabled
Emit non-oversampled event if needed
Emit oversampled event if needed. This does mean that certain events will be collected twice. They will end up in different locations, so this is okay.
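
The steps listed above can be sketched roughly as follows. This is a minimal illustration in Python rather than the extension's actual JavaScript, and every name in it is an assumption made for the example: the config keys `geo` and `userAgent`, the `oversample` event field, the 1-in-N sampling factors, and all function names.

```python
import random


def in_sample(factor, rng=random.random):
    """Return True for roughly one call in `factor` (hypothetical 1-in-N sampling)."""
    return factor >= 1 and rng() < 1.0 / factor


def oversample_enabled(config):
    """Generic check: is any oversampling criterion configured at all?"""
    return bool(config.get("geo") or config.get("userAgent"))


def geo_oversample_factor(config, country):
    """Return the 1-in-N factor configured for this country, or None."""
    return config.get("geo", {}).get(country)


def ua_oversample_factor(config, user_agent):
    """Return the factor of the first configured UA substring that matches, or None."""
    for needle, factor in config.get("userAgent", {}).items():
        if needle in user_agent:
            return factor
    return None


def build_events(timing, country, user_agent, config, base_factor, rng=random.random):
    """Return the events to emit: a regular one, an oversample one, both, or none."""
    events = []
    # Regular sample: emitted independently of any oversampling.
    if in_sample(base_factor, rng):
        events.append(dict(timing, oversample=False))
    # Oversample: emitted when any matching criterion wins its own dice roll.
    if oversample_enabled(config):
        factors = [
            geo_oversample_factor(config, country),
            ua_oversample_factor(config, user_agent),
        ]
        if any(f is not None and in_sample(f, rng) for f in factors):
            events.append(dict(timing, oversample=True))
    return events
```

Because the two decisions are made independently, the same page view can produce both a regular and an oversample event, which matches the note above that some events will be collected twice.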

Things that are needed in the NavigationTiming schema:

Add the oversample boolean (default: false)

Things that are needed in webperf.py:

Check the oversample boolean, if it exists, and direct the collected event appropriately.
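
A minimal sketch of that routing step, assuming the event arrives as a dict with its payload under an `event` key and that oversamples go to a separate statsd prefix. The prefixes `frontend.navtiming` and `frontend.navtiming_oversample`, the `oversample` field name, and both function names are placeholders for illustration, not the actual webperf.py code.

```python
def metric_prefix(event,
                  base="frontend.navtiming",
                  oversample_base="frontend.navtiming_oversample"):
    """Pick a statsd prefix based on the (optional) oversample boolean."""
    payload = event.get("event", {})
    # Events produced before the schema change have no flag; treat them as regular.
    if payload.get("oversample", False):
        return oversample_base
    return base


def to_statsd_lines(event):
    """Format every numeric field of the payload as a statsd timing line."""
    prefix = metric_prefix(event)
    lines = []
    for name, value in sorted(event.get("event", {}).items()):
        # Skip booleans (including the oversample flag) and non-numeric fields.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append("%s.%s:%d|ms" % (prefix, name, value))
    return lines
```

Defaulting a missing flag to `False` keeps older events flowing to the regular metrics, in line with the schema default proposed above.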

Potential ramifications/drawbacks:

Depending on oversample rate, this could increase the number of items on the NavigationTiming queue in Kafka by quite a bit. Need to check with the Analytics team to ensure that this won't be an issue.

Checklist:

Revise schema for oversampling. – https://meta.wikimedia.org/w/index.php?title=Schema:NavigationTiming&oldid=17490599
Handle oversampling in event processor. – [operations/puppet] https://gerrit.wikimedia.org/r/#/c/394375/
Implement client support for oversampling. – [mediawiki/extensions/NavigationTiming] https://gerrit.wikimedia.org/r/#/c/394298/
Enable "NavigationTiming" Monolog channel in wmf-config. – [operations/mediawiki-config] ...
Enable first use of NavTiming oversampling in wmf-config (Asia/Chrome?). – [operations/mediawiki-config] ...

Details

Subject	Repo	Branch	Lines +/-
ext.NavigationTiming: Add integration test for oversampling feature	mediawiki/extensions/NavigationTiming	master	+69 -6
ext.NavigationTiming: Allow oversampling based on geography or user agent	mediawiki/extensions/NavigationTiming	master	+650 -338
webperf: Handle oversamples differently than regular samples	operations/puppet	production	+294 -2
webperf.py: Handle oversamples differently than regular samples	operations/puppet	production	+294 -2


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		• Imarlier	T181413 Refactor NavigationTiming extension so that it can be used to oversample based on criteria
		Declined		None	T181956 Collect extra data and oversample views with higher page load times

Event Timeline

• Imarlier created this task. Nov 27 2017, 4:05 PM

Restricted Application added a subscriber: Aklapper. Nov 27 2017, 4:05 PM

--> NavigationTiming

Depending on oversample rate, this could increase the number of items on the NavigationTiming queue in Kafka by quite a bit. Need to check with the Analytics team to ensure that this won't be an issue.

Unless we are talking about 100s of events per sec this should be fine.

Maybe WikimediaEvents could be of use for managing sample ratios? https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents

Change 394298 had a related patch set uploaded (by Imarlier; owner: Imarlier):
[mediawiki/extensions/NavigationTiming@master] ext.NavigationTiming: Allow oversampling based on geography or user agent

https://gerrit.wikimedia.org/r/394298

gerritbot added a project: Patch-For-Review. Nov 30 2017, 1:06 PM

Change 394375 had a related patch set uploaded (by Imarlier; owner: Imarlier):
[operations/puppet@production] webperf.py: Handle oversamples differently than regular samples

https://gerrit.wikimedia.org/r/394375

• Imarlier created subtask T181956: Collect extra data and oversample views with higher page load times. Dec 4 2017, 2:36 AM

Krinkle triaged this task as Medium priority. Dec 15 2017, 1:54 AM

Krinkle updated the task description.

Change 402867 had a related patch set uploaded (by Imarlier; owner: Imarlier):
[operations/puppet@production] modules/webperf: handle oversamples differently than regular samples

https://gerrit.wikimedia.org/r/402867

@Krinkle @Gilles Opened a new patchset with my changes to webperf.py, and to the test fixtures. Not sure what I did to my original branch, but when I tried to rebase everything went pear-shaped. Easier to just cherry-pick over and resubmit.

Timo, your comments from the original review have been addressed in this patchset.

Change 394375 abandoned by Imarlier:
webperf.py: Handle oversamples differently than regular samples

Reason:
I dunno what I did to my branch, but whatever it was broke everything. Opened a new patchset for these changes.

https://gerrit.wikimedia.org/r/394375

This will be awesome! I miss docs describing limits and how we do it (but we can add that when it's merged). Like how should we keep track of different oversamplings running, can you run multiple at a time, what should you think about etc.

In T181413#3885736, @Peter wrote:

This will be awesome! I miss docs describing limits and how we do it (but we can add that when it's merged). Like how should we keep track of different oversamplings running, can you run multiple at a time, what should you think about etc.

I was planning to add docs as soon as the change is merged.

There actually is an issue with having multiple oversample groups running at one time. Let's say that we were oversampling a country 'XX' at 1-out-of-100 requests, and oversampling a user agent 'Firefox 100' at 1-out-of-10. If more than 1% of the users in country 'XX' were using the new browser, then our sample set from country 'XX' would include more users than it should (and those users are going to be more similar than they otherwise would be). That's worth thinking about, but for the moment I think I'm happy to punt on it.

(The right approach is probably to do a separate check for whether to oversample based on geo and on UA, and to indicate in the event why the oversample occurred - we can tag/aggregate based on that after receiving)
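
One way the approach in the parenthetical above could look, sketched under assumptions: each criterion gets its own independent check, and the event records which checks fired so that consumers can filter by reason. The `geo:` and `ua:` tag format, the 1-in-N factor maps, and the function name are hypothetical and not taken from the merged patch.

```python
import random


def oversample_reasons(country, user_agent, geo_factors, ua_factors, rng=random.random):
    """Run one independent 1-in-N check per criterion and list the ones that fired."""
    reasons = []
    # Geo check: its own dice roll, unaffected by the user agent.
    geo_factor = geo_factors.get(country)
    if geo_factor and rng() < 1.0 / geo_factor:
        reasons.append("geo:" + country)
    # UA checks: one dice roll per matching user-agent substring.
    for needle, factor in ua_factors.items():
        if needle in user_agent and factor and rng() < 1.0 / factor:
            reasons.append("ua:" + needle)
    return reasons
```

With the example from the comment above (`{"XX": 100}` for geo and `{"Firefox/100": 10}` for user agent), an aggregation that keeps only events tagged `geo:XX` sees country 'XX' sampled at 1-out-of-100 no matter how many of its users run the oversampled browser.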

@Gilles @aaron @Peter Would one of you be able to take a look at https://gerrit.wikimedia.org/r/#/c/394298/?

Change 402867 merged by Dzahn:
[operations/puppet@production] webperf: Handle oversamples differently than regular samples

https://gerrit.wikimedia.org/r/402867

deployed on hafnium.eqiad.wmnet

Thanks, @Dzahn - verified in prod, all is well. Much appreciated!

Change 394298 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] ext.NavigationTiming: Allow oversampling based on geography or user agent

https://gerrit.wikimedia.org/r/394298

ReleaseTaggerBot added a project: MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)). Feb 2 2018, 10:00 PM

• Imarlier closed this task as Resolved. Feb 5 2018, 6:13 PM

Krinkle removed a project: Patch-For-Review. Feb 5 2018, 8:29 PM

Change 416627 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@master] [WIP] Add integration test for oversampling feature

https://gerrit.wikimedia.org/r/416627

gerritbot added a project: Patch-For-Review. Mar 6 2018, 2:13 AM

Change 416627 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] ext.NavigationTiming: Add integration test for oversampling feature

https://gerrit.wikimedia.org/r/416627

ReleaseTaggerBot edited projects, added MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)); removed MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)). Mar 6 2018, 7:00 PM

larissagaulia closed subtask T181956: Collect extra data and oversample views with higher page load times as Declined. Jan 24 2023, 1:34 PM

Maintenance_bot removed a project: Patch-For-Review. Jan 24 2023, 2:31 PM
