Add ability to oversample specific pages
Closed, Resolved · Public

Description

As part of the survey, it would be useful to be able to oversample specific articles, so that NavTiming + survey data is recorded more often for them than for other pages. This would help reduce noise in the data.
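
A minimal sketch of what per-page oversampling could look like, assuming a "1 in N" base sampling rate and a per-page multiplier. The function name, parameters, and the example values below are hypothetical illustrations, not the NavigationTiming extension's actual code or configuration:

```python
import random


def should_sample(page_name, base_sampling_factor, page_oversample_factors,
                  rng=random.random):
    """Decide whether a page view records NavTiming + survey data.

    base_sampling_factor: sample 1 in N page views by default (hypothetical).
    page_oversample_factors: hypothetical mapping of page name -> multiplier.
    rng: callable returning a float in [0, 1); injectable for testing.
    """
    multiplier = page_oversample_factors.get(page_name, 1)
    # An oversampled page is sampled `multiplier` times as often, capped at 100%.
    probability = min(1.0, multiplier / base_sampling_factor)
    return rng() < probability


# Hypothetical usage: a 1-in-1000 base rate with one article boosted 10x.
factors = {"Россия": 10}
sampled = should_sample("Россия", 1000, factors)
```

Pages missing from the mapping fall back to a multiplier of 1, so only the listed articles are affected.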

Event Timeline

Restricted Application added a subscriber: Aklapper. Jun 18 2018, 3:15 PM
Gilles added a parent task: Restricted Task. Jun 18 2018, 3:15 PM
Gilles triaged this task as Normal priority. Jun 22 2018, 4:51 PM
Vvjjkkii renamed this task from Add ability to oversample on specific articles to 8paaaaaaaa. Jul 1 2018, 1:03 AM
Vvjjkkii removed Gilles as the assignee of this task.
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 8paaaaaaaa to Add ability to oversample on specific articles. Jul 2 2018, 4:33 AM
CommunityTechBot assigned this task to Gilles.
CommunityTechBot lowered the priority of this task from High to Normal.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Gilles renamed this task from Add ability to oversample on specific articles to Add ability to oversample specific pages. Sep 4 2018, 1:38 PM

Change 457900 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Add ability to oversample specific pages

https://gerrit.wikimedia.org/r/457900

Change 457900 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Add ability to oversample specific pages

https://gerrit.wikimedia.org/r/457900

Change 472433 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Only ever inject the performance survey once

https://gerrit.wikimedia.org/r/472433

Change 472433 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Only ever inject the performance survey once

https://gerrit.wikimedia.org/r/472433

Change 478656 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Oversample performance survey on specific ruwiki articles

https://gerrit.wikimedia.org/r/478656

Change 478656 merged by jenkins-bot:
[operations/mediawiki-config@master] Oversample performance survey on specific ruwiki articles

https://gerrit.wikimedia.org/r/478656

Mentioned in SAL (#wikimedia-operations) [2018-12-10T12:56:34Z] <gilles@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T187299 T197607 Oversample performance survey on specific ruwiki articles (duration: 00m 46s)

I'm not sure that it worked, or that the 5 articles picked made a big difference: there's no visible uptick in ruwiki survey responses.

Gilles closed this task as Resolved. Dec 11 2018, 10:21 AM

NavigationTiming doesn't store the article id, but it does store the revision id. A popular article like https://ru.wikipedia.org/wiki/Россия, which is one of the oversampled articles, has a recognisable revision id. Its last edit was on the 8th, with revision id 96715436. Let's look at how often navtiming was recorded for it on the 9th and on the 10th, the day per-page oversampling was enabled (mid-day):

| Date | ruwiki navtiming with revId = 96715436 |
| --- | --- |
| Dec 9 | 5 |
| Dec 10 | 37 |

Which means that the oversampling is working, but the distribution of traffic has such a long tail that boosting these top articles 10x hardly makes a dent in the total sampling numbers.
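
The dilution effect can be made concrete with a little arithmetic. This is a sketch under stated assumptions: the traffic shares below are hypothetical placeholders, not measured ruwiki figures. If the boosted articles account for a fraction `s` of all page views and are oversampled by a factor `k`, the total number of samples grows by `(1 - s) + s * k`:

```python
def overall_sampling_multiplier(boosted_traffic_share, oversample_factor):
    """Factor by which total samples grow when only some pages are boosted.

    boosted_traffic_share: fraction of all page views that land on the
        oversampled pages (hypothetical input, not a measured value).
    oversample_factor: how many times more often those pages are sampled.
    """
    # Non-boosted traffic keeps weight 1; boosted traffic is multiplied.
    return (1.0 - boosted_traffic_share) + boosted_traffic_share * oversample_factor


# Hypothetical shares of total traffic going to the boosted articles.
for share in (0.001, 0.01, 0.05):
    multiplier = overall_sampling_multiplier(share, 10)
    print(f"{share:.1%} of views boosted 10x -> {multiplier:.3f}x samples overall")
```

Under these assumptions, even if the boosted articles drew 1% of all views, a 10x boost would raise the total sample count by only about 9%.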

If we want to speed up data collection for the survey we need to either increase the navtiming oversampling altogether for ruwiki or consider asking another large wiki's community about running the survey.