
Alert on group1 canary wikis navtiming report rate
Closed, Resolved, Public

Description

Issues like the two navtiming downtimes I caused this week could have been prevented if we had been able to notice their effect on group1 wikis.

Taking a quick look, we don't collect enough navtiming data on those wikis and they have no data at night, but we can try oversampling them:

SELECT wiki, hour, COUNT(*)
FROM event.navigationtiming
WHERE year = 2019 AND month = 4 AND day = 11
  AND wiki IN ('cawiki', 'hewiki')
GROUP BY wiki, hour;

wiki	hour	_c2
cawiki	1	1
cawiki	7	1
cawiki	8	1
cawiki	11	1
cawiki	14	1
cawiki	16	1
cawiki	17	2
cawiki	19	1
hewiki	5	1
hewiki	7	1
hewiki	8	1
hewiki	9	3
hewiki	10	2
hewiki	12	1
hewiki	13	4
hewiki	14	3
hewiki	17	3
hewiki	19	2
hewiki	20	2
hewiki	21	1
hewiki	23	1
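
To make the gaps in the query output above concrete, here is a minimal Python sketch that totals the events per wiki and counts the hours with no rows at all. The rows are copied from the output above; the `summarize` helper is a hypothetical name used only for illustration and is not part of any navtiming tooling.

```python
from collections import defaultdict

# Rows copied from the query output above: (wiki, hour, event_count).
ROWS = [
    ("cawiki", 1, 1), ("cawiki", 7, 1), ("cawiki", 8, 1), ("cawiki", 11, 1),
    ("cawiki", 14, 1), ("cawiki", 16, 1), ("cawiki", 17, 2), ("cawiki", 19, 1),
    ("hewiki", 5, 1), ("hewiki", 7, 1), ("hewiki", 8, 1), ("hewiki", 9, 3),
    ("hewiki", 10, 2), ("hewiki", 12, 1), ("hewiki", 13, 4), ("hewiki", 14, 3),
    ("hewiki", 17, 3), ("hewiki", 19, 2), ("hewiki", 20, 2), ("hewiki", 21, 1),
    ("hewiki", 23, 1),
]


def summarize(rows):
    """Return {wiki: (total_events, hours_without_data)} for a single day."""
    totals = defaultdict(int)
    hours_seen = defaultdict(set)
    for wiki, hour, count in rows:
        totals[wiki] += count
        hours_seen[wiki].add(hour)
    # An hour with no row in the GROUP BY output had zero events.
    return {wiki: (totals[wiki], 24 - len(hours_seen[wiki])) for wiki in totals}


summary = summarize(ROWS)
for wiki, (total, empty_hours) in sorted(summary.items()):
    print(f"{wiki}: {total} events, {empty_hours} of 24 hours with no data")
```

With so few events per hour, a report rate alert on these wikis would not be able to tell an outage from a normal quiet hour, which is the motivation for oversampling.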

Event Timeline

Change 503317 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Oversample navtiming on cawiki and commonswiki

https://gerrit.wikimedia.org/r/503317

Gilles renamed this task from Alert on group1 wikis navtiming report rate to Alert on group1 canary wikis navtiming report rate. Apr 12 2019, 11:35 AM

Change 503317 merged by jenkins-bot:
[operations/mediawiki-config@master] Oversample navtiming on cawiki and commonswiki

https://gerrit.wikimedia.org/r/503317

Mentioned in SAL (#wikimedia-operations) [2019-04-12T11:44:08Z] <gilles@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220807 Oversample navtiming on cawiki and commonswiki (duration: 05m 14s)

Change 503322 had a related patch set uploaded (by Gilles; owner: Gilles):
[performance/navtiming@master] Record report rate per wiki for navtiming and painttiming

https://gerrit.wikimedia.org/r/503322

Change 503328 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Reduce cawiki survey sampling rate

https://gerrit.wikimedia.org/r/503328

Change 503328 merged by jenkins-bot:
[operations/mediawiki-config@master] Reduce cawiki survey sampling rate

https://gerrit.wikimedia.org/r/503328

Mentioned in SAL (#wikimedia-operations) [2019-04-12T12:16:03Z] <gilles@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220807 Reduce cawiki survey sampling rate (duration: 05m 11s)

Change 503322 merged by jenkins-bot:
[performance/navtiming@master] Record report rate per wiki for navtiming and painttiming

https://gerrit.wikimedia.org/r/503322

Navtiming processor change deployed and tested on beta, moving onto production deployment.

Updated production navtiming processor.

The data is now coming into Graphite for the groups. I'll create the alert on Monday, once we have a couple of days of data showing the lowest rate under normal conditions.
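
As an illustration of deriving an alert threshold from the lowest rate seen under normal conditions, here is a minimal Python sketch. It is not the actual alert definition: the function names, the `safety_factor` parameter, and the sample rates are all assumptions made up for this example, and the real alert would be configured against the Graphite metrics rather than computed like this.

```python
def alert_threshold(observed_rates, safety_factor=0.5):
    """Pick a threshold below the lowest rate seen under normal conditions.

    safety_factor is a hypothetical margin (0 < factor <= 1) that keeps
    ordinary dips, such as quiet night hours, from firing the alert.
    """
    if not observed_rates:
        raise ValueError("need at least one observed rate")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    return min(observed_rates) * safety_factor


def should_alert(current_rate, threshold):
    """Return True when the current report rate falls below the threshold."""
    return current_rate < threshold


# Made-up report rates standing in for a couple of days of normal traffic.
baseline = [12.0, 9.5, 4.0, 6.5, 15.0]
threshold = alert_threshold(baseline)

print(f"threshold: {threshold}")
print(f"alert at rate 0.0: {should_alert(0.0, threshold)}")
print(f"alert at rate 3.0: {should_alert(3.0, threshold)}")
```

Waiting for a couple of days of data matters because the minimum has to include the nightly low. A threshold based only on daytime traffic would fire every night.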