Investigate spikes in threshold lookup requests in ORES-ext
Closed, Resolved (Public)

Description

https://grafana.wikimedia.org/d/000000263/ores-extension?orgId=1&from=now-2d&to=now

[13:51:24] <icinga-wm> PROBLEM - ores-extension grafana alert on icinga2001 is CRITICAL: CRITICAL: ORES extension ( https://grafana.wikimedia.org/d/000000263/ores-extension ) is alerting: Service hits for obtaining thresholds alert. https://wikitech.wikimedia.org/wiki/ORES
[13:53:31] <icinga-wm> RECOVERY - ores-extension grafana alert on icinga2001 is OK: OK: ORES extension ( https://grafana.wikimedia.org/d/000000263/ores-extension ) is not alerting. https://wikitech.wikimedia.org/wiki/ORES

Event Timeline

Halfak triaged this task as High priority. Mar 19 2019, 9:25 PM
Halfak moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.
Halfak claimed this task.

It looks like we get a spike in threshold lookup requests whenever we do a deployment, and the size of the spike corresponds to how many models were updated. So no big concern here.
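
For anyone who wants to re-check this later, here is a minimal sketch of the comparison described in the comment above: given spike times from the dashboard and deployment times from the deploy log, it separates spikes that start shortly after a deployment from spikes that do not. The function name, the 10-minute window, and the timestamps in the usage example are all hypothetical and only illustrate the idea; they are not taken from the ORES extension or from real deploy data.

```python
from datetime import datetime, timedelta


def split_spikes_by_deploy(spikes, deploys, window=timedelta(minutes=10)):
    """Split spike timestamps into (explained, unexplained).

    A spike counts as explained if it occurs at or after some deployment
    and no later than `window` after it. The window size is an assumption.
    """
    explained, unexplained = [], []
    for spike in spikes:
        if any(timedelta(0) <= spike - deploy <= window for deploy in deploys):
            explained.append(spike)
        else:
            unexplained.append(spike)
    return explained, unexplained


# Hypothetical timestamps, for illustration only.
deploys = [datetime(2019, 3, 19, 13, 50)]
spikes = [datetime(2019, 3, 19, 13, 52), datetime(2019, 3, 19, 18, 0)]
explained, unexplained = split_spikes_by_deploy(spikes, deploys)
```

Any spike that ends up in `unexplained` would be worth a closer look, since it does not line up with a deployment.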