Data collection for tools job_count seems to be broken
Closed, Resolved · Public

Description

https://graphite-labs.wikimedia.org/render/?width=600&height=300&target=cactiStyle(sumSeries(tools.tools-services-01.sge.hosts.tools*.job_count))&from=-1d

Sometime around 2016-10-31T15:00 the collected job counts for the tools.tools-services-01.sge.hosts.tools*.job_count graphite metrics dropped from ~800 to ~150.
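To pin down when the drop happened, the same render URL can be requested with Graphite's `&format=json` parameter, which returns each series as a list of `[value, timestamp]` datapoints. The following is a minimal sketch of scanning such a series for a sudden fall. The function name `find_drop` and the 50% threshold are assumptions made for illustration, not part of any existing tooling.

```python
from typing import Iterable, Optional, Sequence


def find_drop(datapoints: Iterable[Sequence], ratio: float = 0.5) -> Optional[int]:
    """Return the timestamp of the first datapoint whose value is below
    `ratio` times the previous non-null value, or None if there is no drop.

    `datapoints` uses the Graphite render JSON layout: [value, unix_timestamp],
    where value may be None for intervals with no data.
    """
    previous = None
    for value, timestamp in datapoints:
        if value is None:
            # Gaps in collection are skipped rather than treated as zero.
            continue
        if previous is not None and previous > 0 and value < previous * ratio:
            return timestamp
        previous = value
    return None
```

Applied to the summed series, a fall from roughly 800 to roughly 150 would be flagged at the first datapoint after the change.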

Early investigation seems to point to a collector failure related to the removal of the defunct tools-exec-cyberbot job runner.

Event Timeline

Restricted Application added a subscriber: Aklapper.

let's see if this works

root@tools-bastion-03:~# qconf -de tools-exec-cyberbot.eqiad.wmflabs
root@tools-bastion-03.tools.eqiad.wmflabs removed "tools-exec-cyberbot.eqiad.wmflabs" from execution host list

The theory is that this host was deleted but is still returned by /usr/bin/qconf -sel
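If that theory holds, a collector that walks the host list and aborts on the first host it cannot query would lose every count that comes after the stale entry. The collector's source is not shown in this task, so the following is only a sketch of that failure mode and one way to tolerate it. `collect_job_counts` and the `count_jobs` callable are hypothetical names, not the real collector's API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_job_counts(
    hosts: Iterable[str],
    count_jobs: Callable[[str], int],
) -> Tuple[Dict[str, int], List[str]]:
    """Gather a job count per host without letting one bad host stop the run.

    `hosts` stands in for the names listed by `qconf -sel`; `count_jobs` is a
    hypothetical per-host lookup that may raise for a host that no longer exists.
    Returns the counts that were collected and the hosts that failed.
    """
    counts: Dict[str, int] = {}
    failed: List[str] = []
    for host in hosts:
        try:
            counts[host] = count_jobs(host)
        except Exception:
            # Record the stale or unreachable host and carry on with the rest.
            failed.append(host)
    return counts, failed
```

Returning the failed hosts separately keeps the partial result usable while still making the stale entry visible to whoever reads the collector's output.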