Data collection for tools job_count seems to be broken
Closed, Resolved · Public

Description

https://graphite-labs.wikimedia.org/render/?width=600&height=300&target=cactiStyle(sumSeries(tools.tools-services-01.sge.hosts.tools*.job_count))&from=-1d

Sometime around 2016-10-31T15:00 the collected job counts for the tools.tools-services-01.sge.hosts.tools*.job_count graphite metrics dropped from ~800 to ~150.

Early investigation suggests a failure in the collector related to the removal of the defunct tools-exec-cyberbot job runner.

bd808 created this task. Oct 31 2016, 9:56 PM
Restricted Application added a project: Cloud-Services. Oct 31 2016, 9:56 PM
Restricted Application added a subscriber: Aklapper.
bd808 updated the task description. Oct 31 2016, 9:57 PM

let's see if this works

root@tools-bastion-03:~# qconf -de tools-exec-cyberbot.eqiad.wmflabs
root@tools-bastion-03.tools.eqiad.wmflabs removed "tools-exec-cyberbot.eqiad.wmflabs" from execution host list

Theory: the host was deleted but is still returned by /usr/bin/qconf -sel.
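
A minimal sketch for checking that theory, assuming `qconf -sel` prints one execution host name per line. `host_in_list` is a hypothetical helper introduced here for illustration, not part of gridengine or of the collector; the live check only runs where `qconf` is on the PATH.

```shell
#!/bin/sh
# host_in_list (hypothetical helper): read host names from stdin, one per
# line, and succeed only if one line equals "$1" exactly.
# -F = fixed string, -x = whole-line match, -q = no output.
host_in_list() {
    grep -Fxq -- "$1"
}

# Live check, skipped where qconf is not installed.
if command -v qconf >/dev/null 2>&1; then
    if qconf -sel | host_in_list tools-exec-cyberbot.eqiad.wmflabs; then
        echo "tools-exec-cyberbot.eqiad.wmflabs is still in the execution host list"
    else
        echo "tools-exec-cyberbot.eqiad.wmflabs is no longer listed"
    fi
fi
```

The whole-line match is deliberate: a plain substring grep would also hit any other host whose name merely contains the one being checked.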

bd808 closed this task as Resolved. Nov 1 2016, 12:00 AM
bd808 assigned this task to chasemp.