Some metrics are collected on wdqs outside of diamond, and are neither deployed nor configured by Puppet. The new codfw nodes are missing those metrics. Lag and response time are used for Icinga alerting, so at least those need to be fixed. I'm not entirely sure where the script that collects those metrics lives...
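For context, a collector of this kind usually has two parts: compute a value such as update lag, then push it to Graphite using carbon's plaintext protocol (`<metric.path> <value> <unix_timestamp>\n` over TCP, port 2003 by default). The sketch below illustrates that shape only. It is not the script referred to above. The Graphite host, the `wdqs.<hostname>.*` metric naming, and the helper names are all hypothetical. It also assumes that lag means "now minus the last-modified timestamp reported by the query service".

```python
import socket
import time
from datetime import datetime, timezone

# Hypothetical endpoint; carbon's plaintext listener defaults to port 2003.
GRAPHITE_HOST = "graphite.example.org"
GRAPHITE_PORT = 2003


def lag_seconds(last_modified: str, now: datetime) -> float:
    """Seconds elapsed between an ISO 8601 last-modified timestamp and `now`."""
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    ts = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    return (now - ts).total_seconds()


def format_metric(path: str, value: float, timestamp: int) -> str:
    """Render one Graphite plaintext line: '<path> <value> <timestamp>\\n'."""
    return f"{path} {value} {timestamp}\n"


def build_lines(hostname: str, lag: float, response_time_ms: float,
                timestamp: int = None) -> list:
    """Build lag and response-time lines under a hypothetical naming scheme."""
    ts = int(time.time()) if timestamp is None else timestamp
    prefix = f"wdqs.{hostname}"  # hypothetical metric prefix
    return [
        format_metric(f"{prefix}.lag", lag, ts),
        format_metric(f"{prefix}.response_time_ms", response_time_ms, ts),
    ]


def send_metrics(lines, host: str = GRAPHITE_HOST, port: int = GRAPHITE_PORT,
                 timeout: float = 5.0) -> None:
    """Open one TCP connection to carbon and write all lines in a single batch."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

As an example, with `now = datetime(2016, 9, 22, 12, 0, 30, tzinfo=timezone.utc)`, `lag_seconds("2016-09-22T12:00:00Z", now)` gives `30.0`. Then `build_lines("wdqs2001", 30.0, 120.5, timestamp=int(now.timestamp()))` produces lines ready for `send_metrics`. Calling `send_metrics` needs a reachable carbon endpoint. Keeping formatting separate from sending lets you check the output without network access.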
Related Changes in Gerrit:
| Status | Assigned | Task |
|---|---|---|
| Resolved | Gehel | T124627 Adjust balance of WDQS nodes to allow continued operation if eqiad went offline. |
| Resolved | Gehel | T124862 Deploy WDQS nodes on codfw |
| Resolved | Gehel | T144380 Install and configure new WDQS nodes on codfw |
| Resolved | Addshore | T146207 publish lag and response time for wdqs codfw to graphite |
| Resolved | akosiaris | T146474 Add firewall exception to get to wdqs*.codfw.wmnet:8888 from analytics cluster |
Event Timeline
Change 312502 had a related patch set uploaded (by Addshore):
send lag and response time for wdqs codfw to graphite
Change 312503 had a related patch set uploaded (by Addshore):
send lag and response time for wdqs codfw to graphite
Change 312503 merged by jenkins-bot:
send lag and response time for wdqs codfw to graphite
Change 312502 merged by jenkins-bot:
send lag and response time for wdqs codfw to graphite