Forgot to set up the statsd environment variable in Nodepool so that it sends its metrics to Graphite.
Related Gerrit patch:
- operations/puppet (production): "nodepool: send metrics to statsd"
Related tasks:
- T111496 Nodepool should send metrics to statsd (Resolved, chasemp)
- T111503 Document Nodepool statsd metrics (Declined, unassigned)
- T111504 Teach Nodepool to not send statsd metrics per jobs (Declined, unassigned)
Per the discussion on https://gerrit.wikimedia.org/r/#/c/235989/, Nodepool sends too many metrics, which will eventually overload our statsd server.
As suggested by Filippo, I looked at Nodepool: for each successful job it reports 8 metrics. So the more jobs we run, the faster we will exhaust our Graphite server.
For example, for a job named 'npm' Nodepool reports:
- nodepool.job.npm.runtime (timing)
- nodepool.job.npm.builds (count)
- nodepool.job.npm.<branch>.runtime (timing)
- nodepool.job.npm.<branch>.builds (count)
- nodepool.job.npm.<branch>.<jenkins_label>.runtime (timing)
- nodepool.job.npm.<branch>.<jenkins_label>.builds (count)
- nodepool.job.npm.<branch>.<jenkins_label>.<cloud name>.runtime (timing)
- nodepool.job.npm.<branch>.<jenkins_label>.<cloud name>.builds (count)
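The key scheme above can be sketched in Python to show why the metric count grows so quickly. This is a hypothetical reconstruction, not Nodepool's code: it assumes each level appends one key component (branch, Jenkins label, cloud name) and sends one timing plus one counter for that prefix, rendered in the plain-text statsd format (`|ms` for timings, `|c` for counters). The function names and the sample values `master`, `ci-jessie` and `wmflabs` are made up for illustration.

```python
# Hypothetical sketch, not Nodepool's implementation: rebuild the per-job
# metric names listed above and count how many distinct names they yield.
from typing import List


def job_metric_lines(job: str, branch: str, label: str, cloud: str,
                     runtime_ms: int) -> List[str]:
    """Return the statsd lines for one successful job run (8 in total)."""
    lines = []
    key = f"nodepool.job.{job}"
    # The first pass uses the bare job key; each later pass appends one
    # more component and emits a timing plus a counter for that prefix.
    for suffix in (None, branch, label, cloud):
        if suffix is not None:
            key = f"{key}.{suffix}"
        lines.append(f"{key}.runtime:{runtime_ms}|ms")
        lines.append(f"{key}.builds:1|c")
    return lines


def distinct_series(jobs: int, branches: int, labels: int, clouds: int) -> int:
    """Worst-case number of distinct metric names if every combination occurs.

    Prefixes per level: jobs, jobs*branches, jobs*branches*labels and
    jobs*branches*labels*clouds; each prefix has a runtime and a builds name.
    """
    prefixes = jobs * (1 + branches * (1 + labels * (1 + clouds)))
    return 2 * prefixes


if __name__ == "__main__":
    for line in job_metric_lines("npm", "master", "ci-jessie", "wmflabs", 42000):
        print(line)
    print(distinct_series(100, 2, 3, 1))
```

For instance, 100 jobs on 2 branches, 3 labels and a single cloud give `distinct_series(100, 2, 3, 1) == 3000` metric names. A statsd server typically expands each timer into several Graphite series (mean, upper, count, ...), so the number of series stored in Graphite would be higher still.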
Upstream's (OpenStack's) metrics can be browsed further at http://graphite.openstack.org/
Nodepool emits many more metrics that are undocumented. I will probably document them upstream and propose a patch to disable per-job reporting.
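A switch to disable per-job reporting could look roughly like the sketch below. Everything here is hypothetical: the option name `report_per_job`, the aggregate key `nodepool.job_total` and both function names are invented for illustration and are not part of Nodepool. For brevity it only models the top-level job key, not the branch/label/cloud levels.

```python
# Hypothetical sketch only: `report_per_job` and `nodepool.job_total` are
# invented names, not real Nodepool options or metrics.
from typing import List, Set


def completion_metric_lines(job: str, runtime_ms: int,
                            report_per_job: bool = True) -> List[str]:
    """Return statsd lines for one successful job run.

    With report_per_job=False a single aggregate timing/counter pair is
    sent, so the number of metric names no longer grows with the number
    of distinct jobs.
    """
    if report_per_job:
        key = f"nodepool.job.{job}"
    else:
        key = "nodepool.job_total"
    return [f"{key}.runtime:{runtime_ms}|ms", f"{key}.builds:1|c"]


def metric_names(jobs: List[str], report_per_job: bool) -> Set[str]:
    """Collect the distinct metric names produced by one run of each job."""
    names = set()
    for job in jobs:
        for line in completion_metric_lines(job, 1000, report_per_job):
            # Strip the ":<value>|<type>" suffix to keep only the name.
            names.add(line.split(":", 1)[0])
    return names


if __name__ == "__main__":
    jobs = ["npm", "tox", "rake"]
    print(sorted(metric_names(jobs, report_per_job=True)))
    print(sorted(metric_names(jobs, report_per_job=False)))
```

With three jobs, per-job reporting produces six metric names, while the aggregate mode always produces two, whatever the number of jobs.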