Our statsd server is being filled with metrics that are barely used (T1075: Audit groups of metrics in Graphite that allocate a lot of disk space). One group of them originates from Zuul.
Filippo crafted a patch to alter the statsd metrics emitted by the Zuul scheduler. It merges the job timings together: https://gerrit.wikimedia.org/r/#/c/174691/1/zuul/scheduler.py
I would like to further enhance that patch to let one enable/disable and customize the metrics being emitted. That would need to be worked out with upstream on https://review.openstack.org/ in the openstack-infra/zuul.git repository.
There are only a few calls that need to be investigated:
$ git grep --perl-regexp --show-function 'statsd\.\w+\(' zuul/
zuul/scheduler.py=    def addEvent(self, event):
zuul/scheduler.py:            statsd.incr('gerrit.event.%s' % event.type)
zuul/scheduler.py=    def onBuildCompleted(self, build):
zuul/scheduler.py:            statsd.timing(key, dt)
zuul/scheduler.py:            statsd.incr(key)
zuul/scheduler.py:            statsd.incr(key)
zuul/scheduler.py=    def _doReconfigureEvent(self, event):
zuul/scheduler.py:            statsd.gauge(key + '.current_changes', items)
zuul/scheduler.py=    def reportStats(self, item):
zuul/scheduler.py:            statsd.gauge(key + '.current_changes', items)
zuul/scheduler.py:            statsd.timing(key + '.resident_time', dt)
zuul/scheduler.py:            statsd.incr(key + '.total_changes')
zuul/scheduler.py:            statsd.timing(key + '.resident_time', dt)
zuul/scheduler.py:            statsd.incr(key + '.total_changes')
I thought about having the keys defined in zuul.conf, something like:

[statsd]
gerrit.event = gerrit.event.{event_type}
zuul.buildcomplete.timing = zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}
zuul.buildcomplete.count = zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}
zuul.buildcomplete.alljobs.count = zuul.pipeline.{pipeline_name}.all_jobs
We can then retrieve them from the config file and use string formatting such as:

    key = self.config.get('statsd', 'zuul.buildcomplete.count')
    # Only emit the metric when a key template is configured
    if isinstance(key, str):
        statsd.incr(key.format(
            pipeline_name=build.pipeline.name,
            jobname=jobname,
            build_result=build.result))

This way we can even disable a metric by setting it to False.
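
One detail to keep in mind: a value read from an INI file comes back as a string, so "False" in zuul.conf is the text 'False' rather than the boolean. A minimal sketch of how the lookup could handle that, assuming Python 3's configparser; the helper name metric_key, the set of "disabled" values, and the sample pipeline and job names are all made up for illustration and are not part of Zuul:

```python
import configparser

# Hypothetical zuul.conf fragment; option names follow the proposal above.
SAMPLE_CONF = """
[statsd]
gerrit.event = gerrit.event.{event_type}
zuul.buildcomplete.count = zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}
zuul.buildcomplete.timing = False
"""

# Values treated as "metric disabled" (an assumption of this sketch).
DISABLED_VALUES = {'', 'false', 'no', 'off', '0'}


def metric_key(config, option, **fields):
    """Return the formatted statsd key for option, or None if disabled."""
    if not config.has_option('statsd', option):
        return None
    template = config.get('statsd', option).strip()
    # config.get() returns text, so "False" must be compared as a string.
    if template.lower() in DISABLED_VALUES:
        return None
    return template.format(**fields)


# interpolation=None keeps any '%' in a template from being interpreted.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE_CONF)

key = metric_key(config, 'zuul.buildcomplete.count',
                 pipeline_name='gate', jobname='tox-py27',
                 build_result='SUCCESS')
if key is not None:
    # In the scheduler this would be statsd.incr(key).
    print(key)
```

With this shape, each statsd call site only has to check for None, and a missing option behaves the same as a disabled one. Whether a missing option should instead fall back to the current hardcoded key is something to settle with upstream.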