
Let us customize Zuul metrics reported to statsd
Closed, Declined, Public

Description

Our statsd server is being filled with metrics that are barely used (T1075: Audit groups of metrics in Graphite that allocate a lot of disk space), and Zuul is one of the sources.

Filippo crafted a patch that alters the statsd metrics emitted by the Zuul scheduler. It merges the job timings together: https://gerrit.wikimedia.org/r/#/c/174691/1/zuul/scheduler.py

I would like to extend that patch so that the emitted metrics can be enabled, disabled and customized. That would need to be done with upstream on https://review.openstack.org/ (repository openstack-infra/zuul.git).

There are only a few calls that need to be investigated:

$ git grep --perl-regexp --show-function   'statsd\.\w+\(' zuul/
zuul/scheduler.py=    def addEvent(self, event):
zuul/scheduler.py:                statsd.incr('gerrit.event.%s' % event.type)
zuul/scheduler.py=    def onBuildCompleted(self, build):
zuul/scheduler.py:                    statsd.timing(key, dt)
zuul/scheduler.py:                statsd.incr(key)
zuul/scheduler.py:                statsd.incr(key)
zuul/scheduler.py=    def _doReconfigureEvent(self, event):
zuul/scheduler.py:                        statsd.gauge(key + '.current_changes', items)
zuul/scheduler.py=    def reportStats(self, item):
zuul/scheduler.py:            statsd.gauge(key + '.current_changes', items)
zuul/scheduler.py:                statsd.timing(key + '.resident_time', dt)
zuul/scheduler.py:                statsd.incr(key + '.total_changes')
zuul/scheduler.py:                statsd.timing(key + '.resident_time', dt)
zuul/scheduler.py:                statsd.incr(key + '.total_changes')
$

I thought about defining the keys in zuul.conf, with something like:

[statsd]
gerrit.event = gerrit.event.{event_type}
zuul.buildcomplete.timing = zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}
zuul.buildcomplete.count = zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}
zuul.buildcomplete.alljobs.count = zuul.pipeline.{pipeline_name}.all_jobs
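As a quick sanity check of the proposed section, the sketch below (Python 3 standard library only, not Zuul code) parses it with configparser and lists the {placeholder} fields each template expects, using string.Formatter().parse. It assumes the values are written without quotes: configparser keeps quote characters as part of the value, so quoted templates would produce keys containing literal quotes.

```python
import configparser
import string

# The proposed [statsd] section, written without quotes because
# configparser would otherwise keep them as part of each value.
PROPOSED = """
[statsd]
gerrit.event = gerrit.event.{event_type}
zuul.buildcomplete.timing = zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}
zuul.buildcomplete.count = zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}
zuul.buildcomplete.alljobs.count = zuul.pipeline.{pipeline_name}.all_jobs
"""


def template_fields(template):
    """Return the set of {placeholder} names used by a key template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


config = configparser.ConfigParser()
config.read_string(PROPOSED)

# Print each option with the fields the scheduler would have to supply.
for option, template in config.items('statsd'):
    print(option, sorted(template_fields(template)))
```

Listing the fields up front would let a loader reject a template that references a field the corresponding call site cannot provide, before any metric is sent.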

We can then retrieve the templates from the config file and fill them in with string formatting, for example:

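# A missing option, an empty value or 'False' (config values are strings) disables the metric
if self.config.has_option('statsd', 'zuul.buildcomplete.count'):
    key = self.config.get('statsd', 'zuul.buildcomplete.count').strip()
    if key and key.lower() != 'false':
        # str.format() takes keyword arguments, not a dict
        statsd.incr(key.format(
            pipeline_name=build.pipeline.name,
            jobname=jobname,
            build_result=build.result,
        ))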

This way we can even disable a metric by setting it to False.
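A self-contained sketch of the whole enable/disable idea follows. Everything in it is hypothetical: metric_key() and on_build_completed() are illustrative helpers, not Zuul functions, RecordingStatsd is a stand-in that records calls instead of talking to a real statsd client, and the pipeline name 'gate' and job name 'tox-py27' are made up. It also does not try to reproduce the exact conditions under which the real scheduler emits each metric.

```python
import configparser


class RecordingStatsd:
    """Stand-in for a statsd client: records calls instead of sending packets."""

    def __init__(self):
        self.calls = []

    def incr(self, key):
        self.calls.append(('incr', key))

    def timing(self, key, ms):
        self.calls.append(('timing', key, ms))


def metric_key(config, option, **fields):
    """Return the formatted key for `option`, or None when it is disabled.

    A metric is disabled when the option is absent, empty or 'False'
    (config values are strings, so False arrives as the text 'False').
    """
    if not config.has_option('statsd', option):
        return None
    template = config.get('statsd', option).strip()
    if template == '' or template.lower() == 'false':
        return None
    # Unused keyword arguments are ignored by str.format().
    return template.format(**fields)


def on_build_completed(config, statsd, pipeline_name, jobname, result, dt_ms):
    """Hypothetical counterpart of the onBuildCompleted() call sites."""
    fields = dict(pipeline_name=pipeline_name, jobname=jobname,
                  build_result=result)
    key = metric_key(config, 'zuul.buildcomplete.timing', **fields)
    if key is not None:
        statsd.timing(key, dt_ms)
    key = metric_key(config, 'zuul.buildcomplete.count', **fields)
    if key is not None:
        statsd.incr(key)
    key = metric_key(config, 'zuul.buildcomplete.alljobs.count', **fields)
    if key is not None:
        statsd.incr(key)


config = configparser.ConfigParser()
config.read_dict({'statsd': {
    # Per-job timing switched off; both counters kept.
    'zuul.buildcomplete.timing': 'False',
    'zuul.buildcomplete.count':
        'zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}',
    'zuul.buildcomplete.alljobs.count': 'zuul.pipeline.{pipeline_name}.all_jobs',
}})

statsd = RecordingStatsd()
on_build_completed(config, statsd, 'gate', 'tox-py27', 'SUCCESS', 1234)
print(statsd.calls)
```

Treating an absent option as disabled is one possible design choice; the alternative is to fall back to today's hardcoded key when the option is missing, which would keep existing deployments unchanged.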

Event Timeline

hashar claimed this task.
hashar raised the priority of this task to Needs Triage.
hashar updated the task description.
hashar added projects: Grafana, acl*sre-team.
hashar changed Security from none to None.
hashar added subscribers: fgiunchedi, Joe, hashar.
hashar triaged this task as Medium priority. Nov 24 2014, 10:40 AM
hashar moved this task from INBOX to Backlog (ARCHIVED) on the Release-Engineering-Team board.
hashar moved this task from Untriaged to Ready on the Continuous-Integration-Infrastructure board.

I have no spare cycles to implement the feature in Zuul. It is straightforward Python and should not be too hard for anyone to implement.

hashar moved this task from Ready to Untriaged on the Continuous-Integration-Infrastructure board.

It seems statsd is robust enough to handle the metrics. Notably, we now have ~300 jobs instead of thousands, so there are far fewer metrics.
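Why fewer jobs means fewer metrics can be made explicit from the per-job key template zuul.pipeline.{pipeline_name}.job.{jobname}.{build_result}. Assuming P pipelines, J jobs and R possible build results (symbols introduced here for illustration), the number of distinct per-job keys is bounded by:

$$
N_{\text{job keys}} \le P \cdot J \cdot R
$$

The bound is linear in J, so going from thousands of jobs to about 300 shrinks it by the same factor. On the Graphite side each timing key usually fans out into several aggregate series per flush (for example mean, upper, lower and count with the reference statsd daemon), which multiplies the disk usage of the timing metrics but not the linear dependency on J.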