Upgrade prometheus-statsd-exporter
Closed, ResolvedPublic


We need a new version of prometheus-statsd-exporter that supports checking its configuration, so that we can use validate_cmd from puppet and automatically reload its config. See also T302372 for the problem that prompted this task.
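
A rough sketch of the kind of check that validate_cmd would wrap: build an offline configuration-check command and report whether it succeeded. The `--check-config` and `--statsd.mapping-config` flags are believed to match upstream's CLI (the offline check landed in 0.17.0) but should be verified against `--help` on the installed version. The binary name and the helper names here are assumptions for illustration only.

```python
import subprocess
from typing import Callable, List

# Assumed binary name; adjust to the path the Debian package installs.
EXPORTER_BIN = "prometheus-statsd-exporter"


def check_config_cmd(mapping_path: str, binary: str = EXPORTER_BIN) -> List[str]:
    """Build the command line for an offline configuration check."""
    return [binary, f"--statsd.mapping-config={mapping_path}", "--check-config"]


def config_is_valid(mapping_path: str, run: Callable = subprocess.run) -> bool:
    """Return True when the exporter exits 0 for the given mapping file.

    `run` is injectable so the logic can be exercised without the binary.
    """
    result = run(check_config_cmd(mapping_path), capture_output=True, text=True)
    return result.returncode == 0
```

A non-zero exit status is what would make puppet refuse to install the new file, so a broken mapping never reaches the running exporter.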

Event Timeline

Fresh off the presses: I think we can upgrade to statsd_exporter 0.25, which would allow us to drop our custom patch to relay statsd metrics.

Full changelog since 0.9 (what we're running in production):

## 0.25.0 / 2023-10-23

* [CHANGE] Update `client_golang` (#508, #513)
* [ENHANCEMENT] Process UDP packets asynchronously (#511)
* [BUGFIX] Debug-log incoming lines in cleartext (#510)
* [SECURITY] Update `` (#516)

This release is less likely to drop UDP packets under very high traffic.
Additionally, when it does, it now attempts to record that this happened in the metric `statsd_exporter_udp_packet_drops_total`, where previously this could only be detected from operating system metrics.
If you are already monitoring for OS-level UDP packet drops, you _must_ also monitor this metric.
The exporter will pull packets from the UDP socket queue much more quickly and queue them internally before processing.
Existing monitoring for packet drops will no longer be sufficient to detect dropped events, but attribution to the exporter is easier with this new metric.
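
A minimal sketch of how the new counter could be read from a scrape of the exporter's metrics endpoint, assuming the standard Prometheus text exposition format. Only the metric name comes from the release notes; the function name and the simplified parsing are illustrative.

```python
from typing import Optional

METRIC = "statsd_exporter_udp_packet_drops_total"


def read_counter(exposition: str, name: str = METRIC) -> Optional[float]:
    """Sum all samples of `name` in a text-format scrape, or None if absent."""
    total = None
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line.startswith(name):
            continue  # comments (# HELP / # TYPE) and other metrics
        rest = line[len(name):]
        if rest.startswith("{"):
            # Skip the label set; the value follows the closing brace.
            rest = rest[rest.rfind("}") + 1:]
        elif not rest.startswith(" "):
            continue  # a different metric that merely shares the prefix
        fields = rest.split()
        if not fields:
            continue
        total = (total or 0.0) + float(fields[0])
    return total
```

Comparing two successive readings shows whether the exporter itself dropped packets in that interval, which OS-level counters alone no longer reveal.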

Many thanks to @sumeshpremraj and @kullanici0606 for their contributions, and @pedro-stanaka for helping with the async UDP processing!

## 0.24.0 / 2023-06-02

* [FEATURE] Improve the landing page experience (#504)
* [FEATURE] Support scaling parameter in mapping (#499)

## 0.23.3 / 2023-06-02

* [SECURITY] Maintenance release, updating dependencies
* [ENHANCEMENT][library] Allow instantiating configuration without going through YAML (#491)

Version 0.23.2 was mistagged and thus skipped.

## 0.23.1 / 2023-03-08

* [SECURITY] Update all dependencies (#489)

## 0.23.0 / 2022-12-07

* [CHANGE] Print help and version to standard out (#469)
* [FEATURE] Support experimental native histograms (#474)

## 0.22.8 / 2022-09-13

* [BUGFIX] Prevent poisoning with gauge/distribution naming collision (#461)
* [CHANGE] Update `client_golang` dependency (#463)

## 0.22.7 / 2022-07-08

* [CHANGE] Build with Go 1.18 (#450)

## 0.22.6 / 2022-07-08

* [CHANGE] Update dependencies (#449)

This is another housekeeping release.

## 0.22.5 / 2022-05-06

* [ENHANCEMENT] Add metric for total lines relayed (#434)

This release is built with Go 1.17.9, to address security issues in Go.

## 0.22.4 / 2021-11-26

* [BUGFIX] Make Docker image compatible with the runAsNonRoot setting in Kubernetes pods (#409)
* [BUGFIX] Library: fix support for custom Registerers with histograms and summaries (#410)

## 0.22.3 / 2021-10-26

* [BUGFIX] Accept metrics with multiple dashes even if not mapped (#402)

## 0.22.2 / 2021-09-10

* [ENHANCEMENT] Add metrics to relay (#393)

## 0.22.1 / 2021-09-01

* [ENHANCEMENT] Accept incoming metrics with multiple dashes (with mapping) (#381)
* [ENHANCEMENT] Allow forwarding messages to statsd for easier transition (#388)
* [BUGFIX] Actually expose pprof endpoints (#386)
* [BUGFIX] Fix performance regression on metric ingestion (#390)

## 0.21.0 / 2021-06-10

* [ENHANCEMENT] Update dependencies & switch to go-kit/log (#379)

This release changes the log format to be more structured, in line with other Prometheus projects.

## 0.20.3 / 2021-06-04

* [ENHANCEMENT] Use extracted go-kit/log to reduce transitive dependencies (#378)

Once again there is no functional change.
For library users, the dependency tree shrinks considerably.
See prometheus/common#255 for more details.

## 0.20.2 / 2021-05-03

* [BUGFIX] Remove copyleft licensed dependency (#375)

There is no functional change for exporter users.
Removing this dependency reduces uncertainty for anyone reusing the mapping code.

## 0.20.1 / 2021-03-26

* [CHANGE] [library] Split mapper caches out from mapper (#363)
* [BUGFIX] Accept metric segments that start with numbers (#365)

## 0.20.0 / 2021-02-05

* [ENHANCEMENT] Support full defaults for summaries and histograms (#361)

This completes support for `summary_options` and `histogram_options`.
Change the legacy configuration attributes throughout the mapping configuration as follows:

* `quantiles: …` to `summary_options: { quantiles: … }`
* `buckets: …` to `histogram_options: { buckets: … }`
* `timer_type` to `observer_type`.

Support for the deprecated attributes will be removed in a future release.
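
The renames listed above can be sketched as a small migration over one mapping entry that has already been loaded into a dictionary. This is an illustration under stated assumptions, not upstream tooling: the function name is hypothetical, and where both a legacy and a new key are present it keeps the new one.

```python
from typing import Any, Dict


def migrate_mapping(mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one mapping entry with legacy attributes renamed."""
    out = dict(mapping)  # shallow copy; the caller's dict is left untouched

    if "quantiles" in out:
        opts = dict(out.get("summary_options") or {})
        opts.setdefault("quantiles", out.pop("quantiles"))
        out["summary_options"] = opts

    if "buckets" in out:
        opts = dict(out.get("histogram_options") or {})
        opts.setdefault("buckets", out.pop("buckets"))
        out["histogram_options"] = opts

    if "timer_type" in out:
        legacy = out.pop("timer_type")
        out.setdefault("observer_type", legacy)

    return out
```

For example, an entry with `timer_type: histogram` and a top-level `buckets` list comes back with `observer_type: histogram` and the list nested under `histogram_options`.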

## 0.19.1 / 2021-01-29

* [BUGFIX] Don't return empty responses to lifecycle api requests (#360)

## 0.19.0 / 2021-01-22

* [CHANGE] [library] Require explicit Registerer (#347)
* [ENHANCEMENT] Add /-/healthy and /-/ready endpoints (#339)
* [BUGFIX] Do not open network ports when only checking config (#357)

## 0.18.0 / 2020-08-21

* [ENHANCEMENT] Allow turning off tagging extensions (#325)
* [ENHANCEMENT] Add a lifecycle API for configuration reloads and restarts (#329)

This release changes the interface for the `` library package to support the new configurability.

## 0.17.0 / 2020-06-26

* [CHANGE] Support non-timer distributions without unit conversion (#314)
* [ENHANCEMENT] Offline configuration check (#312)
* [ENHANCEMENT] Support the SignalFX tagging extension (#315)
* [BUGFIX] Allow matching single-letter metric name components (#309)

Distribution and histogram events (type `d`, `h`) are now treated as distinct from timer events (type `ms`).
Their values are observed as they are, while timer events are converted from milliseconds to seconds.

To reflect this generalization, the `observer_type` mapping option replaces `timer_type`.
Similarly, change `match_metric_type: timer` to `match_metric_type: observer`.
The old name remains available for compatibility.

For users of the mapper library, the `ObserverEvent` replaces `TimerEvent`.
For timer metrics, it is emitted by the mapper already converted to seconds.
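
The distinction between timers and other observer events can be illustrated with a short sketch. It only mirrors the behaviour described in these notes (type `ms` is converted from milliseconds to seconds, types `d` and `h` are observed unchanged). The function name is hypothetical and this is not the exporter's actual code.

```python
def observed_value(statsd_type: str, value: float) -> float:
    """Return the value that would be observed for a statsd event type."""
    if statsd_type == "ms":
        # Timer events arrive in milliseconds and are observed in seconds.
        return value / 1000.0
    if statsd_type in ("d", "h"):
        # Distribution and histogram events are observed as they are.
        return value
    raise ValueError(f"not an observer event type: {statsd_type!r}")
```

So a 250 ms timer is observed as 0.25, while a histogram event carrying 250 is observed as 250.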

## 0.16.0 / 2020-05-29

* [CHANGE] Break out much of the exporter into reusable packages (#298)
* [ENHANCEMENT] Log ingested lines at debug level (#305)

This release mainly consists of an internal reorganization of the exporter.
This should not have any impact on users of the binary; if it does, please file
an issue.

For users of the existing library packages, nothing changes.

There are now multiple new packages available, exposing functionality that had
been locked away in the main package. Consider the interfaces of these
libraries preliminary; we will change them as we gain experience in how they
are used.

## 0.15.0 / 2020-03-05

* [ENHANCEMENT] Allow setting granularity for summary metrics (#290)
* [ENHANCEMENT] Support a random-replacement cache invalidation strategy (#281)

To facilitate the expanded settings for summaries, the configuration format changes from

- match: …
  timer_type: summary
  quantiles:
    - quantile: 0.99
      error: 0.001
    - quantile: 0.95
      error: 0.01

to

- match: …
  timer_type: summary
  summary_options:
    quantiles:
      - quantile: 0.99
        error: 0.001
      - quantile: 0.95
        error: 0.01
    max_summary_age: 30s
    summary_age_buckets: 3
    stream_buffer_size: 1000

For consistency, the format for histogram buckets also changes from

- match: …
  timer_type: histogram
  buckets: [ 0.01, 0.025, 0.05, 0.1 ]

to

- match: …
  timer_type: histogram
  histogram_options:
    buckets: [ 0.01, 0.025, 0.05, 0.1 ]

Transitionally, the old format will still work but is *deprecated*. The new
settings are optional.

For users of the mapper
as a library, this is a breaking change. To adjust your code, replace
`mapping.Buckets` with `mapping.HistogramOptions.Buckets` and
`mapping.Quantiles` with `mapping.SummaryOptions.Quantiles`.

## 0.14.1 / 2020-01-13

* [BUGFIX] Mapper cache poisoning when name is variable (#286)
* [BUGFIX] nil pointer dereference in UDP listener (#287)

Thank you to everyone who reported these, and @bakins for the mapper cache fix!

## 0.14.0 / 2020-01-10

* [CHANGE] Switch logging to go-kit (#283)
* [CHANGE] Rename existing metric for mapping cache size (#284)
* [ENHANCEMENT] Add metrics for mapping cache hits (#280)

Logs are more structured now. The `fatal` log level no longer exists; use `--log.level=error` instead. The valid log formats are `logfmt` and `json`.

The metric `statsd_exporter_cache_length` is now called `statsd_metric_mapper_cache_length`.

## 0.13.0 / 2019-12-06

* [ENHANCEMENT] Support sampling factors for all statsd metric types (#264)
* [ENHANCEMENT] Support Librato and InfluxDB labeling formats (#267)

## 0.12.2 / 2019-07-25

* [BUGFIX] Fix Unix socket handler (#252)
* [BUGFIX] Fix panic under high load (#253)

Thank you to everyone who reported and helped debug these issues!

## 0.12.1 / 2019-07-08

* [BUGFIX] Renew TTL when a metric receives updates (#246)
* [CHANGE] Reload on SIGHUP instead of watching the file (#243)

## 0.11.2 / 2019-06-14

* [BUGFIX] Fix TCP handler (#235)

## 0.11.1 / 2019-06-14

* [ENHANCEMENT] Batch event processing for improved ingestion performance (#227)
* [ENHANCEMENT] Switch Prometheus client to promhttp, freeing the standard HTTP metrics (#233)

With #233, the exporter no longer exports metrics about its own HTTP status. These were not helpful since you could not get them when scraping fails. This allows mapping to metric names like `http_requests_total` that are useful as application metrics.

## 0.10.6 / 2019-06-07

* [BUGFIX] Fix mapping collision for metrics with different types, but the same name (#229)

## 0.10.5 / 2019-05-27

* [BUGFIX] Fix "Error: inconsistent label cardinality: expected 0 label values but got N in prometheus.Labels" (#224)

## 0.10.4 / 2019-05-20

* [BUGFIX] Revert #218 due to a race condition (#221)

## 0.10.3 / 2019-05-17

* [ENHANCEMENT] Reduce allocations when escaping metric names (#217)
* [ENHANCEMENT] Reduce allocations when handling packets (#218)
* [ENHANCEMENT] Optimize label sorting (#219)

This release is entirely powered by @claytono. Kudos!

## 0.10.2 / 2019-05-17

* [CHANGE] Do not run as root in the Docker container by default (#202)
* [FEATURE] Add metric for count of events by action (#193)
* [FEATURE] Add metric for count of distinct metric names (#200)
* [FEATURE] Add UNIX socket listener support (#199)
* [FEATURE] Accept Datadog distributions (#211)
* [ENHANCEMENT] Add a health check to the Docker container (#182)
* [ENHANCEMENT] Allow inconsistent label sets (#194)
* [ENHANCEMENT] Speed up sanitization of metric names (#197)
* [ENHANCEMENT] Enable pprof endpoints (#205)
* [ENHANCEMENT] DogStatsD tag parsing is faster (#210)
* [ENHANCEMENT] Cache mapped metrics (#198)
* [BUGFIX] Fix panic if a mapping resulted in an empty name (#192)
* [BUGFIX] Ensure that there are always default quantiles if using summaries (#212)
* [BUGFIX] Prevent ingesting conflicting metric types that would make scraping fail (#213)

With #192, the count of events rejected because of negative counter increments has moved into the `statsd_exporter_events_error_total` metric, instead of being lumped in with the different kinds of successful events.

v0.25 passes smoketest with MediaWiki StatsLib.

Thank you for taking a look! I'll be investigating the production deployment/upgrade.

lmata triaged this task as Medium priority. (Oct 26 2023, 6:03 PM)
lmata moved this task from Backlog to Prioritized on the Observability-Metrics board.
lmata moved this task from Inbox to Up next on the SRE Observability (FY2023/2024-Q2) board.

I have refreshed the Debian packaging and pushed a new packaging-wikimedia branch to the gerrit repo for prometheus-statsd-exporter; the resulting package is available at /var/cache/pbuilder/result/bookworm-amd64/prometheus-statsd-exporter_0.26.1-1_amd64.deb on build2001. Note that I've patched the source to also accept the --statsd.relay-address argument (as opposed to upstream's --statsd.relay.address) to ease upgrades. Once we have the new version rolled out we can change puppet to use upstream's flag
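
To illustrate the equivalence of the two flag spellings from the caller's side, here is a small sketch that rewrites the legacy spelling to upstream's in an argument list. It is not the packaging patch itself, which lives in the exporter's source and is not shown here. The function name is hypothetical; only the two flag names come from the comment above.

```python
from typing import List, Sequence

LEGACY_FLAG = "--statsd.relay-address"    # spelling used by our custom patch
UPSTREAM_FLAG = "--statsd.relay.address"  # spelling used by upstream


def normalize_flags(argv: Sequence[str]) -> List[str]:
    """Rewrite the legacy relay flag to upstream's spelling; keep other args."""
    out = []
    for arg in argv:
        if arg == LEGACY_FLAG:
            out.append(UPSTREAM_FLAG)
        elif arg.startswith(LEGACY_FLAG + "="):
            out.append(UPSTREAM_FLAG + arg[len(LEGACY_FLAG):])
        else:
            out.append(arg)
    return out
```

Both the `--flag=value` and the `--flag value` forms are handled, which matches the two ways the option could appear in a service unit or defaults file.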

I tested the package above in pontoon and it works as expected. I'll be importing it into apt and rolling it out to production/baremetal early next week.

Mentioned in SAL (#wikimedia-operations) [2024-06-10T08:54:23Z] <godog> upgrade prometheus-statsd-exporter on webperf - T302373

Mentioned in SAL (#wikimedia-operations) [2024-06-10T09:01:15Z] <godog> upload prometheus-statsd-exporter 0.26.1-1 to apt - T302373

Mentioned in SAL (#wikimedia-operations) [2024-06-10T09:37:08Z] <godog> roll upgrade prometheus-statsd-exporter to baremetal - T302373

fgiunchedi claimed this task.

Calling this one done: the Debian package is uploaded and the container is updated.

Thank you!