This task tracks porting statsd metrics traffic to Prometheus, either by porting the applications to use native Prometheus metrics or by deploying `statsd_exporter` to expose Prometheus metrics derived from statsd traffic. The latter approach has been tested successfully for Thumbor in {T145867}.
This is an audit of statsd traffic received on the graphite host, sorted by top-level name. The list also contains some garbage/invalid names, which should be ignored.
{P8742}
Generated with
```
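# Capture 10 minutes of statsd UDP traffic; drop ngrep packet headers and blank lines, keep the first three name components
timeout 10m ngrep -q -W byline . udp dst port 8125 | grep -v -e '^U ' -e '^$' | cut -f1,2,3 -d. | pigz -9c > statsd_users_10m.gz
# Count metric lines per top-level name, most frequent first (-d omits names seen only once)
zcat statsd_users_10m.gz | cut -d. -f1 | sort | uniq -dc | sort -rn > top_users_10m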
```
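To make the aggregation step easier to follow, here is a minimal sketch that runs the same `cut | sort | uniq -dc | sort -rn` pipeline on a handful of made-up metric names. The sample names and the `top_level_counts` helper are assumptions for illustration only; they are not taken from the captured traffic.

```shell
# Hypothetical helper wrapping the aggregation used above:
# keep the top-level component, count occurrences, drop names
# seen only once (-d), and sort by count, highest first.
top_level_counts() {
  cut -d. -f1 | sort | uniq -dc | sort -rn
}

# Made-up sample of captured names, already truncated to at most
# three dot-separated components as in statsd_users_10m.gz.
sample='MediaWiki.wikibase.foo
MediaWiki.timing.bar
restbase.external.v1
MediaWiki.api.baz
restbase.internal.v1
garbage_name_seen_once'

printf '%s\n' "$sample" | top_level_counts
```

With this sample, `MediaWiki` is listed with a count of 3 and `restbase` with a count of 2, while the name that appears only once is dropped by `uniq -d`. That also means low-volume producers seen a single time during the capture window would not show up in the list.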
Annotated list of producers above, with plan of action:
== statsv-produced metrics, see also T180105
[] mw.js.deprecate (generated client-side from mediawiki/extensions/WikimediaEvents)
[] mw.performance
[] browsertime (from WebPageReplay)
[] ve
[] Some metrics in MediaWiki hierarchy, e.g. minerva.WebClientError
[] pagepreviews (top level, PagePreviewsApiFailure/ PagePreviewsApiResponse/ PagePreviewsPreviewShow/)
[] media.thumbnail.client
[] webpagetest (generated by wpt-reporter from Jenkins)
[] wikibase.queryService.ui
== navtiming-produced metrics, see also T175087
[] frontend
[] mw.performance.save*
[] eventlogging.client_errors.navigation/painttiming
[] performance.survey
== TODO
[] gerrit - Emitted by Zuul service. Example usage: https://grafana.wikimedia.org/dashboard/db/releng-gerrit
[x] logstash
[x] ores
[] service_checker - the idea is to move to a blackbox_exporter-like model, see also https://gerrit.wikimedia.org/r/c/operations/software/service-checker/+/532807 and https://github.com/shdubsh/prometheus-swagger-exporter
[x] swift
[x] thumbor
[] zuul - Example: https://grafana.wikimedia.org/dashboard/db/zuul , bottom of https://integration.wikimedia.org/zuul/
[] cloudvps - from `nova_fullstack_test.py` (see also T210850 for more context)
== Use global aggregation / percentiles
See also https://wikitech.wikimedia.org/wiki/Prometheus/statsd_k8s for an introduction for service owners on how to write their statsd_exporter mappings (written for k8s, but the guidelines are generic).
[] MediaWiki (some metrics come from statsv, e.g. `MediaWiki.wikibase`)
[] aqs
[] changeprop
[] cpjobqueue
[] eventbus
[] eventlogging
[] eventstreams
[] graphoid
[] kartotherian
[] mobileapps
[] parsoid
[] parsoid-tests
[] proton
[] recommendation-api
[] restbase
[] restbase-dev
[] tilerator
[] tileratorui