This task tracks porting statsd metrics traffic to Prometheus. Specifically either porting the applications to use native Prometheus metrics or deploying `statsd_exporter` to expose Prometheus metrics derived from statsd traffic. The latter approach has been tested successfully for Thumbor in {T145867}.
This is an audit on statsd traffic received on graphite host, sorted by "top level". There's some garbage/invalid names too in the list, to be ignored.
{P8742}
Generated with
```
timeout 10m ngrep -q -W byline . udp dst port 8125 | grep -v -e '^U ' -e '^$' | cut -f1,2,3 -d. | pigz -9c > statsd_users_10m.gz
zcat statsd_users_10m.gz | cut -d. -f1 | sort | uniq -dc | sort -rn > top_users_10m
```
Annotated list of producers above, with plan of action:
== Moving to k8s?== statsv-produced metrics, see also T175087
[] mathoid
== Can be ignored, legacy and/or to be deprecatedmw.js.deprecate (generated client-side from mediawiki/extensions/WikimediaEvents)
[] kafka (only `analytics-eqiad` stats left here, mediawiki is its only client now T152015)mw.performance
[] varnish (to be deprecated T184942browsertime (from WebPageReplay)
[] varnishkafka (to be deprecated T196066)
== Will move to Prometheus anyway (?)ve
[] frontend (generated by navtiming,Some metrics in MediaWiki hierarchy, e.g. see also T175087)minerva.WebClientError
[] mw.performance (generated by navtimingpagepreviews (top level, PagePreviewsApiFailure/ PagePreviewsApiResponse/ PagePreviewsPreviewShow/)
[] media.thumbnail.client
[] webpagetest (generated by wpt-reporter from Jenkins)
[] wikibase.queryService.ui
== navtiming-produced metrics, see also T175087)
[] frontend
[] mw.performance.save*
[] eventlogging.client_errors.navigation/paitingtiming
[] performance.survey
== TODO
[] gerrit - Emitted by Zuul service. Example usage: https://grafana.wikimedia.org/dashboard/db/releng-gerrit
[x] logstash
[x] ores
[] service_checker
[x] swift
[x] thumbor
[] zuul - Example: https://grafana.wikimedia.org/dashboard/db/zuul , bottom of https://integration.wikimedia.org/zuul/
[] mw.js.deprecate (generated client-side via statsv from mediawiki/extensions/WikimediaEventscloudvps - from `nova_fullstack_test.py` (see also T210850 for more context)
== Use global aggregation / percentiles
See also https://wikitech.wikimedia.org/wiki/Prometheus/statsd_k8s for an introduction for service owners on how to write their statsd_exporter mappings (in k8s, but guidelines are generic)
[] MediaWiki (some metrics come from statsv (e.g. `MediaWiki.wikibase`)
[] aqs
[] browsertime (via statsv; generated by WebPageReplay)
[] changeprop
[] citoid
[] cpjobqueue
[] cxserver
[] eventbus
[] eventlogging
[] eventstreams
[] graphoid
[] kartotherian
[] media (via statsv)
[] mobileapps
[] parsoid
[] parsoid-tests
[] proton
[] recommendation-api
[] restbase
[] restbase-dev
[] servers (used for `servers.labnet1001.nova`, otherwise to be deprecated when Diamond is decom)
[] tilerator
[] tileratorui
[] ve (via statsv)
[] webpagetest (via statsv; generated by wpt-reporter from Jenkins)
[] wikibase (via statsv)