This task tracks porting statsd metrics traffic to Prometheus. Specifically either porting the applications to use native Prometheus metrics or deploying `statsd_exporter` to expose Prometheus metrics derived from statsd traffic. The latter approach has been tested successfully for Thumbor in {T145867}.
This is an audit on statsd traffic received on `graphite1001` for ten minutes, sorted by "top level". There's some garbage/invalid names too in the list, to be ignored.
```
# cat top_users_10m | grep -v PagePrev
202906054 MediaWiki
62807151 restbase
24205989 varnish
13457335 changeprop
5280308 ores
4256639 aqs
3800019 cpjobqueue
1961745 eventbus
1203080 kafka
905410 mobileapps
800150 varnishkafka
771824 parsoid
560644 tilerator
461879 frontend
248861 logstash
145168 kartotherian
48707 parsoid-tests
32820 cxserver
27991 graphoid
25956 eventstreams
19591 citoid
18083 service_checker
15730 recommendation-api
13784 restbase-dev
11832 mathoid
9362 zuul
8419 proton
5713 ve
4314 mw
1526 media
447
357 tileratorui
348 swift
305 eventlogging
236 wikibase
220 browsertime
199 webpagetest
145 gerrit
40 varnis
37 servers
34 varn
19 varni
17 var
7 }:1|c
7 this
7 return this;
5 performance
3 v
```
Generated with
```
timeout 10m ngrep -q -W byline . udp dst port 8125 | grep -v -e '^U ' -e '^$' | cut -f1,2,3 -d. | pigz -9c > statsd_users_10m.gz
zcat statsd_users_10m.gz | cut -d. -f1 | sort | uniq -dc | sort -rn > top_users_10m
```
Annotated list of producers above, with plan of action
== moving to k8s?
== can be ignored, legacy and/or to be deprecated
[] kafka (only `analytics-eqiad` stats left here, mediawiki is its only client now https://phabricator.wikimedia.org/T152015)
[] varnish (to be deprecated https://phabricator.wikimedia.org/T184942)
[] varnishkafka (to be deprecated https://phabricator.wikimedia.org/T196066)
== will move to Prometheus anyway (?)
[] frontend (generated by navtiming, see also https://phabricator.wikimedia.org/T175087)
[] mw (generated by navtiming and statsv, see also https://phabricator.wikimedia.org/T175087)
[] performance (generated by navtiming, see also https://phabricator.wikimedia.org/T175087)
== TODO
[] gerrit - Emitted by Zuul service. Example usage: https://grafana.wikimedia.org/dashboard/db/releng-gerrit
[] logstash
[] ores
[] service_checker
[] swift
[x] thumbor
[] zuul - Example: https://grafana.wikimedia.org/dashboard/db/zuul , bottom of https://integration.wikimedia.org/zuul/
== use global aggregation / percentiles
[] MediaWiki (some metrics come from statsv (e.g. `MediaWiki.wikibase`)
[] aqs
[] browsertime (generated from where?)
[] changeprop
[] citoid
[] cpjobqueue
[] cxserver
[] eventbus
[] eventlogging
[] eventstreams
[] graphoid
[] kartotherian
[] mathoid
[] media (via statsv)
[] mobileapps
[] parsoid
[] parsoid-tests
[] proton
[] recommendation-api
[] restbase
[] restbase-dev
[] servers (used for `servers.labnet1001.nova`, otherwise to be deprecated when Diamond is decom)
[] tilerator
[] tileratorui
[] ve (via statsv)
[] webpagetest (via statsv)
[] wikibase (via statsv)