We use Diamond to collect all sorts of metrics from both services and the
"machine" (the kernel) itself, so we'll need to port some or all of these
collectors to Prometheus.
There's a list of the Diamond collectors in use at
https://wikitech.wikimedia.org/wiki/Prometheus#Diamond
The list is also reproduced below, split into broad categories.
Implementation via cron and the textfile collector, for node_exporter to pick up
- minimalpuppetagent.py / Report puppet stats from last_run_summary.yaml
  - Substituted by prometheus-puppet-agent-stats
- localcrontab.py / Report the number of users' crontabs, mainly used in tools
- cherry-pick-counter-collector.py / Report the number of cherry-pick patches in a given git repo
- nagios.py / Execute nagios commands locally and report the exit code
- sshsessions.py / Collect the number of lines printed by who
- dir_size_tracker.py / Collect the size of given directories
- sge.py / Collect metrics from gridengine
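As a rough sketch of the cron + textfile approach, here is a possible replacement for sshsessions.py, assuming Python 3.7+ and only the standard library. It counts the lines printed by who, renders them in the Prometheus text exposition format, and writes them to a .prom file using write-then-rename, so node_exporter's textfile collector (which reads only *.prom files from its --collector.textfile.directory) never sees a half-written file. The metric name ssh_sessions, the function names and the directory passed to main() are hypothetical, not anything already deployed.

```python
import os
import subprocess
import tempfile


def render_gauges(metrics):
    """Render {name: (help_text, value)} in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def count_who_lines(who_output):
    """Count the non-empty lines printed by `who`, as sshsessions.py did."""
    return sum(1 for line in who_output.splitlines() if line.strip())


def write_textfile(directory, basename, content):
    """Atomically publish <directory>/<basename>.prom.

    The temporary file ends in .tmp, so the textfile collector (which only
    reads *.prom files) never sees it; os.replace() then swaps it in one step.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=f".{basename}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let node_exporter read it
        os.replace(tmp_path, os.path.join(directory, f"{basename}.prom"))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def main(textfile_dir):
    """Entry point for a (hypothetical) cron job; textfile_dir is the collector's directory."""
    who = subprocess.run(["who"], capture_output=True, text=True, check=True).stdout
    content = render_gauges({
        "ssh_sessions": ("Number of lines printed by who", count_who_lines(who)),
    })
    write_textfile(textfile_dir, "sshsessions", content)
```

A cron entry would call main() with the textfile directory. The other collectors in this group, such as localcrontab.py or dir_size_tracker.py, could reuse render_gauges() and write_textfile() and only supply their own values.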
Implementation via a separate exporter
- nginx.py / Collect nginx basic metrics from nginx's status page
  - Substituted by nginx-lua-stats
- memcached / See memcached in Ganglia above
  - Substituted by memcached exporter
- pybal_state.py / Parse PyBal's pool info from localhost:9090
  - Substituted by native PyBal instrumentation
- rcstream diamond_collector.py / Parse RCStream stats from localhost:10080
  - RCStream is deprecated
- varnishstatus.py / Collect varnish stats from varnishtop, used in beta only?
  - Substituted by varnish exporter
- hhvm_apc.py / Parse localhost:9002/dump-apc-info and report stats
  - Need to extend prometheus-hhvm-exporter to include APC info
  - Implementation at https://gerrit.wikimedia.org/r/#/c/382728/
- rabbitmq.py / Collect rabbitmq queue stats, for openstack
- redisstat.py / Collect redis stats from multiple instances
- nutcracker.py / Parse json from nutcracker stats
- openldap.py / Parse openldap metrics from local ldap server
- powerdns.py / powerdns_recursor.py / Parse metrics from rec_control
- postgresql
- blazegraph.py / Parse XML from localhost:9999
  - Replace with jmx_exporter?
- wdqs_updater.py / Collect JMX stats exported by Jolokia at http://localhost:8778
  - Replace with jmx_exporter?
- wmfelastic.py / Pared-down collector for Elasticsearch; exports basic stats but not per-index stats
  - Replace with jmx_exporter, or https://github.com/justwatchcom/elasticsearch_exporter
  - T181627: Port elasticsearch metrics to Prometheus
- ircd_stats.py / Custom exporter
Misc
- extendedexim.py / Parse exim's paniclog and queue stats by calling exim -bpr
  - Implementation by parsing logs via mtail
  - T179565: Port exim statistics to Prometheus
- etherpad.py / Parse localhost:9001/stats and report stats
  - It might make sense to contribute an Etherpad plugin or patch that exposes Prometheus stats
- nfsd.py / Parse and report stats from /proc/net/rpc/nfsd and /proc/fs/nfsd/pool_stats
  - node_exporter supports nfsd stats as of https://github.com/prometheus/node_exporter/pull/803/files, but this needs double-checking to make sure all the metrics we want are present
- nfsiostat.py / Emulate iostat for NFS mount points using /proc/self/mountstats
  - Supported by node_exporter (nfs and mountstats collectors)
  - Metrics with both collectors enabled on one of the tools-worker hosts are at https://phabricator.wikimedia.org/P6090, for comparison with what we have now
- nf_conntrack_counter.py / Report sysctl net.netfilter.nf_conntrack_count
  - Supported by node_exporter, implementation at https://gerrit.wikimedia.org/r/382695
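For the double-checking of node_exporter's nfs, nfsd and mountstats output, one low-tech option is to fetch its /metrics page (port 9100 by default) and list which expected metric prefixes have no samples at all. The standard-library Python sketch below does that; the function names are hypothetical, and the prefixes in the usage note are illustrative and should be checked against real output before relying on them.

```python
import re
import urllib.request

# A sample line starts with the metric name, followed by "{" (labels) or whitespace.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{|\s)")


def fetch_metrics(url="http://localhost:9100/metrics"):
    """Fetch exposition text from node_exporter (9100 is its default port)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def metric_names(exposition_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in exposition_text.splitlines():
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_prefixes(exposition_text, wanted_prefixes):
    """Return the wanted prefixes that no exported metric name starts with."""
    names = metric_names(exposition_text)
    return sorted(p for p in wanted_prefixes if not any(n.startswith(p) for n in names))
```

For example, missing_prefixes(fetch_metrics(), ["node_nfsd_", "node_mountstats_nfs_"]) run on a tools-worker host would print the prefixes with no samples; pairing each Diamond metric we care about with an expected prefix turns the comparison against P6090 into a repeatable check.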