We're using Diamond to collect all sorts of metrics from both services and the
"machine" (the kernel) itself. We'll need to port some or all of these collectors
so that their metrics are collected by Prometheus instead.
There's a list of Diamond collectors in use at
https://wikitech.wikimedia.org/wiki/Prometheus#Diamond
The list is reproduced below too, split into broad categories.
= Implementation via cron and `textfile` for node-exporter to pick up
[x] minimalpuppetagent.py / Report puppet stats from last_run_summary.yaml
** Substituted by prometheus-puppet-agent-stats
[ ] localcrontab.py / Report the number of users' crontabs, mainly used in tools
[ ] cherry-pick-counter-collector.py / Report the number of cherry-pick patches in a given git repo
[ ] nagios.py / Execute nagios commands locally and report the exit code
[ ] sshsessions.py / Collect number of lines from `who`
[ ] dir_size_tracker.py / Collect the size of given directories
[ ] ircd_stats.py / Parse MOTD from local irc server
[ ] sge.py / Collect metrics from gridengine
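
The cron-plus-`textfile` approach for the collectors above can be sketched roughly as follows, using `sshsessions.py` as the example. This is a minimal sketch, not the actual implementation: the metric name `node_ssh_sessions`, the output filename `sshsessions.prom` and the helper names are all hypothetical, and the target directory is assumed to be whatever directory node-exporter's textfile collector is configured to read. The file is written to a temporary name and then renamed, so node-exporter should never read a half-written file.

```python
import os
import subprocess
import tempfile


def render_metric(name, help_text, value, metric_type="gauge"):
    """Render one sample in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )


def count_who_lines(who_output):
    """Count the non-empty lines of `who` output (one per login session)."""
    return sum(1 for line in who_output.splitlines() if line.strip())


def write_textfile(directory, filename, content):
    """Write content to directory/filename via a temp file and a rename."""
    # The temp file lives in the same directory so the rename stays on one
    # filesystem; its ".tmp" suffix keeps it out of the "*.prom" glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        # mkstemp creates the file 0600; make it readable by node-exporter.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        os.unlink(tmp_path)
        raise


def main(directory):
    """Collect the `who` line count and publish it as a .prom file."""
    who_output = subprocess.run(
        ["who"], check=True, capture_output=True, text=True
    ).stdout
    content = render_metric(
        "node_ssh_sessions",  # hypothetical metric name
        "Number of lines reported by who.",
        count_who_lines(who_output),
    )
    write_textfile(directory, "sshsessions.prom", content)  # hypothetical filename
```

A cron job would call `main()` with the textfile directory at whatever interval matches the desired freshness. The other collectors in this category would presumably follow the same shape, swapping only the collection step.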
= Implementation via a separate exporter
[x] nginx.py / Collect nginx basic metrics from nginx's status page
** Substituted by nginx-lua-stats
[x] memcached / See memcached in ganglia above
** Substituted by memcache exporter
[x] pybal_state.py / Parse PyBal's pools info from localhost:9090
** Substituted by native pybal instrumentation
[x] rcstream diamond_collector.py / Parse RCStream stats from localhost:10080
** rcstream deprecated
[x] varnishstatus.py / Collect varnish stats from varnishtop, used in beta only ?
** Substituted by varnish exporter
[ ] hhvm_apc.py / Parse localhost:9002/dump-apc-info and report stats
** Need to extend `prometheus-hhvm-exporter` to include apc info
** Implementation at https://gerrit.wikimedia.org/r/#/c/382728/
[ ] rabbitmq.py / Collect rabbitmq queue stats, for openstack
** https://github.com/kbudde/rabbitmq_exporter or https://github.com/deadtrickster/prometheus_rabbitmq_exporter
[ ] redisstat.py / Collect redis stats from multiple instances
** https://github.com/oliver006/redis_exporter
[ ] nutcracker.py / Parse json from nutcracker stats
** https://github.com/xavierholt/twemproxy_exporter https://github.com/albert-widi/twemproxy_exporter https://github.com/bengler/twemproxy_exporter
** Upstream issue https://github.com/twitter/twemproxy/issues/540
[ ] openldap.py / Parse openldap metrics from local ldap server
** https://github.com/jcollie/openldap_exporter
[ ] powerdns.py / powerdns_recursor.py / Parse metrics from rec_control
** https://github.com/janeczku/powerdns_exporter https://github.com/wrouesnel/pdns_exporter
** Related upstream issue https://github.com/PowerDNS/pdns/issues/4947
[ ] postgresql
** https://github.com/wrouesnel/postgres_exporter
[ ] blazegraph.py / Parse XML from localhost:9999
** Replace with jmx_exporter ?
[ ] wdqs_updater.py / Collect jmx stats exported by jolokia at http://localhost:8778
** Replace with jmx_exporter ?
[ ] wmfelastic.py / Pared-down collector for elasticsearch, exports basic stats and not per-index
** Replace with jmx_exporter ?
= Misc
[ ] extendedexim.py / Parse exim's paniclog and queue stats by calling exim -bpr
** Implementation by parsing logs via mtail
[ ] etherpad.py / Parse localhost:9001/stats and report stats
** Might make sense to contribute an etherpad plugin or patch for prometheus stats?
[ ] nfsd.py / Parse and report stats from /proc/net/rpc/nfsd and /proc/fs/nfsd/pool_stats
** node_exporter doesn't support this yet; sending a patch upstream would be the right thing to do (upstream issue https://github.com/prometheus/node_exporter/issues/607)
[ ] nfsiostat.py / Emulate iostat for NFS mount points using /proc/self/mountstats
** Supported by node_exporter (`nfs` and `mountstats` collectors)
** Metrics with both collectors enabled on one of the `tools-worker` hosts are at https://phabricator.wikimedia.org/P6090 to compare with what we have now
[ ] nf_conntrack_counter.py / Report sysctl net.netfilter.nf_conntrack_count
** Supported by node_exporter, implementation at https://gerrit.wikimedia.org/r/382695