Machine-level metrics are covered in prometheus by node_exporter (tracked in T140646), but we also have application-specific metrics deployed in ganglia.
For prometheus to be a viable replacement for ganglia we'd need at least the same metrics (if not better ones) in prometheus too.
See also https://wikitech.wikimedia.org/wiki/Prometheus#Replacing_Ganglia for a list of ganglia plugins we are currently deploying. The list of rrds updated in the last 30 days is in P4571.
Below are the ones I think are most important/urgent to have, along with their current status:
- fundraising-related stats for misc queues and donations T152562
- cirrussearch slow log rate, in graphite via logstash
- apache mod_socache_shmcb stats, we don't seem to use mod_socache anyway
- elasticsearch stats, afaict those are in graphite already
- exim, can be done with diamond/graphite or in prometheus via node_exporter
- jenkins TODO? some stats might be already in graphite
- kafka, in graphite
- varnishkafka, in graphite
- osm sync lag from /srv/osmosis/state.txt
- powerdns, in graphite via diamond
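
For the osm sync lag item, one possible approach is a small script that reads /srv/osmosis/state.txt and writes a gauge for node_exporter's textfile collector. This is only a sketch under assumptions: it assumes state.txt has the usual osmosis replication layout with a `timestamp=` line whose colons are backslash-escaped, and the metric name `osm_sync_lag_seconds` and the function names are made up for illustration.

```python
from datetime import datetime, timezone


def parse_state_timestamp(state_text):
    """Return the replication timestamp found in state.txt contents.

    Assumes a line such as: timestamp=2016-12-06T12\\:00\\:02Z
    """
    for line in state_text.splitlines():
        line = line.strip()
        if line.startswith("timestamp="):
            # Java-properties style value: drop the backslash escapes.
            raw = line.split("=", 1)[1].replace("\\", "")
            parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("no timestamp= line found in state file")


def sync_lag_seconds(state_text, now=None):
    """Seconds elapsed between the state timestamp and `now` (UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - parse_state_timestamp(state_text)).total_seconds()


def render_metric(lag):
    """Render the lag in the Prometheus text exposition format.

    The metric name is hypothetical, not an existing convention.
    """
    return (
        "# HELP osm_sync_lag_seconds Seconds since the last replicated OSM change.\n"
        "# TYPE osm_sync_lag_seconds gauge\n"
        f"osm_sync_lag_seconds {lag}\n"
    )


def main(path="/srv/osmosis/state.txt"):
    """Read the state file and print the metric to stdout."""
    with open(path) as f:
        print(render_metric(sync_lag_seconds(f.read())), end="")
```

Calling `main()` from cron and redirecting the output to a `.prom` file in the directory given by node_exporter's `--collector.textfile.directory` would expose the gauge alongside the machine-level metrics. The same textfile pattern could plausibly cover other small one-off stats.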