This task is about MySQL monitoring, specifically metrics gathering (not alerts).
We've deployed prometheus-mysqld-exporter to export MySQL metrics to Prometheus in T126757: test prometheus mysqld-exporter. Most of the functionality provided by Tendril is already there, but some things are missing:
- multi-source replication metrics
- support for multi-instance nodes
- latency monitoring for a sample query or queries
- Tables of data (e.g. a summary of hosts and their basic properties) are difficult to handle in Grafana and may need a separate technology (e.g. web dashboard, orchestrator, ...)
- fetch replication metrics from pt-heartbeat (tracked in T141968: Display lag on grafana (prometheus) from pt-heartbeat instead (or in addition) of Seconds_Behind_Master)
- the upstream performance_schema monitoring is inflexible, and it may require a separate Prometheus instance for privacy reasons
- collection of table properties such as size and number of rows is disabled, and enabling it may require a separate Prometheus instance for privacy reasons
- Prometheus configuration for MySQL host/shard/role/etc. is manual and should be automated via PuppetDB. Also note that "role" (master/slave) should be exported by the machine itself rather than set in the Prometheus configuration, which makes things easier when switching master/slave. Shard, on the other hand, doesn't change for the lifetime of the machine (?) and can be put in the Prometheus configuration
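
As a rough illustration of the last point, the sketch below turns a host inventory into Prometheus file-based service discovery (`file_sd`) target groups, labelling each group with its shard and deliberately leaving out "role". It is only a sketch under assumptions: the host names and the inventory structure are hypothetical, the PuppetDB query that would produce the inventory is not shown, and port 9104 is assumed to be the exporter's listening port.

```python
import json
from collections import defaultdict

# Assumption: the exporter listens on 9104 (the mysqld_exporter default).
EXPORTER_PORT = 9104


def build_file_sd(hosts):
    """Group hosts by shard into Prometheus file_sd target groups.

    `hosts` is a list of dicts with hypothetical keys "name" and "shard".
    No "role" label is emitted: per the note above, role should be
    exported by the machine itself.
    """
    groups = defaultdict(list)
    for host in hosts:
        groups[host["shard"]].append(f'{host["name"]}:{EXPORTER_PORT}')
    return [
        {"targets": sorted(targets), "labels": {"shard": shard}}
        for shard, targets in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from PuppetDB.
    inventory = [
        {"name": "db-a.example.org", "shard": "s1"},
        {"name": "db-b.example.org", "shard": "s1"},
        {"name": "db-c.example.org", "shard": "s2"},
    ]
    print(json.dumps(build_file_sd(inventory), indent=2))
```

The JSON output could be written to a file referenced by a `file_sd_configs` entry in the Prometheus scrape configuration, so that adding or removing a host no longer requires a manual configuration change.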