This task is about mysql monitoring, specifically metrics gathering (not alerts).
We've deployed prometheus-mysqld-exporter to put mysql metrics into prometheus in T126757: test prometheus mysqld-exporter. Most functionality provided by tendril is already there, though some things are missing:
- multi-source replication metrics
- support for multi-instance nodes
- audit-type functionality to list hosts in a table, e.g. (mysql_version, lag, qps); currently it isn't possible in grafana to have tables that merge multiple metrics
- fetch replication metrics from pt-heartbeat (tracked in T141968: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master)
- upstream performance_schema monitoring is inflexible, and it may require a separate prometheus instance for privacy reasons
- prometheus configuration for mysql host/shard/role/etc is manual and should be automated via puppetdb. Also note that "role" (master/slave) should be exported by the machine itself rather than set in the prometheus configuration; this makes things easier when switching master/slave. Shard OTOH doesn't change for the lifetime of the machine (?) and can be put in the prometheus configuration
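
As a rough illustration of the last point, here is a minimal sketch of generating prometheus file_sd target groups with a static "shard" label. The input list, the hostnames and the build_file_sd helper are hypothetical; a real version would get its host list from puppetdb, which is not shown here. The port is assumed to be the exporter's default (9104). "role" is deliberately left out, since it should come from the machine itself.

```python
import json
from collections import defaultdict

# Assumption: exporters listen on the mysqld-exporter default port.
EXPORTER_PORT = 9104


def build_file_sd(hosts):
    """Group hosts by shard into prometheus file_sd target groups.

    `hosts` is a hypothetical list of dicts with "fqdn" and "shard" keys;
    in practice this would be derived from puppetdb.
    """
    by_shard = defaultdict(list)
    for host in hosts:
        by_shard[host["shard"]].append(f"{host['fqdn']}:{EXPORTER_PORT}")
    # One target group per shard; "role" is intentionally not set here.
    return [
        {"targets": sorted(targets), "labels": {"shard": shard}}
        for shard, targets in sorted(by_shard.items())
    ]


# Hypothetical inventory, for illustration only.
example_hosts = [
    {"fqdn": "db1002.example.org", "shard": "s1"},
    {"fqdn": "db1001.example.org", "shard": "s1"},
    {"fqdn": "db1003.example.org", "shard": "s2"},
]

if __name__ == "__main__":
    # The output can be written to a file referenced by file_sd_configs.
    print(json.dumps(build_file_sd(example_hosts), indent=2))
```

Keeping only the shard in the generated file means a master/slave switch needs no change on the prometheus side.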