This umbrella task tracks what was discussed in the design documents, along with the resulting implementation plan:
- pt-heartbeat + scaffolding
  - Create the prometheus http exporter scaffolding
  - Implement the custom pt-heartbeat monitoring
  - Create the related alert rule(s)
- seconds_behind_master + threads (replication/io)
  - Add parsing of the SHOW SLAVE STATUS output
  - Create the related prometheus-node-exporter alert rule(s)
- memory pressure
  - *Implement custom memory monitoring if needed*
  - Create the related alert rule(s)
- disk pressure
  - *Implement custom disk monitoring if needed*
  - Create the related prometheus-node-exporter alert rule(s)
- read only status
  - *Implement custom query if needed*
  - Create the related mysqld-exporter alert rule(s)
- process monitoring
  - *Implement custom system probe if needed*
  - Create the related systemd unit alert rule(s)
- mariadb errors
  - Decide upon feature parity
  - Implement a proof of concept of error message passing
  - Productionize the POC
  - Create the related alert rule(s)
- decide upon feature parity
The end goal of this migration is to reach full feature parity with our current Icinga monitoring.
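
As a starting point for the first two items of the plan (exporter scaffolding, pt-heartbeat lag, and `SHOW SLAVE STATUS` parsing), here is a minimal sketch using only the Python standard library. It is not the agreed design. Every function name and metric name below is hypothetical, and it assumes that the status text comes from the `mysql` client's vertical (`\G`) output and that the pt-heartbeat `ts` value is an ISO 8601 timestamp in UTC.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler


def parse_slave_status(raw: str) -> dict:
    """Parse vertical (\\G) SHOW SLAVE STATUS output into a field -> value dict."""
    fields = {}
    for line in raw.splitlines():
        # The "*** 1. row ***" separator and blank lines carry no "key: value" pair.
        if ":" not in line:
            continue
        # Split on the first colon only, since values (e.g. error text) may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_metrics(status: dict) -> dict:
    """Turn parsed status fields into gauge values (metric names are hypothetical)."""
    metrics = {
        "mariadb_replication_io_thread_running":
            1.0 if status.get("Slave_IO_Running") == "Yes" else 0.0,
        "mariadb_replication_sql_thread_running":
            1.0 if status.get("Slave_SQL_Running") == "Yes" else 0.0,
    }
    lag = status.get("Seconds_Behind_Master", "NULL")
    # Seconds_Behind_Master is NULL when replication is not running; skip it then.
    if lag != "NULL":
        metrics["mariadb_replication_seconds_behind_master"] = float(lag)
    return metrics


def heartbeat_lag_seconds(ts: str, now: datetime) -> float:
    """Lag = now - last heartbeat timestamp (assumed to be UTC, ISO 8601)."""
    beat = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
    return (now - beat).total_seconds()


def render(metrics: dict) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def make_handler(collect):
    """Build an HTTP handler serving collect() (a callable returning a dict) on /metrics."""
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render(collect()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler
```

To try it, pass the handler to `http.server.HTTPServer`, for example `HTTPServer(("127.0.0.1", 9199), make_handler(collect)).serve_forever()`, where `collect` and the port number are placeholders. The parsing and rendering functions are kept free of I/O so they can be unit tested without a database.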