Via the new ganglia::diskstat.
Disk performance is obviously important for databases (isn't it?), and I remember Tim saying he missed a check he had added a while ago.
Version: unspecified
Severity: enhancement
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T172492 Database alerting
Resolved | | jcrespo | T57406 Monitor database hosts' disk performance
Resolved | | hashar | T38994 [OPS] Add disk I/O to ganglia reports
While there are no disk-specific alerts (other than RAID health checks, which should probably be enough), almost complete I/O metrics are gathered, both on the MySQL Grafana dashboard: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1 (disk latency, throughput in bytes, and IOPS) and on the host stats: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=db1083&var-datasource=thanos&var-cluster=mysql
It is true that the host disk stats are not great (disk utilization is not very useful for diagnosing real-world problems), but that is just a presentation issue that I am not pushing to improve.
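For reference, a minimal sketch of how the dashboard metrics (IOPS, throughput in bytes, average latency, and utilization) can be derived from two samples of the Linux `/proc/diskstats` counters. This assumes the standard `/proc/diskstats` field layout (sectors always counted as 512 bytes); it is illustrative only and is not necessarily how ganglia::diskstat or the Prometheus exporter compute them. The names `parse_diskstats_line` and `disk_rates` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

SECTOR_BYTES = 512  # /proc/diskstats always reports 512-byte sectors


@dataclass
class DiskSample:
    reads: int          # reads completed
    read_sectors: int   # sectors read
    read_ms: int        # time spent reading (ms)
    writes: int         # writes completed
    write_sectors: int  # sectors written
    write_ms: int       # time spent writing (ms)
    io_ms: int          # time the device was busy doing I/O (ms)


def parse_diskstats_line(line: str) -> tuple[str, DiskSample]:
    """Parse one /proc/diskstats line (hypothetical helper)."""
    fields = line.split()
    name = fields[2]
    v = [int(x) for x in fields[3:14]]  # the 11 classic counters after the name
    return name, DiskSample(
        reads=v[0], read_sectors=v[2], read_ms=v[3],
        writes=v[4], write_sectors=v[6], write_ms=v[7],
        io_ms=v[9],
    )


def disk_rates(prev: DiskSample, curr: DiskSample, interval_s: float) -> dict:
    """Turn two cumulative counter samples into per-second rates."""
    ops = (curr.reads - prev.reads) + (curr.writes - prev.writes)
    sectors = (curr.read_sectors - prev.read_sectors) + (
        curr.write_sectors - prev.write_sectors)
    busy_ms = (curr.read_ms - prev.read_ms) + (curr.write_ms - prev.write_ms)
    return {
        "iops": ops / interval_s,
        "throughput_bytes_per_s": sectors * SECTOR_BYTES / interval_s,
        # Average time per completed request, in ms (0 if no I/O happened).
        "avg_latency_ms": busy_ms / ops if ops else 0.0,
        # Fraction of wall time the device had I/O in flight.
        "utilization": (curr.io_ms - prev.io_ms) / (interval_s * 1000),
    }
```

The latency figure is usually the more useful one for databases; utilization saturates at 1.0 on devices that serve many requests in parallel, which is one reason it says little about real-world problems.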
Given the vagueness of the ticket, I would consider this "Done" as of when Prometheus monitoring was implemented.