During the past months, dbproxy1005 has been detecting db1009 as down; normally that means it was not able to connect within 3 seconds, 3 times in a row. We believe this could be because of temporary overload by some application (99% likely one owned by cloud).
Last time this happened was Fri 23 Feb 2018:
[20:38:35] <icinga-wm> PROBLEM - haproxy failover on dbproxy1005 is CRITICAL: CRITICAL check_failover servers up 2 down 1
but it has happened several times in the past, I think more frequently lately.
dbproxy1005 is not the problem here: it is not yet used to fail over m5, although we would like to do that eventually. The problem is that it could be detecting micro-downtimes caused by overload. That a misbehaving cloud application is the cause is not verified, but it is the working hypothesis right now.
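To make the down-detection rule mentioned at the top concrete (no connection within 3 seconds, 3 times in a row), here is a minimal sketch in Python. It is not the actual haproxy check or configuration on dbproxy1005; the function names `can_connect` and `is_down` and the `fall` parameter are hypothetical and only illustrate the logic as described in this task.

```python
import socket
from typing import Iterable


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection opens within `timeout` seconds.

    Hypothetical probe: the real check may do more than a TCP connect.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def is_down(results: Iterable[bool], fall: int = 3) -> bool:
    """Return True once `fall` consecutive probe results are failures.

    `results` is the ordered sequence of probe outcomes (True = success).
    A single success resets the failure counter.
    """
    consecutive_failures = 0
    for ok in results:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= fall:
            return True
    return False
```

Under this rule, a short overload that makes three probes in a row time out is enough to mark the server as down, even if it answers normally again right afterwards; two failures followed by a success would not trigger it.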
In addition to these events, after the latest grant changes, the number of aborted connections has increased greatly:
Please help us research both the long-running issue and the new issue with mysql connections. This would be a blocker for T188029.