I'm seeing flaky behavior in openstack -- some fullstack failures, some random horizon errors.
In logstash I see quite a few database issues: 'Deadlock: wsrep aborted transaction' and also some 'too many connections' complaints.
I'm seeing flaky behavior in openstack -- some fullstack failures, some random horizon errors.
In logstash I see quite a few database issues: 'Deadlock: wsrep aborted transaction' and also some 'too many connections' complaints.
Change 704605 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Galera: increase number of allowed connections
Here's my theory:
There might be some HA proxy tuning we can do to prevent this.
Mentioned in SAL (#wikimedia-cloud) [2021-07-14T21:08:44Z] <andrewbogott> restarting lots of openstack services while trying to resolve T286675
In haproxy I see:
Server mysql/cloudcontrol1005.wikimedia.org is DOWN, reason: Socket error, info: "Connection reset by peer", check duration: 40ms. 0 active and 2 backup servers left. Running on backup. 800 sessions active, 0 requeued, 0 remaining in queue.
so going to merge the attached patch in case that's the cause of the flap.
Change 704605 merged by Andrew Bogott:
[operations/puppet@production] Galera: increase number of allowed connections
Change 704638 had a related patch set uploaded (by Bstorm; author: Bstorm):
[operations/puppet@production] cloud galera: have haproxy shut down sessions when marked
Change 704846 had a related patch set uploaded (by Bstorm; author: Bstorm):
[operations/puppet@production] openstack galera: set monitor on failover
Change 704846 merged by Bstorm:
[operations/puppet@production] openstack galera: set monitor on failover
Change 704638 merged by Bstorm:
[operations/puppet@production] cloud galera: have haproxy shut down sessions when marked
Change 705507 had a related patch set uploaded (by Bstorm; author: Bstorm):
[operations/puppet@production] cloud galera: have haproxy shut down sessions when marked
Change 705507 merged by Bstorm:
[operations/puppet@production] cloud galera: have haproxy shut down sessions when marked