
nova-conductor running out of mysql connections
Closed, Resolved, Public

Description

In responding to an alert from the nova-fullstack agent, I see that nova-conductor has been failing:

2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 97, in connect
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server     return dialect.connect(*cargs, **cparams)
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 385, in connect
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server     return self.dbapi.connect(*cargs, **cparams)
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in Connect
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server     return Connection(*args, **kwargs)
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 204, in __init__
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server     super(Connection, self).__init__(*args, **kwargs2)
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server OperationalError: (_mysql_exceptions.OperationalError) (1226, "User 'nova' has exceeded the 'max_user_connections' resource (current value: 100)")
2019-10-08 02:53:39.852 14301 ERROR oslo_messaging.rpc.server
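The final line of the traceback names the MySQL account and the configured limit (error 1226). If you want to alert on this condition, here is a minimal sketch of a matcher for that message. The regex and the `parse_limit_error` helper are hypothetical illustrations, and they assume the message text appears exactly as it does in the log above.

```python
import re

# Hypothetical matcher for MySQL error 1226 as it appears in the
# nova-conductor log above; captures the user, resource name and limit.
LIMIT_RE = re.compile(
    r"\(1226, \"User '(?P<user>[^']+)' has exceeded the "
    r"'(?P<resource>[^']+)' resource \(current value: (?P<limit>\d+)\)"
)


def parse_limit_error(line):
    """Return (user, resource, limit) for a 1226 error line, else None."""
    match = LIMIT_RE.search(line)
    if match is None:
        return None
    return match.group("user"), match.group("resource"), int(match.group("limit"))


if __name__ == "__main__":
    sample = ('OperationalError: (_mysql_exceptions.OperationalError) '
              '(1226, "User \'nova\' has exceeded the \'max_user_connections\' '
              'resource (current value: 100)")')
    print(parse_limit_error(sample))  # ('nova', 'max_user_connections', 100)
```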

This happened earlier today when we upgraded to Newton; we cleared all the connections but now it's run out again. We need to reduce the number of conductor workers, increase the number of allowed connections, or find a leak.
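The first two options trade off against each other, and a rough connection budget makes that tradeoff concrete. The sketch below assumes each conductor worker process owns its own SQLAlchemy pool, which can grow to `max_pool_size + max_overflow` connections (the oslo.db `[database]` options), and that the worker count comes from nova's `[conductor] workers`. It ignores any other databases or services that share the account. The helper names and the numbers in the example are hypothetical, not the values deployed here.

```python
def worst_case_connections(workers, max_pool_size, max_overflow, hosts=1):
    """Upper bound on DB connections held by conductor worker processes.

    Assumes one pool per worker process, each able to grow to
    max_pool_size + max_overflow, and the same worker count on every host.
    """
    per_worker = max_pool_size + max_overflow
    return hosts * workers * per_worker


def max_workers_within(limit, max_pool_size, max_overflow, hosts=1):
    """Largest per-host worker count whose worst case fits within limit."""
    per_worker = max_pool_size + max_overflow
    return limit // (hosts * per_worker)


if __name__ == "__main__":
    # Illustrative numbers only: 8 workers, pool 5 + overflow 10, one host.
    print(worst_case_connections(8, 5, 10))          # 120, above a limit of 100
    print(max_workers_within(100, 5, 10))            # 6
    print(max_workers_within(100, 5, 10, hosts=2))   # 3
```

Because the bound is linear in workers, pool size, and host count, halving any one of them roughly halves the worst case. A leak, by contrast, would show up as connection counts that climb past this bound over time.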

Event Timeline

Andrew triaged this task as High priority. Oct 8 2019, 3:02 AM

Mentioned in SAL (#wikimedia-operations) [2019-10-08T03:03:59Z] <andrewbogott> restarted nova-conductor on cloudcontrol1003 and cloudcontrol1004 — experimental band-aid for T234876

heh, the first forum post I found about this topic suggests raising the connection limit to 2000

Change 541407 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] nova: try to reduce the number of db connections

https://gerrit.wikimedia.org/r/541407

I'm merging an experimental patch to reduce the number of connections needed. It's possible that this issue was caused by the Newton upgrade (and some change in behavior), but it could also be a result of our switching to an HA setup, if the connection limit on the database side is per user/database rather than per host/user/database.
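The HA hypothesis depends on which connections count against the limit. In MySQL, per-account limits are tracked against the matching account row, so two conductor hosts that both match a single grant such as `'nova'@'%'` would share one budget. Whether that is true here depends on the actual grants, so treat it as an assumption to verify. Here is a minimal sketch contrasting the two scenarios. The function names and connection counts are hypothetical.

```python
def connections_against_limit(per_host_connections, limit_scope):
    """Connections counted against a single limit.

    per_host_connections: open connections on each conductor host.
    limit_scope: "per_user" if all hosts share one account (one budget),
    "per_host" if each host matches its own account (separate budgets).
    """
    if limit_scope == "per_user":
        return sum(per_host_connections)
    if limit_scope == "per_host":
        return max(per_host_connections)
    raise ValueError("unknown limit_scope: %r" % (limit_scope,))


def exceeds_limit(per_host_connections, limit, limit_scope):
    """True if the counted connections go over the configured limit."""
    return connections_against_limit(per_host_connections, limit_scope) > limit


if __name__ == "__main__":
    # Hypothetical: two HA conductor hosts with 60 connections each.
    hosts = [60, 60]
    print(exceeds_limit(hosts, 100, "per_user"))  # True: 120 > 100
    print(exceeds_limit(hosts, 100, "per_host"))  # False: 60 <= 100
```

With a shared budget, adding a second host doubles the load against the same limit even if nothing in nova itself changed. That would make the HA switch look just like a leak introduced by the upgrade.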

Change 541407 merged by Andrew Bogott:
[operations/puppet@production] nova: try to reduce the number of db connections

https://gerrit.wikimedia.org/r/541407

I checked nova-fullstack this morning. Everything looks good. No leaks so far.

Andrew claimed this task.