I'm getting database errors in some tools that were running without errors until a few days ago. The error returned is "oursql.OperationalError: (2013, 'Lost connection to MySQL server during query', None)". It seems to happen when a query takes longer than ~4 seconds; faster queries run without errors.
Description
Related Objects
Mentioned In
- T180380: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers
- T122658: MySQL connections die in less than 30 seconds using tools-login tunnels
- T76699: Lost connection to MariaDB server during query
Mentioned Here
- T76699: Lost connection to MariaDB server during query
Event Timeline
Comment Actions
MariaDB [commonswiki_p]> select rc_user AS test from recentchanges limit 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1006686
Current database: commonswiki_p

+---------+
| test    |
+---------+
| 3367542 |
+---------+
1 row in set (0.12 sec)
Comment Actions
(dupe comment from T76699)
No smoking gun yet.
As discussed a few times on labs-l, in order to fight abuse and replag we explicitly kill things when:
- A query runs for more than 28800 seconds (8 hours)
- One or more queries are about to collectively cause an OOM (rare)
- A client holds a connection open and idle for more than 60 seconds
- The CATSCAN stuff runs slow writes for more than 300 seconds
I wonder if #3 (the 60-second idle limit) is the culprit here.
60s was chosen because some users leave transactions open for long periods and/or consume many concurrent connections for no good reason. If this is the problem, we can either bump up the time limit or encourage apps to auto-reconnect, use keep-alive, or just close connections they are not using.
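For the auto-reconnect option, a rough client-side sketch in Python might look like the following. It is not tied to oursql or any particular driver: `ReconnectingClient`, `connect`, and `retry_on` are hypothetical names, and the only assumption about the driver is that it follows the usual DB-API shape (`cursor()`, `execute()`, `fetchall()`). The caller passes in whichever exception class the driver raises for a lost connection.

```python
class ReconnectingClient:
    """Re-opens the connection and retries when the server has dropped it.

    Hypothetical helper, not part of any driver. `connect` is a zero-argument
    callable returning a DB-API style connection; `retry_on` is a tuple of
    exception classes that signal a lost connection for the driver in use.
    """

    def __init__(self, connect, retry_on, max_retries=1):
        self._connect = connect
        self._retry_on = retry_on
        self._max_retries = max_retries
        self._conn = None

    def query(self, sql, params=()):
        for attempt in range(self._max_retries + 1):
            try:
                if self._conn is None:
                    # Lazily (re)open the connection.
                    self._conn = self._connect()
                cursor = self._conn.cursor()
                cursor.execute(sql, params)
                return cursor.fetchall()
            except self._retry_on:
                # Discard the dead connection; the next pass opens a new one.
                self._conn = None
                if attempt == self._max_retries:
                    raise
```

Retrying blindly like this is only reasonable for read-only queries outside a transaction, since a dropped connection also discards any open transaction state.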
Comment Actions
The ~4sec estimate ... could it be when running a query on a connection that has just slept for ~60sec? Might be an overlap.
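If that overlap is what is happening, a tool could sidestep it by never reusing a connection that has been idle for close to the limit. A minimal sketch, assuming the 60-second idle kill described above: `IdleAwareConnection`, `connect`, and the 50-second `max_idle` margin are all hypothetical choices, not something the servers or any driver provide.

```python
import time


class IdleAwareConnection:
    """Hands out a connection, replacing it once it has sat idle too long.

    Hypothetical helper. `connect` is a zero-argument callable returning a
    connection object with a close() method. `max_idle` is kept below the
    assumed 60-second server-side idle limit to leave some margin.
    """

    def __init__(self, connect, max_idle=50.0, clock=time.monotonic):
        self._connect = connect
        self._max_idle = max_idle
        self._clock = clock
        self._conn = None
        self._last_used = None

    def get(self):
        now = self._clock()
        stale = (
            self._conn is not None
            and now - self._last_used >= self._max_idle
        )
        if stale:
            try:
                # The server may already have killed it; ignore close errors.
                self._conn.close()
            except Exception:
                pass
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
        # Record the time of this use so idle time is measured from here.
        self._last_used = now
        return self._conn
```

Call `get()` immediately before each query rather than holding on to the returned object, otherwise the idle check is bypassed.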