Page MenuHomePhabricator

Intermittent database errors
Open, MediumPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):
Sometimes pages or campaign actions (create and update) fail to load.
This appears random, but happens more frequently on the Testing instance (https://isa-dev.toolforge.org).

What happens?:
The server logs show errors like this:

sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')

Other info:
Errors started around August 17th 2022 and there were 186 of them (and 29 on dev) as of August 29th, so it seems that something has happened recently. Since it happens on the dev server too, which is quite small, I don't think it's the database getting out of hand. Could be something that's happened on Toolforge.
No new deployments happened close to the date the errors started so technically nothing was changed on our end.

Event Timeline

NavinoEvans triaged this task as Medium priority.

There were a couple of update errors that may have been caused by this.

12022-10-07 11:30:06,241 - ERROR - Failed to update images for campaign 5.
2Traceback (most recent call last):
3 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
4 self._validate_conn(conn)
5 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
6 conn.connect()
7 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connection.py", line 424, in connect
8 tls_in_tls=tls_in_tls,
9 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 450, in ssl_wrap_socket
10 sock, context, tls_in_tls, server_hostname=server_hostname
11 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
12 return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
13 File "/usr/lib/python3.7/ssl.py", line 412, in wrap_socket
14 session=session
15 File "/usr/lib/python3.7/ssl.py", line 853, in _create
16 self.do_handshake()
17 File "/usr/lib/python3.7/ssl.py", line 1117, in do_handshake
18 self._sslobj.do_handshake()
19socket.timeout: _ssl.c:1039: The handshake operation timed out
20
21During handling of the above exception, another exception occurred:
22
23Traceback (most recent call last):
24 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/adapters.py", line 499, in send
25 timeout=timeout,
26 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 786, in urlopen
27 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
28 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/retry.py", line 550, in increment
29 raise six.reraise(type(error), error, _stacktrace)
30 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/packages/six.py", line 770, in reraise
31 raise value
32 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
33 chunked=chunked,
34 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 389, in _make_request
35 self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
36 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 341, in _raise_timeout
37 self, url, "Read timed out. (read timeout=%s)" % timeout_value
38urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='commons.wikimedia.org', port=443): Read timed out. (read timeout=5)
39
40During handling of the above exception, another exception occurred:
41
42Traceback (most recent call last):
43 File "./isa/campaigns/image_updater.py", line 255, in _api_get
44 timeout=API_TIMEOUT
45 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/api.py", line 73, in get
46 return request("get", url, params=params, **kwargs)
47 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/api.py", line 59, in request
48 return session.request(method=method, url=url, **kwargs)
49 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/sessions.py", line 587, in request
50 resp = self.send(prep, **send_kwargs)
51 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/sessions.py", line 701, in send
52 r = adapter.send(request, **kwargs)
53 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/adapters.py", line 578, in send
54 raise ReadTimeout(e, request=request)
55requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='commons.wikimedia.org', port=443): Read timed out. (read timeout=5)
56
57During handling of the above exception, another exception occurred:
58
59Traceback (most recent call last):
60 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
61 chunked=chunked,
62 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
63 self._validate_conn(conn)
64 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
65 conn.connect()
66 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connection.py", line 424, in connect
67 tls_in_tls=tls_in_tls,
68 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 450, in ssl_wrap_socket
69 sock, context, tls_in_tls, server_hostname=server_hostname
70 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
71 return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
72 File "/usr/lib/python3.7/ssl.py", line 412, in wrap_socket
73 session=session
74 File "/usr/lib/python3.7/ssl.py", line 853, in _create
75 self.do_handshake()
76 File "/usr/lib/python3.7/ssl.py", line 1117, in do_handshake
77 self._sslobj.do_handshake()
78OSError: [Errno 0] Error
79
80During handling of the above exception, another exception occurred:
81
82Traceback (most recent call last):
83 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/adapters.py", line 499, in send
84 timeout=timeout,
85 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 786, in urlopen
86 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
87 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/retry.py", line 550, in increment
88 raise six.reraise(type(error), error, _stacktrace)
89 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/packages/six.py", line 769, in reraise
90 raise value.with_traceback(tb)
91 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
92 chunked=chunked,
93 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
94 self._validate_conn(conn)
95 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
96 conn.connect()
97 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/connection.py", line 424, in connect
98 tls_in_tls=tls_in_tls,
99 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 450, in ssl_wrap_socket
100 sock, context, tls_in_tls, server_hostname=server_hostname
101 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
102 return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
103 File "/usr/lib/python3.7/ssl.py", line 412, in wrap_socket
104 session=session
105 File "/usr/lib/python3.7/ssl.py", line 853, in _create
106 self.do_handshake()
107 File "/usr/lib/python3.7/ssl.py", line 1117, in do_handshake
108 self._sslobj.do_handshake()
109urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError(0, 'Error'))
110
111During handling of the above exception, another exception occurred:
112
113Traceback (most recent call last):
114 File "./isa/campaigns/image_updater.py", line 58, in update
115 updater.update_images()
116 File "./isa/campaigns/image_updater.py", line 107, in update_images
117 self._fetch_images(category["name"], depth)
118 File "./isa/campaigns/image_updater.py", line 160, in _fetch_images
119 response = self._api_get(parameters)
120 File "./isa/campaigns/image_updater.py", line 264, in _api_get
121 return self._api_get(parameters, retries + 1)
122 File "./isa/campaigns/image_updater.py", line 255, in _api_get
123 timeout=API_TIMEOUT
124 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/api.py", line 73, in get
125 return request("get", url, params=params, **kwargs)
126 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/api.py", line 59, in request
127 return session.request(method=method, url=url, **kwargs)
128 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/sessions.py", line 587, in request
129 resp = self.send(prep, **send_kwargs)
130 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/sessions.py", line 701, in send
131 r = adapter.send(request, **kwargs)
132 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/requests/adapters.py", line 547, in send
133 raise ConnectionError(err, request=request)
134requests.exceptions.ConnectionError: ('Connection aborted.', OSError(0, 'Error'))
135Fri Oct 7 11:44:49 2022 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /campaigns/5 (ip 192.168.247.78) !!!
1362022-10-07 11:44:49,952 - ERROR - Exception on /campaigns/5 [GET]
137Traceback (most recent call last):
138 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1820, in _execute_context
139 cursor, statement, parameters, context
140 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
141 cursor.execute(statement, parameters)
142 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 148, in execute
143 result = self._query(query)
144 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 310, in _query
145 conn.query(q)
146 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 548, in query
147 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
148 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 775, in _read_query_result
149 result.read()
150 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 1156, in read
151 first_packet = self.connection._read_packet()
152 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 692, in _read_packet
153 packet_header = self._read_bytes(4)
154 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 749, in _read_bytes
155 CR.CR_SERVER_LOST, "Lost connection to MySQL server during query"
156pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
157
158The above exception was the direct cause of the following exception:
159
160Traceback (most recent call last):
161 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/flask/app.py", line 2077, in wsgi_app
162 response = self.full_dispatch_request()
163 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/flask/app.py", line 1525, in full_dispatch_request
164 rv = self.handle_user_exception(e)
165 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/flask/app.py", line 1523, in full_dispatch_request
166 rv = self.dispatch_request()
167 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/flask/app.py", line 1509, in dispatch_request
168 return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
169 File "./isa/campaigns/routes.py", line 107, in getCampaignById
170 if campaign.campaign_image != ''
171 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/attributes.py", line 481, in __get__
172 return self.impl.get(state, dict_)
173 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/attributes.py", line 941, in get
174 value = self._fire_loader_callables(state, key, passive)
175 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/attributes.py", line 972, in _fire_loader_callables
176 return state._load_expired(state, passive)
177 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/state.py", line 712, in _load_expired
178 self.manager.expired_attribute_loader(self, toload, passive)
179 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 1459, in load_scalar_attributes
180 no_autoflush=no_autoflush,
181 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 418, in load_on_ident
182 execution_options=execution_options,
183 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 534, in load_on_pk_identity
184 bind_arguments=bind_arguments,
185 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1712, in execute
186 result = conn._execute_20(statement, params or {}, execution_options)
187 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1631, in _execute_20
188 return meth(self, args_10style, kwargs_10style, execution_options)
189 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
190 self, multiparams, params, execution_options
191 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1508, in _execute_clauseelement
192 cache_hit=cache_hit,
193 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1863, in _execute_context
194 e, statement, parameters, cursor, context
195 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2044, in _handle_dbapi_exception
196 sqlalchemy_exception, with_traceback=exc_info[2], from_=e
197 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
198 raise exception
199 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1820, in _execute_context
200 cursor, statement, parameters, context
201 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
202 cursor.execute(statement, parameters)
203 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 148, in execute
204 result = self._query(query)
205 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 310, in _query
206 conn.query(q)
207 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 548, in query
208 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
209 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 775, in _read_query_result
210 result.read()
211 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 1156, in read
212 first_packet = self.connection._read_packet()
213 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 692, in _read_packet
214 packet_header = self._read_bytes(4)
215 File "/data/project/isa-dev/www/python/venv/lib/python3.7/site-packages/pymysql/connections.py", line 749, in _read_bytes
216 CR.CR_SERVER_LOST, "Lost connection to MySQL server during query"
217sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
218[SQL: SELECT campaign.id AS campaign_id, campaign.campaign_name AS campaign_campaign_name, campaign.campaign_images AS campaign_campaign_images, campaign.update_status AS campaign_update_status, campaign.campaign_contributions AS campaign_campaign_contributions, campaign.campaign_participants AS campaign_campaign_participants, campaign.campaign_image AS campaign_campaign_image, campaign.start_date AS campaign_start_date, campaign.campaign_manager AS campaign_campaign_manager, campaign.manager_id AS campaign_manager_id, campaign.end_date AS campaign_end_date, campaign.status AS campaign_status, campaign.short_description AS campaign_short_description, campaign.long_description AS campaign_long_description, campaign.categories AS campaign_categories, campaign.creation_date AS campaign_creation_date, campaign.campaign_type AS campaign_campaign_type, campaign.depicts_metadata AS campaign_depicts_metadata, campaign.captions_metadata AS campaign_captions_metadata
219FROM campaign
220WHERE campaign.id = %(pk_1)s]
221[parameters: {'pk_1': 5}]
222(Background on this error at: https://sqlalche.me/e/14/e3q8)

Lokal_Profil subscribed.

Amanda mentioned to Beat at Wikimania that the SDC team might be able to help out on a few of the ISA related issues where the current team needed support.

This aspect of the ISA tool (database connection issues) was identified as the area where the team has the least expertise and where the offered support from the WMF would be most valuable

There has been a lot of instability in the database connections/access to the database. This is particularly pronounced on the Dev instance <isa-dev.toolforge.org> but also occurs on the Live instance <isa.toolforge.org> . We know that insufficient error handling in ISA might account for some of the issues, but the same lack of error handling also makes it hard to trace. We can currently not say if it is due to our implementation, a library or Toolforge itself. There is a worry that migrating from Dev to Live will bring with it increased instability.

@NavinoEvans, @Eugene233 and @Sebastian_Berlin-WMSE can all support by providing log locations and further background as needed.

Heyo @Lokal_Profil ^_^ I talked with @Cparle about this. Cormac did you catch up with the ISA folks?

I noticed in P35377 there are also SSL errors connecting to commons.wikimedia.org:

urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='commons.wikimedia.org', port=443): Read timed out. (read timeout=5)

That points to a more generalized network connectivity issue than a database issue, but hard to say.

I don't see that specific error (2013, 'Lost connection to MySQL server during query') in the isa-dev logs (/data/project/isa-dev/uwsgi.log) but I do see a few occurrences in the isa logs (/data/proejct/isa/uwsgi.log).

The date/time of the errors in the logs don't seem to match any outage of ToolsDB, so it seems to confirm that the tool is sometimes seeing connectivity issues that are independent from the DB itself.

I would monitor if they continue at a similar rate or not, I only see a couple per day this week:

tools.isa@tools-sgebastion-10:~$ grep " - ERROR - " uwsgi.log
2023-10-29 03:19:40,585 - ERROR - Exception on /campaigns/140/stats [GET]
2023-10-29 15:17:34,322 - ERROR - Exception on /user-settings [GET]
2023-11-06 05:51:58,191 - ERROR - Exception on /campaigns/239 [GET]
  File "/data/project/isa/www/python/venv/lib/python3.11/site-packages/pymysql/connections.py", line 758, in _read_pack2023-11-06 05:51:57,977 - ERROR - Exception on /campaigns/239 [GET]
2023-11-06 05:56:21,771 - ERROR - Exception on /campaigns/239 [GET]
[SQL: SELECT contribution.id AS contribution_id, contribution.username AS contribution_username, contribution.user_id AS contribution_user_id, contribution.campaign_id AS contribution_campaign_id, contribution.file AS contribution_file, contribution.edit_type AS contribution_edit_type, contribution.edit_action AS contribution_edit_action, contribution.country AS contribution_country, contribution.depict2023-11-06 05:56:21,771 - ERROR - Exception on /campaigns/239 [GET]
  File "/data/project/isa/www/python/venv/lib/python3.11/site-packages/pymysql/connections.py", line 758, in _read_pack2023-11-06 05:56:21,778 - ERROR - Exception on /campaigns/239 [GET]
2023-11-07 22:27:50,708 - ERROR - Exception on /campaigns/239 [GET]
2023-11-07 22:31:43,667 - ERROR - Exception on /campaigns/94 [GET]
2023-11-08 10:11:14,192 - ERROR - Exception on /campaigns/228 [GET]
2023-11-08 22:36:14,274 - ERROR - Exception on /campaigns/228 [GET]
2023-11-09 10:47:28,172 - ERROR - Exception on /campaigns/228 [GET]

This actually seems like it might be something to do with SQLAlchemy and mysql recycling connections differently https://stackoverflow.com/questions/29755228/sqlalchemy-mysql-lost-connection-to-mysql-server-during-query

@Cparle oh yes, that's very possible. wait_timeout is currently set to 600 (10 minutes) on ToolsDB.

Heyo @Lokal_Profil ^_^ I talked with @Cparle about this. Cormac did you catch up with the ISA folks?

Many thanks! I've pinged the rest of the group to make sure they are aware of the developments here.