
EditGroups tool: Unstable connection to SQL database on Toolforge
Open, Needs Triage, Public

Description

The EditGroups tool listens to the EventStream for Wikidata edits and stores their metadata in its own SQL database.

The listener process is a Python program which runs in Kubernetes. It connects to the SQL database (s53685__editgroups) via Django's ORM and ingests edits in batches.

Over the past few weeks, the listener has started to die a lot more frequently, generally because its SQL connection vanishes:

django.db.utils.OperationalError: (2006, 'MySQL server has gone away')

(full stack trace available in the corresponding GitHub issue).

These repeated failures (about one every hour currently) mean that the listener is accumulating lag (currently 7 hours behind).

Has labsdb become more unstable recently? Is there any way to prevent these disconnections? Could the database perhaps be moved to a more reliable host?

Event Timeline

Pintoch renamed this task from "EditGroups tool: Unstable connection to SQL database on Toollabs" to "EditGroups tool: Unstable connection to SQL database on Toolforge". Dec 18 2019, 11:24 AM

'MySQL server has gone away' means that the connection has been closed from the server side. This is not uncommon and something that tools should be prepared to deal with. Per the connection handling policy, tools should minimize the amount of time they hold a database connection open, and the server may choose to close idle connections at any time.
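
One way a tool might cope with server-side disconnects is to retry the failed operation after discarding the dead connection. The following is a minimal sketch, not the EditGroups code: `run_with_reconnect`, `operation`, and `reset_connection` are hypothetical names. In a Django project, `retry_on` could plausibly be `django.db.OperationalError` and `reset_connection` could be `django.db.connection.close`, since Django opens a new connection on the next query.

```python
import time


def run_with_reconnect(operation, reset_connection, retry_on=(Exception,),
                       max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a matching error, reset the connection and retry.

    All parameter names here are illustrative assumptions:
    - operation: zero-argument callable that performs one unit of work
    - reset_connection: zero-argument callable that drops the dead connection
    - retry_on: tuple of exception types treated as "connection lost"
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of retries: let the caller see the original error.
                raise
            reset_connection()
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

For the retry to be safe, `operation` should cover a whole unit of work (for example one batch inside its own transaction), so that rerunning it does not store the same edits twice.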

OK! This connection should never be idle given that bot edits on Wikidata never stop, so I am still not sure why this happens. It might be due to the specifics of how Django handles these long-running SQL connections.
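
On the Django side, one relevant detail is that Django normally cleans up stale connections by calling `django.db.close_old_connections()` at the start and end of each HTTP request. A standalone listener never goes through that request cycle, so a possible mitigation is to call it by hand before each batch. The sketch below assumes this applies to the listener. `ingest_batches` and `save_batch` are hypothetical names, and whether this removes the failures reported here is untested.

```python
def ingest_batches(batches, save_batch, refresh_connections=None):
    """Save each batch, refreshing stale DB connections before every write.

    batches: iterable of lists of edits (hypothetical shape)
    save_batch: callable that writes one batch through the ORM (hypothetical)
    refresh_connections: hook run before each batch; defaults to Django's
        close_old_connections, imported lazily so the helper can be
        exercised without Django installed.
    """
    if refresh_connections is None:
        from django.db import close_old_connections
        refresh_connections = close_old_connections

    saved = 0
    for batch in batches:
        # Drop connections Django considers obsolete or unusable, so the
        # next query opens a fresh one instead of reusing a dead socket.
        refresh_connections()
        save_batch(batch)
        saved += len(batch)
    return saved
```

How much this helps depends on the `CONN_MAX_AGE` setting, because `close_old_connections()` relies on it to decide when a connection is too old to keep.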