List of images without id causes SQL timeout
Open, Needs Triage, Public

Description

From the log file

2018-10-06_14:36:30 Make a list of images without id...
Traceback (most recent call last):
  File "/data/project/heritage/heritage/erfgoedbot/images_of_monuments_without_id.py", line 419, in <module>
    main()
  File "/data/project/heritage/heritage/erfgoedbot/images_of_monuments_without_id.py", line 411, in main
    countryconfig, add_template, conn, cursor, conn2, cursor2))
  File "/data/project/heritage/heritage/erfgoedbot/images_of_monuments_without_id.py", line 65, in processCountry
    withTemplate = getMonumentsWithTemplate(countryconfig, conn2, cursor2)
  File "/data/project/heritage/heritage/erfgoedbot/images_of_monuments_without_id.py", line 262, in getMonumentsWithTemplate
    cursor.execute(query, (commonsTrackerCategory,))
  File "/mnt/nfs/labstore-secondary-tools-project/heritage/.venv/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/mnt/nfs/labstore-secondary-tools-project/heritage/.venv/local/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
CRITICAL: Closing network session.
<class '_mysql_exceptions.OperationalError'>
2018-10-06_14:58:55 Dump database...

Event Timeline

This is currently a query for all images in a category+subcategories with a sub-query of all images transcluding a certain template.

We could move the sub-query to a separate query and do the exclusion in Python.
We could also drop the sub-categories bit if we consider the focus to be facilitating future categorisation.

It's unclear to me which would have the larger effect on the running time of the SQL query.
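To make the first option concrete, here is a minimal sketch of doing the exclusion in Python instead of in a sub-query. The helper names (fetch_titles, images_without_template) are hypothetical and not taken from images_of_monuments_without_id.py; the two SQL queries themselves are left to the caller, since the thread does not show them.

```python
def fetch_titles(cursor, query, params):
    """Run one query on a DB-API cursor and return the first column of each row.

    Hypothetical helper: 'query' would be either the category query or the
    template-transclusion query, each run on its own instead of nested.
    """
    cursor.execute(query, params)
    return [row[0] for row in cursor.fetchall()]


def images_without_template(category_images, templated_images):
    """Return the category images that do not transclude the template.

    The set lookup replaces the SQL sub-query; input order is preserved.
    """
    templated = set(templated_images)
    return [name for name in category_images if name not in templated]
```

With this split, each query stays simple and the set difference costs time proportional to the number of titles, at the price of transferring both result lists to the client.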

@JeanFred any thoughts/suggestions? It seems like timing out SQL is constantly plaguing some part of this project.

I believe that the first country where the crash occurs is be-vlg. Looking at its base category it does seem to have a crazy amount of subcategories. Based on that unscientific analysis I suggest we only look at the base category itself.
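As a rough illustration of looking only at the base category, the sketch below selects files that are direct members of one category and never descends into subcategories. It is an assumption-laden toy: it runs against an in-memory SQLite database with simplified stand-ins for MediaWiki's page and categorylinks tables, and the query is not the one used by the bot. With MySQLdb the placeholder would be %s rather than ?.

```python
import sqlite3

# Simplified, hypothetical stand-ins for MediaWiki's page and categorylinks tables.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
"""

# Direct file members of a single category; subcategories are not followed.
BASE_CATEGORY_QUERY = """
SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE cl_to = ? AND cl_type = 'file' AND page_namespace = 6
ORDER BY page_title
"""


def images_in_base_category(conn, category):
    """Return titles of files placed directly in 'category'."""
    rows = conn.execute(BASE_CATEGORY_QUERY, (category,)).fetchall()
    return [row[0] for row in rows]


def demo():
    """Build a tiny example database and query its base category."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 6, "Church.jpg"),   # file directly in the base category
        (2, 14, "Subcat"),      # a subcategory page
        (3, 6, "Mill.jpg"),     # file only in the subcategory
    ])
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
        (1, "Base", "file"),
        (2, "Base", "subcat"),
        (3, "Subcat", "file"),
    ])
    return images_in_base_category(conn, "Base")
```

In the toy data, demo() picks up only the file placed directly in "Base"; the file that sits in the subcategory is skipped, which is the trade-off the suggestion accepts.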

Change 533105 had a related patch set uploaded (by Lokal Profil; owner: Lokal Profil):
[labs/tools/heritage@master] [WIP]Make images_of_monuments_without_id ignore subcategories

https://gerrit.wikimedia.org/r/533105

Making a stab at this to see if I can unblock part of T231484: Write instructions email about the monuments database and wikidata for the wlm-announce