
ErfgoedBot categorize_images locks `monuments_all` table, preventing the atomic replace
Open, Needs Triage, Public

Description

In d73eb9ee3cd4, we split the categorization task out into its own job, which runs in parallel to the main update_monuments job.

However, as long as categorize_images is running, update_monuments hangs on heritage/erfgoedbot/sql/fill_table_monuments_all.sql, presumably when executing:

ALTER TABLE `monuments_all_tmp` RENAME TO `monuments_all`;
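One way to check the "presumably" while update_monuments is hung is to look for a session in the state `Waiting for table metadata lock`. MySQL and MariaDB report that state when a statement such as this rename is blocked by another session that is still using the table. The sketch below is minimal and assumes it is run from a separate connection whose account can see both jobs' sessions in `information_schema.PROCESSLIST`. The names `PROCESSLIST_QUERY` and `metadata_lock_waiters` are hypothetical and not part of erfgoedbot.

```python
# Hypothetical diagnostic: list sessions blocked on a table's metadata lock.
PROCESSLIST_QUERY = (
    "SELECT ID, TIME, STATE, INFO FROM information_schema.PROCESSLIST"
)

# Thread state MySQL/MariaDB report for a statement waiting on a metadata lock.
MDL_WAIT_STATE = "Waiting for table metadata lock"


def metadata_lock_waiters(rows, table="monuments_all"):
    """Return (id, seconds, statement) for sessions waiting on `table`.

    rows: (ID, TIME, STATE, INFO) tuples, as fetched with PROCESSLIST_QUERY.
    """
    waiters = []
    for thread_id, seconds, state, info in rows:
        # INFO is NULL (None) for idle sessions, so check it before searching.
        if state == MDL_WAIT_STATE and info and table in info:
            waiters.append((thread_id, seconds, info))
    return waiters
```

For example, run `cursor.execute(PROCESSLIST_QUERY)` and then `metadata_lock_waiters(cursor.fetchall())`. If the result is empty while the job hangs, the hang is probably elsewhere.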

Event Timeline

Been meaning to file that for a while now. Basically, for the past two weeks I have been killing the categorize_images task every day around 2-3PM, to allow update_monuments to complete.

Thanks for investigating. Is there any solution other than putting the two back together to execute sequentially?

Looks like it's stuck again today; killed the categorization job.

For archive happiness, this was "solved" by:

  • using jstop to stop the categorisation before update_monuments runs, then jstart to start it again afterwards;
  • blacklisting the countries most responsible for slowing down the categorisation.

@JeanFred Unsure whether we should consider this one resolved and instead shift the focus to finding a way of not restarting from scratch after jstart?
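One way to avoid restarting from scratch is to record each finished (countrycode, lang) pair in a checkpoint file and skip those pairs on the next start. The sketch below is minimal. The file path, the JSON format and the names `load_done`, `save_done` and `run_with_checkpoint` are hypothetical. `process_country` stands in for the bot's processCountry(), whose real arguments differ. The sketch targets Python 3 (`os.replace`), while the bot's code uses Python 2 idioms such as iteritems().

```python
import json
import os


def load_done(path):
    """Return the set of (countrycode, lang) pairs already processed."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {tuple(pair) for pair in json.load(f)}


def save_done(path, done):
    """Write the checkpoint via a temp file and rename, so it is never half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run_with_checkpoint(countries, process_country, path):
    """Process each (countrycode, lang) key once, skipping keys already done.

    countries: dict mapping (countrycode, lang) -> countryconfig.
    process_country: hypothetical stand-in for processCountry().
    """
    done = load_done(path)
    for key, countryconfig in sorted(countries.items()):
        if key in done:
            continue  # finished before the job was last stopped
        process_country(key[0], key[1], countryconfig)
        done.add(key)
        save_done(path, done)  # persist progress after every country
    # A full pass completed: drop the checkpoint so the next run starts fresh.
    if os.path.exists(path):
        os.remove(path)
```

If jstop kills the job mid-country, only that country is redone on the next jstart. Once a full pass completes, the checkpoint is deleted, so the following run starts from the first country again.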

Been thinking about this. If we have diagnosed this properly, the issue is that categorize_images locks the table for its entire run. Indeed the code is:

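# A single connection is opened before the first country...
(conn, cursor) = connect_to_monuments_database()
if countrycode and lang:
    processCountry()
else:
    statistics = []
    for (countrycode, lang), countryconfig in mconfig.countries.iteritems():
        processCountry()
    outputStatistics(statistics)
# ...and only closed after the last one, so it stays open for the whole run.
close_database_connection(conn, cursor)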

How about we close and re-establish the database connection between each country?

if countrycode and lang:
    (conn, cursor) = connect_to_monuments_database()
    processCountry()
    close_database_connection(conn, cursor)
else:
    statistics = []
    for (countrycode, lang), countryconfig in mconfig.countries.iteritems():
        # Fresh connection per country; nothing is held open between countries.
        (conn, cursor) = connect_to_monuments_database()
        processCountry()
        close_database_connection(conn, cursor)
    outputStatistics(statistics)

Presumably the database replace would then kick in, and categorize_images would be the one to wait?
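The per-country reconnect proposed above can also be written with a context manager. This closes the connection even if processCountry() raises partway through a country. The sketch below is minimal:

- `connect` and `close` stand in for connect_to_monuments_database() and close_database_connection().
- `process_country` stands in for processCountry() and is assumed here to return its statistics row.
- The names `country_connection` and `categorize_all` are hypothetical.

There is also an assumption about the root cause, not confirmed in this task. If the bot's driver follows the DB-API default of autocommit off, a read leaves a transaction open. MySQL keeps metadata locks until the transaction ends, and closing the connection is what ends it.

```python
from contextlib import contextmanager


@contextmanager
def country_connection(connect, close):
    """Open a fresh (conn, cursor) pair and always close it afterwards."""
    conn, cursor = connect()
    try:
        yield conn, cursor
    finally:
        # Runs on normal exit and on exceptions, so no connection (or any
        # transaction it holds) outlives the current country.
        close(conn, cursor)


def categorize_all(countries, process_country, connect, close):
    """Process every country on its own short-lived connection.

    countries: dict mapping (countrycode, lang) -> countryconfig.
    Between two countries no connection is open, leaving a window in which
    a waiting rename of the table could run.
    """
    statistics = []
    for (countrycode, lang), countryconfig in countries.items():
        with country_connection(connect, close) as (conn, cursor):
            statistics.append(
                process_country(countrycode, lang, countryconfig, conn, cursor))
    return statistics
```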


Would the database replace sit and wait nicely for a pause in categorization (and the same in the opposite direction afterwards)? Or would it just complain about a lock and crash?

If it does survive, this is a nice solution, which additionally means categorisation doesn't get restarted from the first country.
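On whether the replace waits or crashes: in MySQL and MariaDB, a statement that cannot get a metadata lock waits up to `lock_wait_timeout` seconds. The default is long, on the order of a day or more depending on server and version. After that it fails with error 1205 (ER_LOCK_WAIT_TIMEOUT), so by default it should sit and wait rather than crash straight away.

If an explicit bound is preferred, a retry wrapper is one option. The sketch below is minimal and makes these assumptions:

- `execute` is a hypothetical callable, e.g. a cursor's execute.
- `error_code` is a hypothetical extractor for the driver's exceptions. For MySQLdb that would presumably be `exc.args[0]`.
- The name `execute_with_lock_retry` and all defaults are illustrative.

```python
import time

# MySQL/MariaDB error code, also raised when a metadata-lock wait times out.
ER_LOCK_WAIT_TIMEOUT = 1205


def execute_with_lock_retry(execute, statement, error_code,
                            lock_wait_seconds=60, attempts=5,
                            pause_seconds=60, sleep=time.sleep):
    """Run `statement`, waiting at most `lock_wait_seconds` per attempt.

    Returns the number of attempts used. Any other error is re-raised at
    once; the last timeout is re-raised once `attempts` is exhausted.
    """
    execute("SET SESSION lock_wait_timeout = %d" % lock_wait_seconds)
    for attempt in range(1, attempts + 1):
        try:
            execute(statement)
            return attempt
        except Exception as exc:
            if error_code(exc) != ER_LOCK_WAIT_TIMEOUT or attempt == attempts:
                raise
            # The timed-out lock request has been withdrawn, so queries that
            # queued behind it (e.g. categorisation) can proceed meanwhile.
            sleep(pause_seconds)
```

While such a lock request is pending, later queries on the same table from other sessions typically queue behind it. That matches the guess that categorize_images would then be the one to wait.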