pymysql.err.OperationalError: (1412, 'Table definition has changed, please retry transaction')
Closed, Resolved (Public)

Description

The first time data is loaded for the addlink application, you'll get this error:

pymysql.err.OperationalError: (1412, 'Table definition has changed, please retry transaction')

It occurs at the end of this output:

== Initializing ==
   Ensuring checksum table exists...[OK]
   Ensuring model table exists...[OK]
   Beginning process to load datasets for cswiki
== Attempting to download datasets (anchors, redirects, pageids, w2vfiltered, model) for cswiki ==
   Ensuring anchors table exists...[OK]
   No checksum found for anchors in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/cswiki/lr_cswiki_anchors.sql.gz...[OK]
   Ensuring redirects table exists...[OK]
   No checksum found for redirects in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/cswiki/lr_cswiki_redirects.sql.gz...[OK]
   Ensuring pageids table exists...[OK]
   No checksum found for pageids in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/cswiki/lr_cswiki_pageids.sql.gz...[OK]
   Ensuring w2vfiltered table exists...[OK]
   No checksum found for w2vfiltered in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/cswiki/lr_cswiki_w2vfiltered.sql.gz...[OK]
   Ensuring model table exists...[OK]
   No checksum found for model in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/cswiki/cswiki.linkmodel.json...[OK]
== Importing datasets (anchors, redirects, pageids, w2vfiltered, model) for cswiki ==
   Verifying file and checksum exists for anchors...[OK]
   Verifying checksum for anchors...[OK]
   Ensuring anchors table exists...[OK]
   Verifying file and checksum exists for redirects...[OK]
   Verifying checksum for redirects...[OK]
   Ensuring redirects table exists...[OK]
   Verifying file and checksum exists for pageids...[OK]
   Verifying checksum for pageids...[OK]
   Ensuring pageids table exists...[OK]
   Verifying file and checksum exists for w2vfiltered...[OK]
   Verifying checksum for w2vfiltered...[OK]
   Ensuring w2vfiltered table exists...[OK]
   Verifying file and checksum exists for model...[OK]
   Verifying checksum for model...[OK]
   Ensuring model table exists...[OK]
   Processing dataset: anchors
     Deleting all values from lr_cswiki_anchors...[OK]
     Inserting content into lr_cswiki_anchors...[OK]
       672122 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: redirects
     Deleting all values from lr_cswiki_redirects...Traceback (most recent call last):

We should either ensure the tables exist in a separate series of transactions, or modify create_tables to not commit when called via load-datasets.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Never mind, CREATE TABLE is processed as a single transaction, so I think we should just do the ensure_tables() step in a loop separate from the data import process.

Change 665999 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[research/mwaddlink@main] load-datasets: Open new connection for data import

https://gerrit.wikimedia.org/r/665999

> Never mind, CREATE TABLE is processed as a single transaction, so I think we should just do the ensure_tables() step in a loop separate from the data import process.

That doesn't work either, unfortunately: https://gerrit.wikimedia.org/r/c/research/mwaddlink/+/665999/1#message-a8120a3f2891be351d4ff4c9e0ddb41602627d94

Moving out of the current sprint; this would be good to fix, but we don't need to do it right now.

Change 665999 abandoned by Kosta Harlan:
[research/mwaddlink@main] load-datasets: Open new connection for data import

Reason:

https://gerrit.wikimedia.org/r/665999

Change 673347 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[research/mwaddlink@main] Create tables before opening transaction

https://gerrit.wikimedia.org/r/673347

kostajh added a subscriber: Tgr.

thank you @Tgr!

Change 673347 merged by jenkins-bot:
[research/mwaddlink@main] Create tables before opening transaction

https://gerrit.wikimedia.org/r/673347
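
For readers who want to see the ordering that the merged patch title describes ("Create tables before opening transaction"), here is a minimal sketch. It is not the code from change 673347, which is not reproduced on this task. Assumptions: the standard-library sqlite3 module stands in for pymysql so the example runs anywhere, the two-column schema is invented, and ensure_tables(), import_datasets() and table_name() are illustrative helpers rather than the real mwaddlink functions. Only the lr_<wiki>_<dataset> table naming is taken from the log above.

```python
import sqlite3

# Dataset names as they appear in the load-datasets log (subset).
DATASETS = ("anchors", "redirects", "pageids")


def table_name(wiki_id, dataset):
    # Naming pattern taken from the log, e.g. lr_cswiki_anchors.
    return f"lr_{wiki_id}_{dataset}"


def ensure_tables(conn, wiki_id, datasets):
    """Phase 1: run every CREATE TABLE and commit before any import starts."""
    for dataset in datasets:
        # Hypothetical schema; the real tables are defined by mwaddlink.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table_name(wiki_id, dataset)} "
            "(lookup TEXT PRIMARY KEY, value TEXT)"
        )
    conn.commit()


def import_datasets(conn, wiki_id, rows_by_dataset):
    """Phase 2: only DELETE/INSERT statements, no DDL, in one transaction."""
    counts = {}
    try:
        for dataset, rows in rows_by_dataset.items():
            table = table_name(wiki_id, dataset)
            conn.execute(f"DELETE FROM {table}")
            conn.executemany(
                f"INSERT INTO {table} (lookup, value) VALUES (?, ?)", rows
            )
            counts[dataset] = len(rows)
        conn.commit()
    except Exception:
        # Leave the tables as they were if any dataset fails to import.
        conn.rollback()
        raise
    return counts
```

A caller would run ensure_tables() once for all datasets and only then call import_datasets(), so no table definition changes while the import transaction is open. With pymysql the placeholders would be %s rather than ?, and statements would go through a cursor.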