Update: The Debian package for bullseye (mydumper 0.10.0, built against MariaDB 10.5.8) was tested to work up to MariaDB 10.5. With 10.6, we believe a newer version is needed for reloading chunks, due to a bug (currently validating 0.12.7-3).
Update 2: There seems to be a formatting difference between 0.10 and 0.12, which makes it impossible to load 0.10-generated dumps with 0.12.
Update 3: Thread-handling issues keep happening when loading 10.6 databases (loading works with the newer mydumper on 10.4) :-(
db2098 crashed recently (T318062), so its backup generation service was temporarily moved to another host. Now that the hardware memory issues have been corrected, we would like db2098 to act as a passive redundant host for the database backup service.
To make sure the data was safe, the host was re-imaged (with the same OS release, bullseye) and tested with MariaDB 10.6, which will soon be the preferred MariaDB version for MediaWiki; this was mostly to make sure the tooling was ready.
However, myloader fails (even though it exits with a success status), with several instances of:
** (myloader:2742004): CRITICAL **: 09:17:59.941: Error restoring fawiki.pagelinks from file fawiki.pagelinks.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:18:50.058: Error restoring fawiki.pagelinks from file fawiki.pagelinks.00002.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:19:00.358: Error restoring fawiki.pagelinks from file fawiki.pagelinks.00003.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:19:33.501: Error restoring fawiki.pagelinks from file fawiki.pagelinks.00004.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:24:05.014: Error restoring fawiki.templatelinks from file fawiki.templatelinks.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:24:55.159: Error restoring fawiki.templatelinks from file fawiki.templatelinks.00002.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:25:45.311: Error restoring fawiki.templatelinks from file fawiki.templatelinks.00003.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:26:26.140: Error restoring fawiki.templatelinks from file fawiki.templatelinks.00004.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:42:45.075: Error restoring cawiki.pagelinks from file cawiki.pagelinks.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:45:00.843: Error restoring cawiki.templatelinks from file cawiki.templatelinks.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 09:59:48.276: Error restoring arwiki.categorylinks from file arwiki.categorylinks.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
...
** (myloader:2742004): CRITICAL **: 18:57:26.804: Error restoring viwiki.slots from file viwiki.slots.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 18:57:49.503: Error restoring viwiki.templatelinks from file viwiki.templatelinks.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 18:58:15.578: Error restoring viwiki.templatelinks from file viwiki.templatelinks.00002.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 18:58:16.927: Error restoring viwiki.templatelinks from file viwiki.templatelinks.00003.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 18:58:39.626: Error restoring viwiki.templatelinks from file viwiki.templatelinks.00004.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 18:59:06.995: Error restoring viwiki.text from file viwiki.text.00001.sql.gz: Lock wait timeout exceeded; try restarting transaction
** (myloader:2742004): CRITICAL **: 18:59:29.692: Error restoring viwiki.text from file viwiki.text.00002.sql.gz: Lock wait timeout exceeded; try restarting transaction
[The failures are real: the load continues, but many rows are skipped and never reimported.]
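Because myloader exits successfully even when chunks fail, the failed chunks can only be found in its output. A minimal Python sketch, assuming the message format quoted above stays the same (it may differ between mydumper 0.10 and 0.12), that extracts every "Error restoring" line and counts failed chunk files per table:

```python
import re
from collections import Counter

# Matches the CRITICAL restore lines quoted above. The exact format is an
# assumption based on this log and may change between myloader versions.
RESTORE_ERROR = re.compile(
    r"\*\* \(myloader:(?P<pid>\d+)\): CRITICAL \*\*: "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): "
    r"Error restoring (?P<db>[^.\s]+)\.(?P<table>\S+) "
    r"from file (?P<file>[^\s:]+): (?P<error>.+)"
)


def parse_restore_errors(log_text):
    """Return one dict per 'Error restoring' line (one message per line)."""
    errors = []
    for line in log_text.splitlines():
        match = RESTORE_ERROR.search(line.rstrip())
        if match:
            errors.append(match.groupdict())
    return errors


def failed_chunks_per_table(errors):
    """Count failed chunk files per 'db.table' key."""
    return Counter(f"{e['db']}.{e['table']}" for e in errors)
```

The per-table count makes it easy to see which tables need a targeted reload after a run that "succeeded".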
This makes no sense: in theory, the mydumper loader (myloader) is designed to restore table chunks in parallel (they are just independent inserts), so they should not block each other. In fact, we never had this issue until now. Possible causes:
- Excessive locking due to configuration: discarded; it also failed after running SET GLOBAL binlog_format=ROW; FLUSH LOGS; SET GLOBAL tx_isolation = 'READ-COMMITTED'; (note that these hosts never had the binlog enabled).
- The particular dump is somehow corrupted: discarded; it failed with 2 different dumps, and the second one was certainly not affected by concurrent ALTER TABLEs or other maintenance.
- 10.6-specific behavior (this had never been tested with 10.6 before, only with 10.4, as this was part of a first validation of the new version)
- myloader bug: note that myloader's behavior is not ideal. https://bugs.launchpad.net/mydumper/+bug/806698 suggests reducing --queries-per-transaction, but this wasn't necessary before (and it makes loading slower). Better checking and monitoring may be needed anyway, because myloader reporting success while chunks fail to load is undesirable.
- A configuration or other regression on the WMF side (e.g. in our tooling)
- db2098-specific hardware or software configuration (e.g. as a multi-instance host that also serves s8, it may lack the I/O and memory resources to properly accommodate fast writes)
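Whatever the root cause turns out to be, the load could at least be made to fail loudly. A minimal sketch of a hypothetical wrapper (run_strict is an illustrative name, not part of our tooling) that treats any logged CRITICAL line as a failure even when the loader exits with 0; it merges stdout and stderr so it does not assume which stream myloader writes to:

```python
import subprocess
import sys


def run_strict(cmd):
    """Run a loader command; treat logged CRITICAL lines as failure.

    Returns (status, critical_lines): status is the process exit code if it
    was non-zero, 1 if it exited 0 but logged CRITICAL lines, else 0.
    """
    # Merge stderr into stdout so the check does not depend on which
    # stream the CRITICAL messages are written to.
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    critical = [line for line in proc.stdout.splitlines() if "CRITICAL **" in line]
    # Echo the failures so they still reach the operator's terminal or log.
    for line in critical:
        print(line, file=sys.stderr)
    if proc.returncode != 0:
        return proc.returncode, critical
    if critical:
        # The exit code claimed success, but some chunks were not restored.
        return 1, critical
    return 0, critical
```

It could be called with something like ["myloader", "--directory", dump_dir, "--threads", "16", "--queries-per-transaction", "100"], where dump_dir and both numbers are hypothetical placeholders (the lower --queries-per-transaction value only reflects the launchpad suggestion and is untested here), and the returned status passed to sys.exit() so that cron or monitoring sees the failure.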