transferpy --checksum wrongly output `checksums do not match` message
Open, Low, Public

Description

transferpy is a package used for database backup and recovery.
transferpy has a problem with --checksum: the per-file checksums are listed in a different order on the source and the target, so transferpy reports that the checksums do not match even though every file's checksum is identical. An example is given below:

About to transfer /srv/bigfile from transferpy-test-1.transferpy-test.eqiad1.wikimedia.cloud to ['transferpy-test-2.transferpy-test.eqiad1.wikimedia.cloud']:['/srv'] (1536000004233 bytes)
ERROR: Original checksum 90348fa5aec46249ec3ce3cb4ddd5bbb  bigfile/creatorOneFile
ed0d61cd59f697533984adcfadd3bf3a  bigfile/file500gb1
af9dc0785976c1adae326e07fc43bb4b  bigfile/creator
94ee6bbb203593a7412e3eb01fb55eb7  bigfile/file500gb2
16e01fe9192dfd7a591524b7c6cca329  bigfile/file500gb3 on transferpy-test-1.transferpy-test.eqiad1.wikimedia.cloud is different than checksum af9dc0785976c1adae326e07fc43bb4b  bigfile/creator
ed0d61cd59f697533984adcfadd3bf3a  bigfile/file500gb1
90348fa5aec46249ec3ce3cb4ddd5bbb  bigfile/creatorOneFile
16e01fe9192dfd7a591524b7c6cca329  bigfile/file500gb3
94ee6bbb203593a7412e3eb01fb55eb7  bigfile/file500gb2 on transferpy-test-2.transferpy-test.eqiad1.wikimedia.cloud

Event Timeline

Privacybatm updated the task description.
Privacybatm moved this task from Triage to GSOC2020 on the DBA board.

Indeed an issue, with a clear reason why it happens (the directory traversal order in the method used is not deterministic).
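To illustrate that the two listings in the description differ only in order, here is a minimal sketch that compares md5sum-style output by file path instead of as raw text. The function names `parse_md5sum_output` and `checksums_match` are hypothetical and are not part of transferpy; the sketch also assumes text-mode md5sum lines and file names without leading whitespace.

```python
def parse_md5sum_output(output):
    """Map each file path to its digest from md5sum-style text output."""
    checksums = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line looks like "<digest>  <path>"; split on the first run of spaces.
        digest, path = line.split(maxsplit=1)
        checksums[path] = digest
    return checksums


def checksums_match(source_output, target_output):
    """Compare two md5sum listings while ignoring the order of their lines."""
    return parse_md5sum_output(source_output) == parse_md5sum_output(target_output)


# The two listings from the task description.
source = """90348fa5aec46249ec3ce3cb4ddd5bbb  bigfile/creatorOneFile
ed0d61cd59f697533984adcfadd3bf3a  bigfile/file500gb1
af9dc0785976c1adae326e07fc43bb4b  bigfile/creator
94ee6bbb203593a7412e3eb01fb55eb7  bigfile/file500gb2
16e01fe9192dfd7a591524b7c6cca329  bigfile/file500gb3"""

target = """af9dc0785976c1adae326e07fc43bb4b  bigfile/creator
ed0d61cd59f697533984adcfadd3bf3a  bigfile/file500gb1
90348fa5aec46249ec3ce3cb4ddd5bbb  bigfile/creatorOneFile
16e01fe9192dfd7a591524b7c6cca329  bigfile/file500gb3
94ee6bbb203593a7412e3eb01fb55eb7  bigfile/file500gb2"""

print(source == target)                 # False: the raw strings differ
print(checksums_match(source, target))  # True: every file has the same digest
```

Comparing dictionaries also catches files that are missing or extra on either side, since the key sets would then differ.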

There are several options for how to go about this. I think in the past I skipped using plain md5sum and used find -exec md5sum to get a deterministic order. Otherwise we could use md5sum --check. I am open to other options, but we may want to wait until we decide on the best way to checksum asynchronously. It should also scale to hundreds of thousands of files.
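One way to get a deterministic order, independent of the options above, is to sort directory entries while walking the tree. The following is only a sketch under that assumption, not transferpy's implementation; `md5_of_file` and `iter_checksums` are hypothetical names. It yields results lazily so that large trees do not have to be held in memory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_checksums(root):
    """Yield (relative_path, digest) pairs in a reproducible order."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place makes os.walk descend in sorted order.
        dirnames.sort()
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            yield os.path.relpath(full_path, root), md5_of_file(full_path)
```

Running `list(iter_checksums(path))` on the source and on the target would then give lists that can be compared directly, provided both hosts sort names the same way (Python's default string ordering here, not the locale's).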

Yeah, I will look into this. Let's keep this ticket here so that we can keep an eye on it!

jcrespo moved this task from GSOC2020 to Backlog on the DBA board.

This is low priority, but a valid bug, since parallel_checksum will mostly be used from now on.

Resetting the assignee as I believe @Privacybatm isn't currently working on this.

@Marostegui Yes, I am not working on this now. Thank you for resetting.