
checksum file incorrectly formatted for incremental XML data dumps
Closed, Resolved · Public


0) Problem

Checksum files for incremental XML data dumps are not formatted correctly, which causes `md5sum --check` to fail with an error.

1) Test case

(shell) wget -nH -np -nv -N -r -l 2

(shell) cd other/incr/simplewiki/

(shell) ls

(shell)$ cat simplewiki-20140703-md5sums.txt

(shell)$ md5sum --check simplewiki-20140703-md5sums.txt
md5sum: simplewiki-20140703-md5sums.txt: no properly formatted MD5 checksum lines found

(shell)$ cat simplewiki-20140703-md5sums.txt

2) Recommendation

The correct format is:

<checksum><two spaces><filename><newline>
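A minimal Python sketch of a writer that emits lines in the recommended format. The names `md5_line` and `write_md5sums` are hypothetical and are not taken from the dumps code; the sketch also assumes the checksum file sits in the same directory as the dump files, so only the bare filename is written.

```python
import hashlib
import os


def md5_line(path, chunk_size=1 << 20):
    """Return one md5sum-compatible line for the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large dump files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # <checksum><two spaces><filename><newline>
    return "%s  %s\n" % (digest.hexdigest(), os.path.basename(path))


def write_md5sums(paths, out_path):
    """Write one checksum line per file in `paths` to `out_path` (hypothetical helper)."""
    with open(out_path, "w") as out:
        for path in paths:
            out.write(md5_line(path))
```

Running `md5sum --check` from the directory that holds both the dump files and the generated checksum file should then be able to resolve each filename.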

Sincerely Yours,

Version: unspecified
Severity: normal
See also: T34130



Event Timeline

bzimport raised the priority of this task to Needs Triage. · Nov 22 2014, 3:40 AM
bzimport set Reference to bz67886.
Aklapper triaged this task as Low priority. · Mar 23 2015, 5:42 PM
Aklapper added a subscriber: Aklapper.
Nemo_bis updated the task description. · Apr 9 2015, 8:14 AM
Nemo_bis set Security to None.

Can someone point me to the code that generates the md5sums file for the incremental dumps?
This bug is too easy to leave unfixed for a year.

in ariel branch of operations-dumps: dumps/xmldumps-backup/incrementals/ function md5sums()

Restricted Application added a subscriber: TerraCodes. · Jul 14 2016, 7:18 PM
awight claimed this task. · Dec 19 2016, 7:44 PM
awight moved this task from Backlog to Active on the Dumps-Generation board.
awight added a subscriber: ArielGlenn.

I found this interesting bit in the Wikipedia article on md5sum:

Note: There must be two spaces or a space and an asterisk between each md5sum value and filename to be compared (the second space indicates text mode, the asterisk binary mode). Otherwise, the following error will result: "no properly formatted MD5 checksum lines found". Many programs don't distinguish between the two modes, but some utils do.

We should check which mode we're using when calculating the digests.
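A rough Python sketch of a format check based on the rule quoted above: a 32-character hex digest, one space, then either a second space (text mode) or an asterisk (binary mode), then the filename. The names `parse_md5_line` and `find_bad_lines` are hypothetical, and the check covers only this default line format, not any other format that `md5sum --check` may also accept.

```python
import re

# digest, one space, mode marker (space = text, asterisk = binary), filename
LINE_RE = re.compile(r"^([0-9a-fA-F]{32}) ([ *])(.+)$")


def parse_md5_line(line):
    """Return (digest, mode, filename), or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    digest, marker, filename = match.groups()
    mode = "binary" if marker == "*" else "text"
    return digest.lower(), mode, filename


def find_bad_lines(text):
    """Return 1-based numbers of non-empty lines that fail the format check."""
    return [
        number
        for number, line in enumerate(text.splitlines(), 1)
        if line and parse_md5_line(line) is None
    ]
```

A check like this could run on a generated md5sums.txt before publishing, so that a line with a single space or with the fields in the wrong order is caught early.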

I think the directory structure has changed since your comment above. I see the file on the ariel branch under xmldumps-backup/see_master_branch/, but I don't see it on the master branch. When you get the chance, could you clarify why the directory has that name?

Ah, because that script is now called uh I think it is, and there's a little class for "incrementals", i.e. adds/changes, and one for html dumps. The idea is that if we want other similar dumps of some new form across all wikis, we just use the same misc library and calling wrapper, which handles locks, dates, cleanup, etc. If you look at the recent git log you'll see it.

Change 328219 had a related patch set uploaded (by Awight):
Make md5sums.txt files compatible with md5sum --check

Change 328219 merged by ArielGlenn:
Make md5sums.txt files compatible with md5sum --check

ArielGlenn closed this task as Resolved. · Jan 30 2017, 9:37 AM

Merged and deployed, thanks for the patch and your patience.

ArielGlenn moved this task from Active to Done on the Dumps-Generation board. · Jan 30 2017, 9:37 AM