
Include checksums in
Closed, Resolved, Public


Please include text files with the hash values of the future entity dumps in in order to check data integrity. These files could be similar to the *sums.txt ones in

Event Timeline

abian created this task. Mar 22 2018, 9:16 PM

@hoo, can you fold this into the bash script without too much work?

hoo added a comment. Mar 27 2018, 2:47 AM

@hoo, can you fold this into the bash script without too much work?

Piece of cake (I guess)… so yes, will schedule this for the week.

hoo added a comment. Mar 28 2018, 2:25 PM

Would we want one hash sum file per (dated) folder, or one for everything? Or both?

If one for everything, should it contain just the base file names (like wikidata-20180323-truthy-BETA.nt.bz2), or the relative path (like 20180323/wikidata-20180323-truthy-BETA.nt.bz2)?

One per folder I suppose, so that as a particular run finishes up, the hash info is available.

abian added a comment. Mar 28 2018, 2:47 PM

That's also the easiest option for users, I think.
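For illustration, the "one per folder, base file names only" option discussed above could look roughly like the sketch below. This is not the merged patch: `write_checksums` is a hypothetical helper, and the `wikidata-<date>-md5sums.txt` / `wikidata-<date>-sha1sums.txt` names are taken from the file names that appear later in this task. It assumes GNU `md5sum` and `sha1sum` are available.

```shell
#!/usr/bin/env bash
# Sketch only: write one md5 and one sha1 sums file per dated dump folder.
# write_checksums is a hypothetical helper, not the function from the real patch.
set -euo pipefail

write_checksums() {
    local dir="$1"    # a dated folder, e.g. .../wikidatawiki/20180409
    local date md5file sha1file f
    date="$(basename "$dir")"
    md5file="wikidata-${date}-md5sums.txt"
    sha1file="wikidata-${date}-sha1sums.txt"
    (
        # Work inside the folder so the sums files contain base file names
        # only; "md5sum -c" can then be run from within the folder.
        cd "$dir"
        : > "${md5file}.tmp"
        : > "${sha1file}.tmp"
        for f in wikidata-"${date}"-*; do
            # Skip the sums files themselves and our temporary files.
            case "$f" in
                *sums.txt|*.tmp) continue ;;
            esac
            [ -f "$f" ] || continue
            md5sum "$f" >> "${md5file}.tmp"
            sha1sum "$f" >> "${sha1file}.tmp"
        done
        # Rename at the end so readers never see a half-written sums file.
        mv "${md5file}.tmp" "$md5file"
        mv "${sha1file}.tmp" "$sha1file"
    )
}
```

Calling `write_checksums /path/to/20180409` at the end of a run would make the hash information available as soon as that run finishes, which is the reason given above for preferring one file per folder.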

Change 423353 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/puppet@production] Add checksums for Wikidata entity dumps

hoo claimed this task. Apr 2 2018, 1:14 AM
hoo moved this task from Tasks to Needs Review on the Wikidata-Ministry-Of-Magic board.

Change 423353 merged by ArielGlenn:
[operations/puppet@production] Add checksums for Wikidata entity dumps

hoo closed this task as Resolved. Apr 5 2018, 12:07 PM

First checksums are available: and for

I added the JSON checksums manually, but the RDF ones were added automatically. I'll check next week to make sure this also works correctly for the JSON checksums, but I don't expect any surprises there.

abian added a comment. Apr 5 2018, 1:45 PM

Thank you!

hoo added a comment. Apr 11 2018, 9:48 AM

JSON checksums look fine as well:

hoo@snapshot1007:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20180409$ md5sum -c wikidata-20180409-md5sums.txt 
wikidata-20180409-all.json.gz: OK
hoo@snapshot1007:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20180409$ sha1sum -c wikidata-20180409-sha1sums.txt
wikidata-20180409-all.json.gz: OK
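
For users who download only one dump file from a dated folder, a similar check can be done against a single entry of the sums file. The following is a rough sketch under stated assumptions: `verify_dump` is a hypothetical helper, the sums file is assumed to use the default GNU text-mode format (`<hash>  <file name>`), and file names are assumed to contain no whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: check one downloaded dump file against a sha1 sums file.
# verify_dump is a hypothetical helper, not part of the dump scripts.
set -euo pipefail

verify_dump() {
    local sums_file="$1" dump_file="$2"
    local name expected actual
    name="$(basename "$dump_file")"
    # Look up the expected hash by base file name (second field of each line).
    expected="$(awk -v n="$name" '$2 == n { print $1 }' "$sums_file")"
    if [ -z "$expected" ]; then
        echo "$name: no entry in $sums_file" >&2
        return 2
    fi
    # Hash the local copy and compare it with the published value.
    actual="$(sha1sum "$dump_file" | awk '{ print $1 }')"
    if [ "$actual" = "$expected" ]; then
        echo "$name: OK"
    else
        echo "$name: FAILED" >&2
        return 1
    fi
}
```

With a recent GNU coreutils (8.25 or later, as far as I know), `sha1sum -c --ignore-missing wikidata-20180409-sha1sums.txt` should give a similar result without a helper, since it skips entries whose files are not present locally.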
Envlh awarded a token. Apr 16 2018, 3:03 PM
ArielGlenn moved this task from Backlog to Done on the Dumps-Generation board. May 8 2018, 7:49 AM