Include checksums in
Please include text files with the hash values of the future entity dumps in in order to check data integrity. These files could be similar to the *sums.txt ones in

@hoo, can you fold this into the bash script without too much work?

hoo added a comment.Mar 27 2018, 2:47 AM

Piece of cake (I guess)… so yes, will schedule this for the week.

hoo added a comment.Mar 28 2018, 2:25 PM

Would we want one hash sum file per (dated) folder, or one for everything? Or both?

If one for everything, should it contain just the base file names (like wikidata-20180323-truthy-BETA.nt.bz2), or the relative path (like 20180323/wikidata-20180323-truthy-BETA.nt.bz2)?

One per folder I suppose, so that as a particular run finishes up, the hash info is available.
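
For illustration, the one-file-per-folder option with base file names could look roughly like the sketch below. This is not the code from the patch on this task. The function name write_checksums and its arguments are hypothetical, the sums file names follow the wikidata-DATE-md5sums.txt pattern seen elsewhere in this thread, and GNU md5sum/sha1sum are assumed.

```shell
#!/bin/bash
# Hypothetical helper, NOT the code from the merged patch.
# Writes one MD5 and one SHA-1 sums file into a dated dump folder,
# listing base file names only (no relative paths).
write_checksums() {
    local dir="$1"    # dated folder, e.g. .../wikidatawiki/20180409
    local date="$2"   # e.g. 20180409
    local md5file="wikidata-${date}-md5sums.txt"
    local sha1file="wikidata-${date}-sha1sums.txt"
    (
        # Work inside the folder so only base names end up in the sums files.
        cd "$dir" || exit 1
        rm -f "$md5file" "$sha1file"
        for f in wikidata-"${date}"-*; do
            # Skip the sums files themselves and anything that is not a regular file.
            case "$f" in *sums.txt) continue ;; esac
            [ -f "$f" ] || continue
            md5sum "$f" >> "$md5file"
            sha1sum "$f" >> "$sha1file"
        done
    )
}
```

Calling write_checksums once at the end of each run would make the hash info available as soon as that run's folder is complete.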

abian added a comment.Mar 28 2018, 2:47 PM

That's also the easiest option for users, I think.

Change 423353 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/puppet@production] Add checksums for Wikidata entity dumps

Change 423353 merged by ArielGlenn:
[operations/puppet@production] Add checksums for Wikidata entity dumps

First checksums are available: and for

I manually added the JSON checksums, but the RDF ones were added automatically. I'll check next week to make sure this also works correctly for the JSON checksums, but I don't expect any surprises there.

Thank you!

JSON checksums look fine as well:

hoo@snapshot1007:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20180409$ md5sum -c wikidata-20180409-md5sums.txt 
wikidata-20180409-all.json.gz: OK
hoo@snapshot1007:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20180409$ sha1sum -c wikidata-20180409-sha1sums.txt
wikidata-20180409-all.json.gz: OK
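
For users who download only one file from a dated folder, a plain "md5sum -c" on the per-folder sums file would also complain about the files they did not fetch. A minimal sketch of a workaround follows, assuming the sums files sit next to the download and file names contain no spaces. The function name verify_dump is hypothetical and not part of the dumps tooling.

```shell
#!/bin/bash
# Hypothetical helper: verify one downloaded dump file against the
# per-folder sums files, which may also list files that were not downloaded.
# Run it from the directory that holds the dump file and both sums files.
verify_dump() {
    local file="$1"   # base name, e.g. wikidata-20180409-all.json.gz
    local date="$2"   # e.g. 20180409
    local tool
    for tool in md5 sha1; do
        # Select only the line for this file (text or binary "*" marker)
        # and pass it to "<tool>sum -c -", which reads the list from stdin.
        awk -v f="$file" '$2 == f || $2 == "*" f' "wikidata-${date}-${tool}sums.txt" \
            | "${tool}sum" -c - || return 1
    done
}
```

For example, "verify_dump wikidata-20180409-all.json.gz 20180409" should print one OK line per hash type and return non-zero if either check fails or the file is not listed.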
