
SHA-256 digest for wiki dumps
Open, Needs Triage, Public, Feature

Description

Feature summary:
I would like to request that the Wikimedia Dumps infrastructure compute and publish a SHA-256 digest for each dumped data file.

Use case(s):
I'm working on a Toolforge service that performs automated, cryptographically sound timestamping of Wikipedia snapshots. It uses the opentimestamps.org service, which uses SHA-256 by default. To obtain the SHA-256 digest of each dump file for the timestamp, my project currently has to run sha256sum on every file itself (example here). This computation is somewhat expensive for the larger dump files, though not prohibitive.
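For illustration, the per-file computation described above is roughly equivalent to the following Python sketch. It is not the project's actual code; the function name `sha256_of_file` and the 1 MiB chunk size are assumptions chosen for the example. The file is read in chunks so that large dump files do not have to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`.

    `sha256_of_file` and `chunk_size` are hypothetical names for this sketch.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The whole file still has to be read once, which is where the cost for multi-gigabyte dumps comes from; a digest published alongside the dump would avoid that pass.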

Currently the XML data dumps provide only MD5 and SHA-1 digests (here is an example). Both of these hash functions are obsolete for security purposes, because practical collision attacks have been demonstrated against each of them.

Benefits:
It would be helpful if the Dumps service provided SHA-256 digests in addition to MD5 and SHA-1, because no practical attacks against SHA-256 are known. This would reduce the computing resources needed by my timestamping project, and it would also be useful to anyone else who wants to verify the integrity of the dump files.
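As a sketch of how a consumer might use such digests, the Python below checks a downloaded file against a checksum listing. It assumes the listing would use the line format that the sha256sum tool prints (hex digest, whitespace, file name); the Dumps service does not publish such a listing today, and the function names `parse_checksum_lines` and `verify_file` are hypothetical.

```python
import hashlib


def parse_checksum_lines(text):
    """Parse sha256sum-style lines ("<hex digest>  <file name>") into a dict.

    The listing format is an assumption; no such file is published yet.
    """
    checksums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(None, 1)
        # sha256sum marks binary mode with a leading "*" on the file name.
        checksums[name.lstrip("*")] = digest.lower()
    return checksums


def verify_file(path, expected_hex, chunk_size=1024 * 1024):
    """Return True if the SHA-256 digest of `path` matches `expected_hex`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

A caller would look up the expected digest by file name in the parsed listing and pass it to `verify_file` together with the local path of the downloaded dump.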