SHA1 checksum in XML dump is wrong for revisions that had an export transformation applied.
Open, High, Public

Description

Currently, ContentHandler::exportTransform may modify page content on the fly when generating XML dumps. However, we are currently not re-calculating the SHA1 hash for the revision, causing it to be inconsistent with the transformed content.

The XML dump should contain sha1 checksums that are correct for the text in the dump, even if that differs from the raw contents of the database.
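To illustrate the mismatch, here is a minimal Python sketch, not MediaWiki code. It assumes the dump's sha1 value is the SHA-1 of the UTF-8 text written in base 36 and zero-padded to 31 characters, as with rev_sha1. The `export_transform` function is a hypothetical stand-in for ContentHandler::exportTransform; the real transformation depends on the content model.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(text: str) -> str:
    """SHA-1 of the UTF-8 text, in base 36, zero-padded to 31 characters."""
    n = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out.rjust(31, "0")


def export_transform(text: str) -> str:
    # Hypothetical transformation, for illustration only:
    # normalize Windows line endings on export.
    return text.replace("\r\n", "\n")


raw = "line one\r\nline two"          # what is stored in the database
exported = export_transform(raw)      # what ends up in the <text> element

stored_sha1 = base36_sha1(raw)        # hash computed from the stored text
dump_sha1 = base36_sha1(exported)     # hash a dump consumer would compute

# A consumer verifying the dump compares its own hash with the one shipped.
checksum_ok = (stored_sha1 == dump_sha1)
```

With this input, `checksum_ok` is false: anyone who verifies the dumped text against the shipped hash sees a failure even though nothing was corrupted.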


Version: unspecified
Severity: normal

Details

Reference
bz72478

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 3:47 AM
bzimport set Reference to bz72478.
bzimport added a subscriber: Unknown Object (MLST).
daniel created this task. Oct 24 2014, 2:05 PM

gerritadmin wrote:

Change 168587 had a related patch set uploaded by Daniel Kinzler:
Re-caclulate SHA1 after applying exportTransform

https://gerrit.wikimedia.org/r/168587

In my opinion the sha1 should be the same as rev_sha1, which is also what the API outputs. Maybe add a new hash as an explicit checksum.
But the sha1 in the output should be the same as in the API.

I think it would probably be best to introduce an additional checksum attribute in the <text> tag. May be a bit expensive if we compute it on-the-fly every time.
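A rough Python sketch of what that could look like, purely as an illustration. The attribute name `checksum` and its hex encoding are assumptions, not part of the existing export schema, and `build_revision` and `text_matches_checksum` are hypothetical helpers. The `<sha1>` element keeps the stored value, so it stays in line with the API, while the new attribute describes the text actually written to the dump.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_revision(stored_sha1: str, exported_text: str) -> ET.Element:
    """Build a simplified <revision> element with an extra text checksum."""
    rev = ET.Element("revision")
    text = ET.SubElement(rev, "text")
    text.text = exported_text
    # Hypothetical attribute: hex SHA-1 of the text as exported.
    text.set("checksum", hashlib.sha1(exported_text.encode("utf-8")).hexdigest())
    # Unchanged: the hash stored for the revision (same as the API reports).
    ET.SubElement(rev, "sha1").text = stored_sha1
    return rev


def text_matches_checksum(rev: ET.Element) -> bool:
    """Check the dumped text against the hypothetical checksum attribute."""
    text = rev.find("text")
    actual = hashlib.sha1((text.text or "").encode("utf-8")).hexdigest()
    return actual == text.get("checksum")


rev = build_revision("stored-hash-placeholder", "transformed content")
parsed = ET.fromstring(ET.tostring(rev))
verified = text_matches_checksum(parsed)
```

The extra hash is computed once per exported revision here, which is the on-the-fly cost mentioned above.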

Change 168587 abandoned by Daniel Kinzler:
Re-caclulate SHA1 after applying exportTransform

https://gerrit.wikimedia.org/r/168587

Hydriz set Security to None.