
sha1 should be visible even if the revision is hidden
Open, Low, Public


On w:pt:João Figueiredo there are revisions whose content was hidden by a sysop. The API doesn't show the sha1 for such revisions:|timestamp|user|size|sha1&rvlimit=20&pageids=40677
but it should, instead of just showing sha1hidden (what is the point of doing that?).
(I noticed this while investigating

Event Timeline

He7d3r raised the priority of this task from to Needs Triage.
He7d3r updated the task description.
He7d3r added a project: MediaWiki-API.
He7d3r added subscribers: He7d3r, Halfak.

If you get approval from WMF Legal (probably @LuisV_WMF) that revealing the SHA1 of a revision-deleted revision would be ok from a privacy perspective (i.e. that the risks raised in T45137 are not a real concern), I would not be opposed to this.

The point of "sha1hidden" is to note that the sha1 is hidden due to revision-deletion, which serves to indicate to normal users that the absence of sha1 isn't a bug and to privileged users that the sha1 they are seeing is revision-deleted.
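A client can use that flag to tell a revision-deleted hash apart from a field that is simply absent. The following is a minimal sketch, assuming the response has already been decoded into dictionaries shaped like the JSON output of prop=revisions, with a `sha1hidden` key present only on revision-deleted entries. The `sha1_status` helper and the sample revisions are made up for illustration.

```python
def sha1_status(revision):
    """Classify one decoded revision dict by how its sha1 is exposed."""
    # Assumption: the flag is present (as "" or True) only when the
    # revision's sha1 is revision-deleted.
    hidden = revision.get("sha1hidden", False) is not False
    has_sha1 = "sha1" in revision
    if hidden and has_sha1:
        return "hidden, but visible to this (privileged) caller"
    if hidden:
        return "hidden"
    if has_sha1:
        return "public"
    return "absent (not requested, or a bug)"


# Made-up sample data; the hashes are placeholders, not real revisions.
sample_revisions = [
    {"user": "Example", "size": 5120, "sha1": "a" * 40},
    {"user": "Example", "size": 5133, "sha1hidden": ""},
    {"user": "Example", "size": 5133, "sha1hidden": "", "sha1": "b" * 40},
    {"user": "Example", "size": 5140},
]

for rev in sample_revisions:
    print(rev["size"], sha1_status(rev))
```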

The reasoning given was:

"A revision might be hidden because of a very short string (first name of the contributor, phone number...). In this case it is possible to recover the hidden content from the SHA1 and the text of the next revision."

There's nothing in a sha1 that tells when it represents a short string or a long one. The only way to reverse the SHA1 is via a rainbow table[1]. The only practical way to generate a rainbow table to reverse revision content from a SHA1 would be to *know in advance* that the string contained a phone number or name and build the table for all potential first names and phone numbers (including spaces, symbols, etc.).
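To make the scenario concrete, here is a rough sketch of the attack being discussed, using plain enumeration rather than a precomputed rainbow table. It assumes the attacker already knows the format of the hidden string, has the text of the next revision (with that string removed), and that the exposed sha1 is a hex SHA-1 of the UTF-8 revision text. All names and the sample texts are hypothetical.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 of the UTF-8 encoding of text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def recover_hidden_text(hidden_sha1, next_text, candidates):
    """Insert each candidate at every position of next_text and compare hashes.

    Returns the reconstructed hidden revision text, or None if no
    candidate/position pair reproduces hidden_sha1.
    """
    for candidate in candidates:
        for pos in range(len(next_text) + 1):
            guess = next_text[:pos] + candidate + next_text[pos:]
            if sha1_hex(guess) == hidden_sha1:
                return guess
    return None


# Made-up example: a fictional phone number was added, then removed.
hidden_text = "Call 555-0100 the office for details."
next_text = "Call the office for details."

# The attacker must guess the format in advance: "555-" plus four digits
# and a trailing space.
candidates = [f"555-{n:04d} " for n in range(10000)]

print(recover_hidden_text(sha1_hex(hidden_text), next_text, candidates))
```

The search only succeeds because the candidate list happens to contain the right string. With no prior knowledge of the format, the candidate space grows too quickly for this to work.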

So, I don't think that hiding these sha1s provides any real measure of security/privacy, but I invite discussion.


"There's nothing in a sha1 that tells when it represents a short string or a long one."

No, but the "size" field returned by prop=revisions does.

Oh! Good point. Maybe we can keep the sha1 for all revision content > 10 bytes -- which is a little bit beyond the point where a rainbow table becomes intractable. Really, though, I think that is too complicated, and I'll just work around the random missing sha1s.
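The 10-byte cutoff suggested above could be sketched roughly as follows. This is an illustration of the idea only, not how MediaWiki behaves: `public_sha1`, `search_space`, and the `text_hidden` argument are hypothetical names, and `size` is taken to be the byte length of the whole revision, as reported by prop=revisions.

```python
MIN_PUBLIC_SIZE = 10  # bytes; the threshold floated in the comment above


def search_space(n_bytes):
    """Number of distinct byte strings of exactly n_bytes bytes."""
    return 256 ** n_bytes


def public_sha1(revision, text_hidden, min_size=MIN_PUBLIC_SIZE):
    """Return the sha1 to expose for a revision, or None to withhold it."""
    if not text_hidden:
        return revision["sha1"]
    # Hidden revision: expose the hash only when the content is too long
    # to enumerate blindly.
    if revision["size"] > min_size:
        return revision["sha1"]
    return None


# Made-up revisions with placeholder hashes.
short_rev = {"size": 8, "sha1": "c" * 40}
long_rev = {"size": 5133, "sha1": "d" * 40}

print(public_sha1(short_rev, text_hidden=True))
print(public_sha1(long_rev, text_hidden=True))
print(search_space(10) == 2 ** 80)
```

Note that this bounds only blind guessing of the full content. It does not address the case raised earlier in the thread, where most of a long revision is already known from the next revision's text.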