rev_len should be available also for deleted revisions in database replicas
Closed, ResolvedPublic

Description

Tool Labs hosts replicas of production databases that are very useful for researching article edit history. Revisions whose text was deleted are marked with rev_deleted = 1 and rev_text_id = NULL in the revision table. However, rev_len is set to NULL as well, although the size of such a revision is still displayed publicly in the article's history in production (i.e. this information is not considered secret). It would greatly help research into articles' growth over time if rev_len were also available for deleted revisions in the database replicas.

Event Timeline

Blahma raised the priority of this task to Needs Triage.
Blahma updated the task description.
Blahma added a project: Toolforge.
Blahma added a subscriber: Blahma.
Krenair set Security to None.
Krenair added a subscriber: Krenair.

I don't think this information is public: as a regular user, I cannot see the page size of a deleted revision, e.g. https://en.wikipedia.org/w/index.php?title=Special:Log&page=Talk%3AVimuttiguana

This is referring to deleted revisions, rather than archived revisions of deleted pages.

Sorry for the misunderstanding, I can confirm that those are not filtered on source:
https://git.wikimedia.org/blob/operations%2Fsoftware%2Fredactatron/e48c329ac35f0550d611d4039da2cedef6c269ce/scripts%2Fcols.txt#L600

But conditionally (unnecessarily?) nulled on view:

if((`enwiki`.`revision`.`rev_deleted` & 1),NULL,`enwiki`.`revision`.`rev_len`) AS `rev_len`
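
For illustration, the conditional nulling in the view expression above can be mimicked in Python. This is only a sketch: the function name and the sample rows are hypothetical, and only the `rev_deleted & 1` test is taken from the view definition.

```python
from typing import Optional


def filtered_rev_len(rev_deleted: int, rev_len: Optional[int]) -> Optional[int]:
    """Mimic IF((rev_deleted & 1), NULL, rev_len) from the replica view."""
    # The lowest bit of rev_deleted decides whether the length is nulled.
    return None if rev_deleted & 1 else rev_len


# Hypothetical (rev_deleted, rev_len) pairs, not real replica data.
sample_rows = [(0, 1200), (1, 3400), (2, 560)]

for rev_deleted, rev_len in sample_rows:
    print(rev_deleted, rev_len, filtered_rev_len(rev_deleted, rev_len))
```

Only the row whose rev_deleted has the lowest bit set loses its length, which is the behaviour this task asks to remove.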
jcrespo moved this task from Triage to Backlog on the DBA board.

@Bawolff can you think of a reason that we should be hiding the length of rev deleted revisions in the replicas? They do seem to be visible in the page history:

Screen Shot 2017-10-26 at 10.11.23.png (77×821 px, 33 KB)

Marostegui added a subscriber: Marostegui.

As per T101631#1344185, we do not filter rev_len, so this is only a matter of changing the view if rev_len is ultimately considered OK to expose.
Removing DBA as there is nothing for us to do here.

@Bstorm maybe I could bug you about this task too? :)

On the surface this seems very simple... rev_len is public on production but appears to be intentionally (wrongfully?) filtered out on the replicas. This throws off many tools. Hopefully an easy fix?

If I can get a thumbs up from @Bawolff, perhaps?

The current logic expressly filters rev_len on deleted revisions: if(rev_deleted&1,null,rev_len) as rev_len. I don't know if that's just for consistency or if someone thinks that really should be kept out of the replicas. As stated above, it does seem to be available online, though I'm not sure that holds for all values of the deleted field, since I think that's an integer.
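
As a sketch of why the integer matters: rev_deleted is a bitfield in MediaWiki, and the view's `& 1` test looks only at the lowest bit. The flag names and values below follow MediaWiki's DELETED_* revision constants as I understand them; the class and helper names are hypothetical and exist only for this example.

```python
from enum import IntFlag
from typing import List


class RevDeleted(IntFlag):
    """Bit flags stored in rev_deleted (values assumed from MediaWiki's constants)."""
    DELETED_TEXT = 1        # revision text hidden
    DELETED_COMMENT = 2     # edit summary hidden
    DELETED_USER = 4        # user name / IP hidden
    DELETED_RESTRICTED = 8  # hidden from administrators as well (suppression)


def hidden_parts(rev_deleted: int) -> List[str]:
    """Return the names of all flags set in a rev_deleted value."""
    return [flag.name for flag in RevDeleted if rev_deleted & flag]


def text_hidden(rev_deleted: int) -> bool:
    """The condition the view checks: is the DELETED_TEXT bit set?"""
    return bool(rev_deleted & RevDeleted.DELETED_TEXT)


for value in (0, 1, 2, 5, 9):
    print(value, hidden_parts(value), text_hidden(value))
```

Under these assumptions, a value such as 2 (only the edit summary hidden) never had its rev_len nulled by the view, while any odd value did, regardless of which other bits were set.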

+1

Change 515062 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] wiki replicas: unfilter deleted rev_len versions

https://gerrit.wikimedia.org/r/515062

@Anomie asked a good question on the patch review: "Should you do the same for ar_len, here and in the other archive views?", referring to: if(ar_deleted&1,null,ar_len) as ar_len

This is referring to deleted revisions, rather than archived revisions of deleted pages.

Based on @Krenair's comment, I think the answer is no but I'm not 100% sure.

Change 515062 merged by Jhedden:
[operations/puppet@production] wiki replicas: unfilter deleted rev_len versions

https://gerrit.wikimedia.org/r/515062

Mentioned in SAL (#wikimedia-operations) [2019-06-20T14:36:55Z] <jeh> T101631 updating replica views on labsdb1012

Mentioned in SAL (#wikimedia-operations) [2019-06-20T14:47:37Z] <jeh> T101631 updating replica views on labsdb1011

Mentioned in SAL (#wikimedia-operations) [2019-06-20T14:54:48Z] <jeh> T101631 updating replica views on labsdb1010

Mentioned in SAL (#wikimedia-operations) [2019-06-20T15:01:47Z] <jeh> T101631 updating replica views on labsdb1009

My question isn't really about archived revisions of deleted pages in that sense either. If we're exposing the length for revdeled non-archived revisions, why should the situation be any different for the length of revdeled archived revisions?