Tool Labs hosts replicas of production databases that are very useful for researching article edit history. Revisions whose text was deleted are marked with rev_deleted = 1 and rev_text_id = NULL in the revision table. However, rev_len is set to NULL as well, although the size of such a revision is still displayed publicly in the article's history in production (i.e. this information is not considered secret). It would greatly help research into articles' growth over time if rev_len were also available for deleted revisions in the database replicas.
Description
Details
Project | Branch | Lines +/- | Subject
---|---|---|---
operations/puppet | production | +3 -3 | wiki replicas: unfilter deleted rev_len versions
Status | Subtype | Assigned | Task
---|---|---|---
Declined | | jcrespo | T150767 Wikireplica service for tools and labs - issues and missing available views (tracking)
Resolved | | None | T101631 rev_len should be available also for deleted revisions in database replicas
Event Timeline
I don't think this information is public: I cannot see the page size of a deleted revision as a regular user e.g. https://en.wikipedia.org/w/index.php?title=Special:Log&page=Talk%3AVimuttiguana
This is referring to deleted revisions, rather than archived revisions of deleted pages.
For example https://en.wikipedia.org/w/index.php?title=Warlingham_School&offset=20100527182211&action=history - those entries from 2010-01-14.
Sorry for the misunderstanding, I can confirm that those are not filtered on source:
https://git.wikimedia.org/blob/operations%2Fsoftware%2Fredactatron/e48c329ac35f0550d611d4039da2cedef6c269ce/scripts%2Fcols.txt#L600
But conditionally (unnecessarily?) nulled on view:
if((`enwiki`.`revision`.`rev_deleted` & 1),NULL,`enwiki`.`revision`.`rev_len`) AS `rev_len`
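To make the effect of that conditional concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table contents and the view names `revision_filtered` and `revision_unfiltered` are made up for illustration, and the MySQL `IF(...)` is rewritten as a `CASE` expression so it runs on SQLite; it is not the actual replica view definition.

```python
import sqlite3

# Hypothetical miniature of the revision table: only the columns discussed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_deleted INTEGER, rev_len INTEGER)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (1, 0, 1200),  # nothing hidden
        (2, 1, 1350),  # bit 1 set: text hidden
        (3, 2, 1400),  # bit 2 set only: bit 1 is clear
    ],
)

# Mirrors the quoted conditional: rev_len becomes NULL whenever bit 1 is set.
conn.execute(
    """
    CREATE VIEW revision_filtered AS
    SELECT rev_id,
           rev_deleted,
           CASE WHEN (rev_deleted & 1) THEN NULL ELSE rev_len END AS rev_len
    FROM revision
    """
)

# Assumed alternative: a view that passes rev_len through unchanged.
conn.execute(
    """
    CREATE VIEW revision_unfiltered AS
    SELECT rev_id, rev_deleted, rev_len
    FROM revision
    """
)

filtered = conn.execute(
    "SELECT rev_id, rev_len FROM revision_filtered ORDER BY rev_id"
).fetchall()
unfiltered = conn.execute(
    "SELECT rev_id, rev_len FROM revision_unfiltered ORDER BY rev_id"
).fetchall()

print(filtered)    # revision 2 has rev_len None
print(unfiltered)  # revision 2 keeps its rev_len
```

Only the row with bit 1 set loses its length in the filtered view; the underlying table still holds the value, which matches the observation that the column is not redacted at the source.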
@Bawolff can you think of a reason that we should be hiding the length of rev deleted revisions in the replicas? They do seem to be visible in the page history.
As per T101631#1344185, we do not filter rev_len at the source, so this is a matter of changing the view if rev_len is finally considered OK to expose.
Removing DBA as there is nothing for us to do here.
@Bstorm maybe I could bug you about this task too? :)
On the surface this seems very simple... rev_len is public on production but appears to be intentionally (wrongfully?) filtered out on the replicas. This throws off many tools. Hopefully an easy fix?
If I can get a thumbs up from @Bawolff, perhaps?
The current logic expressly filters rev_len on deleted revisions: if(rev_deleted&1,null,rev_len) as rev_len. I don't know if that's just for consistency or if someone thinks it really should be kept out of the replicas. As stated above, it does seem to be available online, though I'm not sure whether that holds for every value of the deleted field, since I think that's an integer.
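On the integer question: as far as I understand MediaWiki, rev_deleted is a bitfield, and the `& 1` in the view tests only the "text hidden" bit. The sketch below decodes a value into the parts it hides. The flag values are the ones I believe MediaWiki defines (DELETED_TEXT, DELETED_COMMENT, DELETED_USER, DELETED_RESTRICTED); the helper name `hidden_parts` is hypothetical and not part of any existing tool.

```python
# Bit flags for rev_deleted, as I understand MediaWiki to define them.
DELETED_TEXT = 1        # revision text hidden
DELETED_COMMENT = 2     # edit summary hidden
DELETED_USER = 4        # username / IP hidden
DELETED_RESTRICTED = 8  # suppressed, hidden from administrators too

FLAG_NAMES = [
    (DELETED_TEXT, "text"),
    (DELETED_COMMENT, "comment"),
    (DELETED_USER, "user"),
    (DELETED_RESTRICTED, "restricted"),
]


def hidden_parts(rev_deleted):
    """Return the names of the parts hidden by a rev_deleted value."""
    return [name for bit, name in FLAG_NAMES if rev_deleted & bit]


def view_nulls_rev_len(rev_deleted):
    """True if the old view logic, if(rev_deleted & 1, NULL, rev_len), yields NULL."""
    return bool(rev_deleted & DELETED_TEXT)


for value in (0, 1, 2, 5):
    print(value, hidden_parts(value), view_nulls_rev_len(value))
```

So a revision with only its summary hidden (value 2) kept its rev_len under the old view, while any value with the lowest bit set (1, 3, 5, ...) had it nulled.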
Change 515062 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] wiki replicas: unfilter deleted rev_len versions
Change 515062 merged by Jhedden:
[operations/puppet@production] wiki replicas: unfilter deleted rev_len versions
Mentioned in SAL (#wikimedia-operations) [2019-06-20T14:36:55Z] <jeh> T101631 updating replica views on labsdb1012
Mentioned in SAL (#wikimedia-operations) [2019-06-20T14:47:37Z] <jeh> T101631 updating replica views on labsdb1011
Mentioned in SAL (#wikimedia-operations) [2019-06-20T14:54:48Z] <jeh> T101631 updating replica views on labsdb1010
Mentioned in SAL (#wikimedia-operations) [2019-06-20T15:01:47Z] <jeh> T101631 updating replica views on labsdb1009
My question isn't really about archived revisions of deleted pages in that sense either. If we're exposing the length for revdeled non-archived revisions, why should the situation be any different for the length of revdeled archived revisions?