Context: We've developed a Wikipedia discussion corpus covering the complete history of conversational actions (https://arxiv.org/abs/1810.13181).
We'd like to extend this to a 'live' corpus derived from Wikipedia talk pages. We think this can provide valuable signal for abuse / harassment detection, something multiple teams at the Wikimedia Foundation and NDA'ed research collaborators have been working on.
We would like to explore the technical feasibility and the security implications of exposing the revision IDs of deleted or suppressed revisions through the MediaWiki API [[ https://wikitech.wikimedia.org/wiki/EventStreams | event stream ]]. We don't want any further information about suppressed or deleted revisions. We only need to know the ID when a revision is removed, so that we can remove the corresponding revisions from our corpora and from any copies we might have.
This might look like a new kind of entry in the "recentchange" event stream, containing something like:
"deleted": <revision id>,
cc: @Jalexander @APalmer_WMF @leila @Iislucas @PEarleyWMF @jrbs @DarTar @JBennett