
Expose mediawiki/revision/tags-change in stream.wikimedia.org
Closed, Resolved, Public

Description

I think mediawiki/revision/tags-change could be useful to users of stream.wikimedia.org. Use cases I have in mind:

  • I'm a bot author, and I wish to mark edits made by a certain OAuth tool as patrolled. The bot connects to stream.wm.o, watches for edits tagged with the tool's tag, and patrols them.
  • I'm a bot author, and I wish to automatically revert edits flagged by a given abuse filter. I set the AF to tag the edits in question, and I set up a bot to look for those edits and revert them (it can be useful to give the vandal a feeling that "it was saved", hoping they won't check a few minutes later, which creates a shadow-revert effect).
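
Both bot use cases above reduce to "read the stream, keep only events where a given tag was just added". Below is a minimal sketch of that filter, not a finished bot. It assumes the stream would be served at the URL in `STREAM_URL` (a guess based on the stream name), that each SSE `data:` line carries one complete JSON event, and that the event has `tags` and `prior_state.tags` fields as in the 1.0.0 schema. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumed URL: the stream is not exposed yet, so this path is a guess.
STREAM_URL = "https://stream.wikimedia.org/v2/stream/mediawiki.revision-tags-change"


def parse_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of every `data:` line in an SSE text stream.

    Assumes each event fits on a single `data:` line.
    """
    for line in lines:
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                yield json.loads(payload)


def newly_tagged(event: dict, tag: str) -> bool:
    """Return True if `tag` is on the revision now but was not before."""
    current = event.get("tags") or []
    prior = (event.get("prior_state") or {}).get("tags") or []
    return tag in current and tag not in prior


def watch(tag: str, url: str = STREAM_URL) -> Iterator[dict]:
    """Connect to the stream and yield events where `tag` was just added."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "text/event-stream",
            # Placeholder User-Agent; replace with your bot's own contact info.
            "User-Agent": "tags-change-example/0.1 (hypothetical bot)",
        },
    )
    with urllib.request.urlopen(request) as response:
        lines = (raw.decode("utf-8") for raw in response)
        for event in parse_sse_data(lines):
            if newly_tagged(event, tag):
                yield event
```

A bot would then loop over `watch("<the tool's or filter's tag>")` and call the patrol or revert API for each yielded event. That part is left out here because it depends on the bot framework in use.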

I can also imagine other use cases, such as "I save metadata about all reverted edits from stream.wm.o into a file and then analyze that JSON", instead of "query all public wiki DB replicas, get all reverted edits [bearing in mind they may have been deleted by now] and then go through them".
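
The "save to a file, analyze later" idea could look roughly like the sketch below. It assumes events arrive as already-parsed dicts, that reverted edits carry the `mw-reverted` tag, and that each event has a `database` field naming the wiki. The helper names and the JSON Lines file layout are my own choices, not anything the stream prescribes.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable

# Assumption: reverted edits are marked with this change tag.
REVERTED_TAG = "mw-reverted"


def append_reverted(events: Iterable[dict], path: Path, tag: str = REVERTED_TAG) -> int:
    """Append every event carrying `tag` to a JSON Lines file; return the count."""
    written = 0
    with path.open("a", encoding="utf-8") as out:
        for event in events:
            if tag in (event.get("tags") or []):
                out.write(json.dumps(event, ensure_ascii=False) + "\n")
                written += 1
    return written


def reverts_per_wiki(path: Path) -> Counter:
    """Count archived events per wiki, using the assumed `database` field."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as src:
        for line in src:
            if line.strip():
                counts[json.loads(line).get("database", "unknown")] += 1
    return counts
```

One line per event keeps the archive appendable while the stream is running, and the analysis step never needs to load the whole file at once.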

Looking through the schema definition in https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/primary/+/d725698c3d5a117b4cb52d9cfcb7e94bb9c3a306/jsonschema/mediawiki/revision/tags-change/1.0.0.yaml, I don't see anything that would need to be private. The revision create event is already there, and it gives users a similar amount of metadata (except tags, of course).

Thanks for considering this idea.

Event Timeline

Yeah, let's do it!

@sguebo_WMF @Htriedman before we just do this, wanted to check with your team to see if there is a process we need to go through. I believe this data is already public in the wikis, but we would be exposing a stream of this data. E.g. if there were something like 'suppressed revisions tags', those tags would be consumable publicly for 7 days even if they are removed from the wikis (revision create has this problem too).

Revision tags change event schema is here: https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/revision/tags-change/1.0.0.yaml

Ok with you all to proceed?

Ottomata triaged this task as Medium priority. Oct 28 2021, 5:07 PM
Ottomata moved this task from Serve to Datasets on the Data-Engineering board.

Hi @Ottomata!

I know that this is the same theoretical attack vector as revision create, e.g. someone creates a page with a title like "Hal Triedman's SSN is XX-XXX-XXXX" that is quickly removed and suppressed, but the revision create event remains publicly consumable in the event stream for 7 days.

The extent of the problem (frequency and severity) is still an unknown unknown that I've been meaning to work on for a couple of months but haven't been able to because of permissions issues.

Regardless, I don't think that this new event stream poses a major new risk beyond what we already know of. Potential mitigations (shortening the cache of consumable events, creating some register of suppressions post facto and nullifying cached events that were suppressed) for both event streams would likely be the same, correct?
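
To make the "register of suppressions" idea concrete, here is a purely hypothetical sketch. It assumes a register keyed by `(database, rev_id)` and replaces matching cached events with a stub. Nothing like `redact_suppressed` exists in EventStreams today, and the `redacted` marker is invented for illustration.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical register entry: (wiki database name, revision id).
SuppressionKey = Tuple[str, int]


def redact_suppressed(
    events: Iterable[dict], suppressed: Set[SuppressionKey]
) -> Iterator[dict]:
    """Yield cached events, replacing suppressed ones with a minimal stub."""
    for event in events:
        key = (event.get("database"), event.get("rev_id"))
        if key in suppressed:
            # Keep only the identifiers; drop page_title, tags and the rest.
            yield {"database": key[0], "rev_id": key[1], "redacted": True}
        else:
            yield event
```

Because the lookup only uses the wiki and revision id, the same filter would apply to both the revision create and the tags-change caches.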

I would also like to hear about what potential risks a "suppressed revision tag" could pose.

Potential mitigations [...] for both event streams would likely be the same, correct?

Yup!

I would also like to hear about what potential risks a "suppressed revision tag" could pose.

I think the risk is the same as revision create, except smaller only because there are fewer revision tags than revisions. A tag added to a revision with a page_title like your example would result in a revision.tags-change event with the page_title in it too.

Got it. In that case I don't see it adding any new privacy risk — I'll just make sure to bump my investigation of the frequency and severity of these incidents up on my todo list.

Change 805814 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] eventstreams - expose mediawiki.revision-tags-change

https://gerrit.wikimedia.org/r/805814

Change 805814 merged by Ottomata:

[operations/deployment-charts@master] eventstreams - expose mediawiki.revision-tags-change

https://gerrit.wikimedia.org/r/805814

Getting a strange error when trying to deploy:

command "/usr/bin/helm3" exited with non-zero status
09:46:12 STDERR:
09:46:12   WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/kubernetes/eventstreams-deploy-staging.config
09:46:12   Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

Asking service-ops now.