The streaming updater should support suppressed deletes
Closed, Resolved · Public · 5 Estimated Story Points

Description

When an oversighter deletes an item with the suppress checkbox ticked, the streaming updater should produce a message instructing the consumer to delete the item from the graph.

The Kafka topic populated by change prop is mediawiki.page-suppress. This topic is considered private and highly sensitive.

The event itself is similar to what is read from mediawiki.page-delete:

{
  "$schema": "/mediawiki/page/delete/1.0.0",
  "meta": {
    "uri": "https://test.wikidata.org/wiki/Q212437",
    "request_id": "d37828b9-e3ce-4eea-a91d-921d0c5ad9c9",
    "id": "7e8abcee-2a84-476e-afee-07eb404e1085",
    "dt": "2020-07-02T09:36:49Z",
    "domain": "test.wikidata.org",
    "stream": "mediawiki.page-suppress"
  },
  "database": "testwikidatawiki",
  "performer": {
    "user_text": "DCausse (WMF)",
    "user_groups": [
      "bureaucrat",
      "oversight",
      "sysop",
      "*",
      "user"
    ],
    "user_is_bot": false,
    "user_id": 2490,
    "user_registration_dt": "2017-09-28T06:49:13Z",
    "user_edit_count": 11
  },
  "page_id": 302932,
  "page_title": "Q212437",
  "page_namespace": 0,
  "page_is_redirect": false,
  "rev_id": 529864,
  "rev_count": 1,
  "comment": "content was: \"Test dcausse v6\", and the only contributor was \"[[Special:Contributions/DCausse (WMF)|DCausse (WMF)]]\" ([[User talk:DCausse (WMF)|talk]])",
  "parsedcomment": "content was: &quot;Test dcausse v6&quot;, and the only contributor was &quot;<a href=\"/wiki/Special:Contributions/DCausse_(WMF)\" title=\"Special:Contributions/DCausse (WMF)\">DCausse (WMF)</a>&quot; (<a href=\"/w/index.php?title=User_talk:DCausse_(WMF)&amp;action=edit&amp;redlink=1\" class=\"new\" title=\"User talk:DCausse (WMF) (page does not exist)\">talk</a>)"
}

On the shared model:

On the flink pipeline:

  • add a new source stream (Kafka topic mediawiki.page-suppress) and emit PageDelete events to downstream operators
    • only the following fields are copied from the original event:
      • the page_title: it must conform to the Wikidata entity ID pattern ([QPL] followed by a number), which is generated by Wikibase and thus cannot contain the sensitive information that triggered the suppression
      • the revision (rev_id): a number
      • the request_id (from meta)
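
The field selection above can be sketched as follows. This is only an illustration in Python, not the updater's actual code: the `PageDelete` class, the `to_page_delete` function and the exact regular expression are assumptions made for the example. It also assumes that `page_title` is a bare entity ID, as in the sample event, and that events failing validation are simply dropped.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed reading of the "[QPL]number" pattern from the task description.
ENTITY_ID = re.compile(r"[QPL][1-9][0-9]*")


@dataclass(frozen=True)
class PageDelete:
    """Hypothetical stand-in for the delete event passed downstream."""
    page_title: str
    revision: int
    request_id: str


def to_page_delete(event: dict) -> Optional[PageDelete]:
    """Copy only the three allow-listed fields out of a page-suppress event.

    Returns None when a field is missing or malformed, so nothing from
    the sensitive event is forwarded in that case.
    """
    meta = event.get("meta")
    request_id = meta.get("request_id") if isinstance(meta, dict) else None
    title = event.get("page_title")
    rev = event.get("rev_id")

    # The title must be a generated entity ID, never free text.
    if not isinstance(title, str) or ENTITY_ID.fullmatch(title) is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rev, bool) or not isinstance(rev, int):
        return None
    if not isinstance(request_id, str):
        return None
    return PageDelete(page_title=title, revision=rev, request_id=request_id)
```

With this shape, the comment, performer and stream name of the original event never reach the output, and an event whose title does not look like an entity ID yields `None` instead of a delete.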

On the pipeline consumer:

  • it will be seen as a classic delete

AC:

  • fix T105427
  • the streaming updater output must not leak any information about the suppression and should appear like a "normal" delete
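
One way to check the second criterion in a test is an allow-list on the output event. The sketch below is an assumption-laden illustration: the output field names (`page_title`, `revision`, `request_id`) and all helper names are hypothetical and may not match the updater's real output schema.

```python
import json

# Hypothetical output field names; the real output schema may differ.
ALLOWED_FIELDS = frozenset({"page_title", "revision", "request_id"})


def unexpected_fields(output: dict) -> set:
    """Return the top-level keys that fall outside the allow-list."""
    return set(output) - ALLOWED_FIELDS


def mentions_suppression(output: dict) -> bool:
    """True if the serialized output contains the substring 'suppress'."""
    return "suppress" in json.dumps(output).lower()


def looks_like_normal_delete(output: dict) -> bool:
    """True if the output has no extra fields and no suppression marker."""
    return not unexpected_fields(output) and not mentions_suppression(output)
```

The substring check is deliberately crude: it would catch a forwarded stream name such as mediawiki.page-suppress, but it is not proof that nothing sensitive leaked.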

size: M

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2020-07-02T09:07:25Z] <addshore> addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "Addshore" # T256949

Mentioned in SAL (#wikimedia-operations) [2020-07-02T09:07:36Z] <addshore> addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "DCausse_(WMF)" # T256949

dcausse updated the task description.

Please make sure data does not get deleted when only a history revision is suppressed.

@Bugreporter this task is about propagating the delete when an oversighter deletes a page with the "Suppress data from administrators as well as others" checkbox activated:

(screenshot: suppressed_delete.png)

Deleting (hiding) previous revisions won't be tracked by the updater since it does not affect what is visible in WDQS.
If you think there are other ways to delete an active revision, please let me know and I'll adapt this task or create a new one.

CBogen set the point value for this task to 5. (Aug 31 2020, 5:20 PM)

Change 631449 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Handle undelete events

https://gerrit.wikimedia.org/r/631449

I don't see a schema for suppressed deletes in the list of page schemas (https://schema.wikimedia.org/repositories/primary/jsonschema/mediawiki/page/). Is that information intentionally private?
Also, I can't see how having the page namespace and other information in the event meta compromises the event.
Do we want to default the page namespace to 0?
I think we could just listen to the suppressed deletes topic and pass the events on as regular deletes. Please let me know why we cannot do that.

Change 633019 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikidata/query/rdf@master] Handle suppressed deletes

https://gerrit.wikimedia.org/r/633019

Change 633019 merged by jenkins-bot:
[wikidata/query/rdf@master] Handle suppressed deletes

https://gerrit.wikimedia.org/r/633019