
The streaming updater should support page undeletes
Closed, Resolved · Public · 5 Estimated Story Points

Description

When a deleted item is restored, the streaming updater should produce an event with the data required to re-import the item into the graph.

Page restorations are propagated through the mediawiki.page-undelete stream, for example:

{
  "$schema": "/mediawiki/page/undelete/1.0.0",
  "meta": {
    "uri": "https://test.wikidata.org/wiki/Q212433",
    "request_id": "11fcfff3-0fbf-4ed3-93df-3f85c63be2fc",
    "id": "7ad0b8c3-b54e-40d7-aa38-f2bf785f0ec4",
    "dt": "2020-07-01T14:26:46Z",
    "domain": "test.wikidata.org",
    "stream": "mediawiki.page-undelete"
  },
  "database": "testwikidatawiki",
  "performer": {
    "user_text": "DCausse (WMF)",
    "user_groups": [
      "bureaucrat",
      "sysop",
      "*",
      "user"
    ],
    "user_is_bot": false,
    "user_id": 2490,
    "user_registration_dt": "2017-09-28T06:49:13Z",
    "user_edit_count": 7
  },
  "page_id": 302928,
  "page_title": "Q212433",
  "page_namespace": 0,
  "page_is_redirect": false,
  "rev_id": 529859
}

The revision ID does not change when a page is restored. So when the full history of events is not known, the only way to tell whether a page deletion happened before or after a restoration is the event timestamp (meta.dt).
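A minimal Scala sketch of that ordering rule, assuming hypothetical `PageDeleted` and `PageRestored` event types that carry the item, the revision and the event's `meta.dt` parsed as a `java.time.Instant`. These names are illustrative, not the pipeline's own, and every event passed in is assumed to concern the same item.

```scala
import java.time.Instant

// Hypothetical minimal event types for this sketch only; the real pipeline's
// event classes carry more fields and may be named differently.
sealed trait PageStateEvent {
  def item: String
  def revision: Long
  def eventTime: Instant // parsed from meta.dt
}
final case class PageDeleted(item: String, revision: Long, eventTime: Instant) extends PageStateEvent
final case class PageRestored(item: String, revision: Long, eventTime: Instant) extends PageStateEvent

object DeleteUndeleteOrdering {
  // Assumes all events concern the same item. A delete and its undelete share
  // the same revision ID, so only the event time says which one came last.
  // Returns Some(true) if the item should be in the graph, Some(false) if it
  // should not, and None when no event was seen. On equal timestamps the
  // event that appears first in the sequence is kept.
  def isPresent(events: Seq[PageStateEvent]): Option[Boolean] =
    if (events.isEmpty) None
    else {
      val latest = events.reduceLeft((a, b) => if (b.eventTime.isAfter(a.eventTime)) b else a)
      latest match {
        case _: PageRestored => Some(true)
        case _: PageDeleted  => Some(false)
      }
    }
}
```

Because the decision relies only on `eventTime`, the order in which the two events arrive does not matter; equal timestamps remain ambiguous and would need another tie-breaker.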

On the shared model:

  • undeletes can re-use the "import" operation type

On the Flink pipeline:

  • add a new case class PageUndelete to the InputEvent ADT
  • add a new stream consuming the Kafka topic mediawiki.page-undelete and emitting PageUndelete events to downstream operators
  • add a new case in DecideMutationOperation:
    • produce a FullImport operation if the map does not contain a revision of the item, and add the revision to the map
    • produce an IgnoredMutation if the map contains the same or a later revision
    • produce a Diff operation if the map contains an earlier revision

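A minimal Scala sketch of the new DecideMutationOperation case. The types are simplified stand-ins: only PageUndelete is shown in the InputEvent ADT, the operations carry just an item ID and revisions, and a plain mutable map (the hypothetical `lastSeen`) replaces the per-item Flink state. Advancing the map on a Diff is an assumption; the task only states it for FullImport.

```scala
import scala.collection.mutable

// Simplified stand-ins for the types named in the task. Other InputEvent
// cases (revision create, page delete, ...) are omitted from this sketch.
sealed trait InputEvent { def item: String; def revision: Long }
final case class PageUndelete(item: String, revision: Long) extends InputEvent

sealed trait MutationOperation { def item: String; def revision: Long }
final case class FullImport(item: String, revision: Long) extends MutationOperation
final case class Diff(item: String, fromRevision: Long, revision: Long) extends MutationOperation
final case class IgnoredMutation(item: String, revision: Long) extends MutationOperation

// `lastSeen` (hypothetical) maps an item ID to the latest revision known for it.
class DecideMutationOperation(lastSeen: mutable.Map[String, Long] = mutable.Map.empty) {
  def decide(event: InputEvent): MutationOperation = event match {
    case PageUndelete(item, rev) =>
      lastSeen.get(item) match {
        case None =>
          // Unknown item: re-import it fully and remember the revision.
          lastSeen(item) = rev
          FullImport(item, rev)
        case Some(known) if known >= rev =>
          // Same or later revision already seen: nothing to do.
          IgnoredMutation(item, rev)
        case Some(known) =>
          // Earlier revision seen: diff from it (assumption: advance the map).
          lastSeen(item) = rev
          Diff(item, known, rev)
      }
  }
}
```

Using `>=` rather than `==` means both a replayed undelete and a stale one are ignored, so the operator stays idempotent if the same event is consumed twice.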
On the pipeline consumer:

  • Nothing; the existing operations are re-used.

AC:
When restoring a deleted item in Wikibase:

  • an event should be present in the streaming updater output with the data required to re-import the item
  • the data should re-appear in the query service graph when using the streaming updater

size: M

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel triaged this task as High priority. Sep 15 2020, 7:42 AM
CBogen set the point value for this task to 5. Sep 21 2020, 5:11 PM

Change 631265 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Add incoming undelete stream

https://gerrit.wikimedia.org/r/631265

Change 631449 had a related patch set uploaded (by Mstyles; owner: ZPapierski):
[wikidata/query/rdf@master] Handle undelete events

https://gerrit.wikimedia.org/r/631449

Change 631265 merged by jenkins-bot:
[wikidata/query/rdf@master] Add incoming undelete stream

https://gerrit.wikimedia.org/r/631265

Change 631449 merged by jenkins-bot:
[wikidata/query/rdf@master] Handle undelete events

https://gerrit.wikimedia.org/r/631449