
The streaming updater should support page undeletes
Open, High · Public · 5 Estimated Story Points

Description

When a deleted item is restored, the streaming updater should produce an event with the data required to re-import this item into the graph.

Page restorations are propagated through the mediawiki.page-undelete stream, for example:

{
  "$schema": "/mediawiki/page/undelete/1.0.0",
  "meta": {
    "uri": "https://test.wikidata.org/wiki/Q212433",
    "request_id": "11fcfff3-0fbf-4ed3-93df-3f85c63be2fc",
    "id": "7ad0b8c3-b54e-40d7-aa38-f2bf785f0ec4",
    "dt": "2020-07-01T14:26:46Z",
    "domain": "test.wikidata.org",
    "stream": "mediawiki.page-undelete"
  },
  "database": "testwikidatawiki",
  "performer": {
    "user_text": "DCausse (WMF)",
    "user_groups": [
      "bureaucrat",
      "sysop",
      "*",
      "user"
    ],
    "user_is_bot": false,
    "user_id": 2490,
    "user_registration_dt": "2017-09-28T06:49:13Z",
    "user_edit_count": 7
  },
  "page_id": 302928,
  "page_title": "Q212433",
  "page_namespace": 0,
  "page_is_redirect": false,
  "rev_id": 529859
}

The revision does not change on undelete, which means that when the full history of events is not known, the only way to tell whether a page deletion or a restoration happened last is the timestamp of the event.
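To illustrate the ordering problem, here is a minimal sketch that picks the most recent lifecycle event for an item by event time alone. The names PageEvent, Deleted, Undeleted and LatestState are hypothetical and not taken from the pipeline; the event time is assumed to come from the meta.dt field of the payload.

```scala
import java.time.Instant

// Hypothetical minimal view of a page lifecycle event.
// eventTime is assumed to be parsed from the payload's meta.dt field.
sealed trait PageEvent {
  def item: String
  def eventTime: Instant
  def revision: Long
}
final case class Deleted(item: String, eventTime: Instant, revision: Long) extends PageEvent
final case class Undeleted(item: String, eventTime: Instant, revision: Long) extends PageEvent

object LatestState {
  // The delete and the undelete carry the same revision, so the revision
  // cannot order them; only the event time can.
  def latest(events: Seq[PageEvent]): Option[PageEvent] =
    events.reduceOption((a, b) => if (b.eventTime.isAfter(a.eventTime)) b else a)
}
```

With a delete and an undelete of the same revision, whichever has the later timestamp decides whether the item should currently be in the graph.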

On the shared model:

  • undeletes can re-use the "import" operation type

On the Flink pipeline:

  • add a new case class PageUndelete in the InputEvent ADT
  • add a new stream to consume from (Kafka topic mediawiki.page-undelete) and produce PageUndelete to downstream operators
  • add a new case in DecideMutationOperation:
    • produce a FullImport operation if the map does not contain a revision of the item and add the revision to the map
    • produce an IgnoredMutation if the map contains the same or a future revision
    • produce a Diff operation if the map contains a previous revision
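
The three cases above can be sketched roughly as follows. This is a simplified illustration, not the pipeline's actual code: the field lists of the case classes are assumptions, a plain immutable Map stands in for the operator's keyed state, and updating the map after a Diff is also an assumption (the task only states it explicitly for FullImport).

```scala
import java.time.Instant

// Hypothetical, simplified stand-ins for the pipeline's types.
sealed trait InputEvent {
  def item: String
  def eventTime: Instant
  def revision: Long
}
final case class PageUndelete(item: String, eventTime: Instant, revision: Long) extends InputEvent

sealed trait MutationOperation
final case class FullImport(item: String, eventTime: Instant, revision: Long) extends MutationOperation
final case class Diff(item: String, eventTime: Instant, revision: Long, fromRev: Long) extends MutationOperation
final case class IgnoredMutation(item: String, eventTime: Instant, revision: Long) extends MutationOperation

object UndeleteDecision {
  // Returns the chosen operation and the updated item -> last seen revision map.
  def decide(state: Map[String, Long], e: PageUndelete): (MutationOperation, Map[String, Long]) =
    state.get(e.item) match {
      // No revision known for the item: re-import it and remember the revision.
      case None =>
        (FullImport(e.item, e.eventTime, e.revision), state + (e.item -> e.revision))
      // Same or future revision already known: nothing to do.
      case Some(known) if known >= e.revision =>
        (IgnoredMutation(e.item, e.eventTime, e.revision), state)
      // Older revision known: diff from it (state update here is an assumption).
      case Some(known) =>
        (Diff(e.item, e.eventTime, e.revision, known), state + (e.item -> e.revision))
    }
}
```

Returning the new map alongside the operation keeps the sketch free of side effects; in a Flink operator the same decision would read and write per-item keyed state instead.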

On the pipeline consumer:

  • Nothing as we re-use existing operations.

AC:
When restoring a deleted item from wikibase:

  • an event should be present in the streaming updater output with the data required to re-import the item
  • the data should re-appear in the query service graph when using the streaming updater

size: M

Event Timeline

dcausse created this task. Jul 1 2020, 2:45 PM
Restricted Application added a project: Wikidata. Jul 1 2020, 2:45 PM
Restricted Application added a subscriber: Aklapper.
dcausse updated the task description. Jul 1 2020, 2:51 PM
Gehel triaged this task as High priority. Tue, Sep 15, 7:42 AM
CBogen set the point value for this task to 5. Mon, Sep 21, 5:11 PM