When an item is deleted, the streaming updater should produce a message instructing the consumer to delete the item from the graph.
Classic page deletions are made by admins and propagated through the mediawiki.page-delete stream.
Example message:
{
  "$schema": "/mediawiki/page/delete/1.0.0",
  "meta": {
    "uri": "https://test.wikidata.org/wiki/Q212433",
    "request_id": "59f87c41-7680-4f8a-bf6e-7dac91530972",
    "id": "00fcac35-5357-4c99-ba9f-e720db9f0197",
    "dt": "2020-07-01T13:16:25Z",
    "domain": "test.wikidata.org",
    "stream": "mediawiki.page-delete"
  },
  "database": "testwikidatawiki",
  "performer": {
    "user_text": "DCausse (WMF)",
    "user_groups": [ "bureaucrat", "sysop", "*", "user" ],
    "user_is_bot": false,
    "user_id": 2490,
    "user_registration_dt": "2017-09-28T06:49:13Z",
    "user_edit_count": 7
  },
  "page_id": 302928,
  "page_title": "Q212433",
  "page_namespace": 0,
  "page_is_redirect": false,
  "rev_id": 529859,
  "rev_count": 1,
  "comment": "content was: \"Test dcausse v2\", and the only contributor was \"[[Special:Contributions/DCausse (WMF)|DCausse (WMF)]]\" ([[User talk:DCausse (WMF)|talk]])",
  "parsedcomment": "content was: \"Test dcausse v2\", and the only contributor was \"<a href=\"/wiki/Special:Contributions/DCausse_(WMF)\" title=\"Special:Contributions/DCausse (WMF)\">DCausse (WMF)</a>\" (<a href=\"/w/index.php?title=User_talk:DCausse_(WMF)&action=edit&redlink=1\" class=\"new\" title=\"User talk:DCausse (WMF) (page does not exist)\">talk</a>)"
}
This task involves:
On the shared model:
- add a new operation type "delete" to org.wikidata.query.rdf.tool.stream.MutationEventData
- add tests to org.wikidata.query.rdf.tool.stream.MutationEventDataJsonSerializationUnitTest to make sure that it's serialized properly
On the flink pipeline:
- add a new case class PageDelete in the InputEvent ADT
- add a new case class DeleteItem in the MutationOperation ADT
- add a new stream to consume from (kafka topic mediawiki.page-delete) and produce PageDelete to downstream operators
- add a new case in DecideMutationOperation:
  - if the map contains a revision of the item, produce a DeleteItem operation and remove the item from the map
  - otherwise, produce an IgnoredMutation
- add a new case in org.wikidata.query.rdf.updater.GenerateEntityDiffPatchOperation to support the DeleteItem operation; it simply produces an EntityPatchOp with a MutationEventData that has the type "delete".
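The delete case in DecideMutationOperation can be sketched as follows. The real operator lives in the Flink pipeline, where PageDelete and DeleteItem are case classes; this Java sketch only models the decision logic. DecideDeleteSketch, DeleteOutcome, onRevision, onPageDelete and knows are hypothetical names, and a plain HashMap stands in for whatever state the operator actually keeps.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the two MutationOperation outcomes relevant to a page delete. */
enum DeleteOutcome { DELETE_ITEM, IGNORED_MUTATION }

/** Hypothetical sketch of the delete case in DecideMutationOperation (not the real operator). */
final class DecideDeleteSketch {
    // Item ID -> last revision seen for that item (the "map" from the task description).
    private final Map<String, Long> lastSeenRevision = new HashMap<>();

    /** Records that a revision of the item has been seen, keeping the highest one. */
    void onRevision(String itemId, long revision) {
        lastSeenRevision.merge(itemId, revision, Long::max);
    }

    /** Decides what to emit when a PageDelete event arrives for the given item. */
    DeleteOutcome onPageDelete(String itemId) {
        // remove() returns the previous value, or null if the item was never seen.
        Long known = lastSeenRevision.remove(itemId);
        return known != null ? DeleteOutcome.DELETE_ITEM : DeleteOutcome.IGNORED_MUTATION;
    }

    /** True if the map still holds a revision for the item. */
    boolean knows(String itemId) {
        return lastSeenRevision.containsKey(itemId);
    }
}
```

Removing the entry as part of the decision means a second delete event for the same item falls through to IgnoredMutation, so duplicate events do not produce duplicate delete messages.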
On the pipeline consumer:
- Refactor RDFPatch so that it has two modes: applying a diff and deleting an item
- Refactor org.wikidata.query.rdf.tool.stream.KafkaStreamConsumer so that it accumulates item deletions
- Adapt org.wikidata.query.rdf.tool.rdf.RdfRepositoryUpdater#applyPatch to support item deletions
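A minimal sketch of how the consumer side might accumulate both kinds of change within one batch. PatchAccumulatorSketch and all of its methods are hypothetical; this is not the actual RDFPatch or KafkaStreamConsumer API. Triples are plain strings for brevity, and the sketch does not handle a diff and a delete for the same item arriving in the same batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical accumulator: collects diff triples and entity IDs to delete for one batch. */
final class PatchAccumulatorSketch {
    private final List<String> added = new ArrayList<>();
    private final List<String> removed = new ArrayList<>();
    // A set, so that a repeated delete of the same item is only recorded once.
    private final Set<String> deletedEntityIds = new LinkedHashSet<>();

    /** "diff" mode: accumulate the triples to add and to remove. */
    void acceptDiff(List<String> addedTriples, List<String> removedTriples) {
        added.addAll(addedTriples);
        removed.addAll(removedTriples);
    }

    /** "delete" mode: remember that every triple of this entity must go. */
    void acceptDelete(String entityId) {
        deletedEntityIds.add(entityId);
    }

    /** Number of pending triple changes plus pending entity deletions. */
    int pendingChanges() {
        return added.size() + removed.size() + deletedEntityIds.size();
    }

    /** Returns the accumulated deletions in arrival order and clears them. */
    List<String> drainDeletedEntityIds() {
        List<String> drained = new ArrayList<>(deletedEntityIds);
        deletedEntityIds.clear();
        return drained;
    }
}
```

Keeping deletions separate from the removed triples reflects the two modes described above: a diff names the exact triples to change, while a delete only names the item and leaves it to the repository updater to find and remove its triples.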
AC:
When deleting an item from Wikibase:
- an event should be present in the streaming updater output indicating that this item needs to be deleted
- the data should disappear from the query service when using the streaming updater
size: XL