Switch GenerateEntityDiffPatchOperation to a RichFunction
Closed, Resolved · Public · 5 Estimated Story Points

Authored By

	dcausse
	Jul 23 2020, 9:18 AM

Description

To prepare for the switch to RichAsyncFunction, it may be preferable to switch to RichFunction as a first step.
This change implies that instead of emitting chunks (multiple EntityPatchOp, each holding a MutationEventData), the operator must emit the full patch data.
The transformation into chunks of MutationEventData will happen as the very last step. This also helps ensure that the chunks of a patch are contiguous in the Kafka sink, which the consumer currently assumes for simplicity.

Adapt the model
- EntityPatchOp must now hold an RDFPatch (Statement is marked Serializable, which might be good enough for now, but we might want a more controlled and optimized serialization format for Sesame objects).
Switch GenerateEntityDiffPatchOperation to a RichFunction[MutationOperation, ResolvedOp]
- Calls to dataEventGenerator will then become unnecessary
Add a new FlatMapFunction[EntityPatchOp, ChunckedEntityPatchOp] named ChunkRDFDataOperation
- Introduce a new case class ChunckedEntityPatchOp in the ResolvedOp ADT, similar to the original EntityPatchOp (holding a MutationEventData)
- Append the new operator to the end of the stream with a parallelism of 1 (right before MeasureEventProcessingLatencyOperation)
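
To make the proposed split concrete, here is a minimal sketch of the chunking step in plain Scala, without any Flink dependency. Everything except the names ResolvedOp, EntityPatchOp and ChunckedEntityPatchOp is an assumption made for illustration: the fields, the use of String as a stand-in for RDF statements and for the MutationEventData payload, the chunk size parameter, and the choice to emit one empty chunk for an empty patch.

```scala
object ChunkingSketch {
  // Hypothetical, simplified stand-ins for the real model classes.
  sealed trait ResolvedOp

  // Full patch: holds all statements (String stands in for an RDF Statement).
  final case class EntityPatchOp(
      entity: String,
      revision: Long,
      statements: Seq[String]
  ) extends ResolvedOp

  // One chunk of a patch; sequence and sequenceLength let a consumer
  // detect where a patch starts and ends (assumed fields).
  final case class ChunckedEntityPatchOp(
      entity: String,
      revision: Long,
      sequence: Int,
      sequenceLength: Int,
      statements: Seq[String]
  ) extends ResolvedOp

  // Splits a full patch into ordered chunks of at most maxStatementsPerChunk
  // statements. An empty patch yields a single empty chunk (an assumption).
  def chunk(op: EntityPatchOp, maxStatementsPerChunk: Int): Seq[ChunckedEntityPatchOp] = {
    require(maxStatementsPerChunk > 0, "chunk size must be positive")
    val groups: Vector[Seq[String]] =
      if (op.statements.isEmpty) Vector(Seq.empty[String])
      else op.statements.grouped(maxStatementsPerChunk).toVector
    groups.zipWithIndex.map { case (part, index) =>
      ChunckedEntityPatchOp(op.entity, op.revision, index, groups.size, part)
    }
  }
}
```

In a Flink job, a function like `chunk` would presumably be called from the flatMap method of ChunkRDFDataOperation, emitting the chunks to the collector in order. With a parallelism of 1, the chunks of one patch are emitted back to back and cannot be interleaved with those of another patch.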

AC:

GenerateEntityDiffPatchOperation can be switched easily to a RichAsyncFunction
Chunks for the same patch are always contiguous in the output stream
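
The second criterion could be checked in a test along the following lines. This is only a sketch: identifying a patch by an (entity, revision) pair and representing the output stream as an in-memory sequence of such keys are both assumptions, not part of the task.

```scala
object ContiguityCheck {
  // A patch is identified here by (entity, revision): an assumption.
  type PatchKey = (String, Long)

  // Returns true when every patch key occupies a single uninterrupted run,
  // i.e. no key reappears after a different key has been seen.
  def isContiguous(keys: Seq[PatchKey]): Boolean = {
    // Collapse each run of consecutive equal keys into one entry.
    val runs = keys.foldLeft(Vector.empty[PatchKey]) { (acc, key) =>
      if (acc.lastOption.contains(key)) acc else acc :+ key
    }
    // If a key starts more than one run, its chunks were interleaved.
    runs.distinct.size == runs.size
  }
}
```

For example, the key sequence Q1, Q1, Q2 passes the check, while Q1, Q2, Q1 fails because the chunks of Q1 are split by a chunk of Q2.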

Related Objects

Status	Assigned	Task
Resolved	Gehel	T244590 [Epic] Rework the WDQS updater as an event driven application
Resolved	dcausse	T258684 Use RichAsyncFunction for GenerateEntityDiffPatchOperation
Resolved	dcausse	T258683 Switch GenerateEntityDiffPatchOperation to a RichFunction

Event Timeline

dcausse created this task. Jul 23 2020, 9:18 AM

Restricted Application added a project: Wikidata. Jul 23 2020, 9:18 AM

Restricted Application added a subscriber: Aklapper.

dcausse added a parent task: T244590: [Epic] Rework the WDQS updater as an event driven application. Jul 23 2020, 9:19 AM

dcausse edited parent tasks, added: T258684: Use RichAsyncFunction for GenerateEntityDiffPatchOperation; removed: T244590: [Epic] Rework the WDQS updater as an event driven application. Jul 23 2020, 9:26 AM

dcausse claimed this task. Jul 31 2020, 10:17 AM

dcausse triaged this task as Medium priority.

dcausse moved this task from Incoming to Current work on the Wikidata-Query-Service board.

dcausse added a project: Discovery-Search (Current work).

dcausse set the point value for this task to 5.

dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.

dcausse moved this task from In Progress to Needs review on the Discovery-Search (Current work) board. Aug 3 2020, 8:04 AM

dcausse moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. Aug 4 2020, 2:26 PM

Gehel closed this task as Resolved. Aug 17 2020, 12:46 PM
