
Switch GenerateEntityDiffPatchOperation to a RichFunction
Closed, Resolved | Public | 5 Estimated Story Points

Description

To prepare the switch to RichAsyncFunction, it might be preferable to switch to RichFunction as a first step.
This change implies that the full data needs to be sent instead of chunks (multiple EntityPatchOp, each holding a MutationEventData).
The transformation into chunks of MutationEventData will happen as the very last step. This also helps ensure that the chunks of a patch are contiguous in the Kafka sink (which the consumer currently assumes, for simplicity).

  1. adapt the model
    • EntityPatchOp must now hold an RDFPatch (Statement is marked Serializable, which might be good enough for now, but we might want a better, more controlled and optimized serialization format for Sesame objects).
  2. Switch GenerateEntityDiffPatchOperation to a RichFunction[MutationOperation, ResolvedOp]
    • calls to dataEventGenerator will then become unnecessary
  3. Add a new FlatMapFunction[EntityPatchOp, ChunckedEntityPatchOp] named ChunkRDFDataOperation
    • Introduce a new case class ChunckedEntityPatchOp in the ADT ResolvedOp, similar to the original EntityPatchOp (holding a MutationEventData)
    • Append the new operator to the end of the stream with a parallelism of 1 (right before MeasureEventProcessingLatencyOperation)
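
The chunking in step 3 could look roughly like the sketch below. It is plain Scala with no Flink dependency, and it makes several assumptions that are not part of this task: strings stand in for RDF statements and for MutationEventData, the fields `sequence` and `sequenceLength` and the parameter `maxStatements` are hypothetical, and the `+ `/`- ` prefixes are only an illustrative way to tell added from removed statements. In the real operator this logic would presumably sit in the `flatMap` method of ChunkRDFDataOperation and emit each chunk through the `Collector`.

```scala
// Hypothetical, simplified stand-ins for the real model classes.
sealed trait ResolvedOp

// Full patch as produced by GenerateEntityDiffPatchOperation after the change.
// Plain strings stand in for RDF statements (assumption).
final case class EntityPatchOp(
    entity: String,
    revision: Long,
    added: Seq[String],
    removed: Seq[String]
) extends ResolvedOp

// One chunk of a patch; payload stands in for a MutationEventData (assumption).
final case class ChunckedEntityPatchOp(
    entity: String,
    revision: Long,
    sequence: Int,       // position of this chunk within its patch
    sequenceLength: Int, // total number of chunks for the patch
    payload: Seq[String]
) extends ResolvedOp

object ChunkRDFDataSketch {
  // Splits one full patch into chunks of at most maxStatements statements.
  // A patch with no statements yields no chunks in this sketch.
  def chunk(op: EntityPatchOp, maxStatements: Int): Seq[ChunckedEntityPatchOp] = {
    require(maxStatements > 0, "maxStatements must be positive")
    val tagged = op.added.map("+ " + _) ++ op.removed.map("- " + _)
    val groups = tagged.grouped(maxStatements).toVector
    groups.zipWithIndex.map { case (payload, i) =>
      ChunckedEntityPatchOp(op.entity, op.revision, i, groups.size, payload)
    }
  }
}
```

Because all chunks of a patch are emitted before the next patch is read, flat-mapping a sequence of patches with `ChunkRDFDataSketch.chunk(_, 2)` keeps the chunks of each patch adjacent. That mirrors what a single-parallelism operator at the end of the stream is meant to provide.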

AC:

  • GenerateEntityDiffPatchOperation can be switched easily to a RichAsyncFunction
  • Chunks for the same patch are always contiguous in the output stream