The ingestion job (cirrus-streaming-updater-consumer) should read messages from a Kafka topic and write them to an Elasticsearch index.
Messages on the Kafka topic should conform to the schema defined at https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/856507.
Writing to Elasticsearch can rely on the Elasticsearch connector.
The job's main responsibility will be to build the bulk requests:
- create a scripted update request, similar to what CirrusSearch does for revision-based updates
- create a delete request for page deletes
- a new Flink job can be scheduled that consumes a topic of update documents and writes to an Elasticsearch cluster
- updates can be filtered per wiki based on a command-line parameter (to ease testing)
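
The request-building and filtering steps above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual Flink job: the event field names (`wiki_id`, `page_id`, `change_type`, `fields`), the `page_delete` value, the `--wiki` flag and the index name are all hypothetical and would have to be aligned with the schema linked above. The `super_detect_noop` script name is an assumption about what CirrusSearch uses for its scripted updates. Only the bulk line format (an action line, followed by a body line for updates and nothing for deletes) is taken from the Elasticsearch bulk API.

```python
import argparse
import json
from typing import Iterable, List, Optional, Set


def to_bulk_lines(event: dict, index: str) -> List[str]:
    """Turn one update event into Elasticsearch bulk API lines (NDJSON)."""
    doc_id = str(event["page_id"])  # hypothetical field name
    if event["change_type"] == "page_delete":  # hypothetical value
        # A delete is a single action line with no body line.
        return [json.dumps({"delete": {"_index": index, "_id": doc_id}})]
    # Anything else becomes a scripted update: action line, then body line.
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {
        "script": {
            # Assumption: script name modeled on CirrusSearch's noop script.
            "source": "super_detect_noop",
            "lang": "super_detect_noop",
            "params": {"source": event["fields"]},
        },
        # Assumption: index the fields as-is when the document is missing.
        "upsert": event["fields"],
    }
    return [json.dumps(action), json.dumps(body)]


def build_bulk_body(
    events: Iterable[dict], index: str, wikis: Optional[Set[str]] = None
) -> str:
    """Build one bulk request body, keeping only events for the given wikis."""
    lines: List[str] = []
    for event in events:
        if wikis is not None and event["wiki_id"] not in wikis:
            continue  # per-wiki filter, intended to ease testing
        lines.extend(to_bulk_lines(event, index))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse the hypothetical --wiki flag; it may be repeated."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--wiki", action="append", default=None,
                        help="only process updates for this wiki")
    return parser.parse_args(argv)
```

For example, `build_bulk_body(events, "testwiki_content", set(parse_args(["--wiki", "testwiki"]).wiki))` would return a body containing only the `testwiki` events. In the real job, the same per-event logic would presumably live in the function handed to the Elasticsearch connector rather than in a hand-built NDJSON string.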