As briefly discussed within the Traffic team, the current Benthos instance is configured to send extra data compared to what VarnishKafka currently sends to Kafka.
These are the fields sent by Benthos that aren't present in the current webrequest stream:
- HTTP version (e.g. HTTP/1.1)
- $schema (required by DE), with a static value of "/webrequest/1.0.0"
- meta.id (uuidv4 generated by benthos)
- meta.request_id (different uuidv4 generated by haproxy)
As for $schema, I don't think it poses much of a computational problem on the cp hosts.
We certainly don't need two UUIDs (generated by different parts of the processing pipeline): they are expensive to generate under heavy load and could waste bandwidth and space on Kafka.
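To put a rough number on the size side of this, here is a minimal Python sketch. It assumes events are serialized as compact JSON and that a meta object already exists in the event; the sample event and the function name are made up for illustration and are not taken from the real Benthos config.

```python
import json
import uuid


def extra_bytes_per_event(base_event: dict) -> int:
    """Serialized-size increase (bytes) from adding meta.id and meta.request_id."""
    with_ids = dict(base_event)
    meta = dict(base_event.get("meta", {}))
    meta["id"] = str(uuid.uuid4())          # stands in for the Benthos-generated UUID
    meta["request_id"] = str(uuid.uuid4())  # stands in for the HAProxy-generated UUID
    with_ids["meta"] = meta

    def size(event: dict) -> int:
        # Compact separators: assumes the produced JSON carries no extra whitespace.
        return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))

    return size(with_ids) - size(base_event)


# Hypothetical event: only the shape matters here, not the values.
sample = {"meta": {"stream": "webrequest"}, "sequence": 12345}
overhead = extra_bytes_per_event(sample)
print(f"{overhead} extra bytes per event before compression")
```

With a non-empty meta object this works out to 96 bytes per event (two 36-character UUID strings plus their keys, quotes and separators). Random UUIDs also tend to compress poorly, so compression would likely recover only part of that.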
Maybe the sequence field could suffice as a unique identifier? Or could the ID be generated later in the pipeline rather than directly on the cp hosts?
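If the ID were generated downstream, one option would be to derive it deterministically from fields the event already carries. Below is a minimal Python sketch using a name-based UUID (uuid5). It assumes each event carries hostname, sequence and dt fields and that this triple is unique per request, which would need to be verified (for example, whether sequence resets on restart). The namespace constant and the function name are hypothetical.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as every consumer uses the same one.
WEBREQUEST_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "/webrequest/1.0.0")


def derive_event_id(hostname: str, sequence: int, dt: str) -> str:
    """Derive a stable event ID from fields assumed to be present in the event."""
    # Same inputs always yield the same UUID, so no random generation is needed on the cp hosts.
    name = f"{hostname}|{sequence}|{dt}"
    return str(uuid.uuid5(WEBREQUEST_NAMESPACE, name))


# Illustrative values only.
event_id = derive_event_id("cp0000.example.org", 12345, "2024-01-01T00:00:00Z")
print(event_id)
```

Because the derivation is deterministic, reprocessing the same event yields the same ID, which a randomly generated uuidv4 cannot offer. The trade-off is that uniqueness depends entirely on the input fields.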
For the HTTP version, I suggest discarding it for now, since it isn't currently used in webrequest, and adding it later if needed.
We can use this ticket to discuss which fields should be kept and which can be safely discarded.