
Avoid accepting Kafka messages with whacky timestamps
Open, Medium, Public


In an earlier incident we encountered an error where a bad Kafka timestamp caused Kafka log rolling to stop indefinitely, which filled up disks.

Having a bad Kafka timestamp (way out of range, e.g. years in the future or past) will also hurt stream processing and Hive partition ingestion.
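As a sketch of what "way out of range" could mean in practice, here is a hypothetical producer-side sanity check (the bounds and function name are illustrative, not from this task):

```python
from datetime import datetime, timezone, timedelta

# Hypothetical bounds: how far into the past/future a timestamp may be
# before we consider it "whacky". Values here are illustrative.
MAX_PAST = timedelta(days=7)
MAX_FUTURE = timedelta(hours=1)

def timestamp_is_sane(event_dt, now=None):
    """Return True if event_dt falls within a plausible window around now."""
    now = now or datetime.now(timezone.utc)
    return (now - MAX_PAST) <= event_dt <= (now + MAX_FUTURE)
```

A timestamp from 2007, like the ones seen in the recentchange stream, would fail this check, as would one years in the future.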

We could configure Kafka to reject messages with timestamps that are too old or too far in the future with `message.timestamp.difference.max.ms`. Setting this to the value of `retention.ms` seems to make the most sense, but this caused issues with compacted topics as noted here. Kafka had `log.retention.ms` as the default value for `log.message.timestamp.difference.max.ms` for a few versions, but this was reverted to `Long.MAX_VALUE` due to complexities with compacted topics.
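A broker-level sketch of what that configuration could look like (values are illustrative, assuming a 7-day retention; this should not be applied to compacted topics, where old timestamps are legitimate):

```
# server.properties sketch: reject messages whose timestamp differs from
# broker time by more than the retention period (7 days = 604800000 ms).
log.retention.ms=604800000
log.message.timestamp.difference.max.ms=604800000
```

The same setting exists as the topic-level config `message.timestamp.difference.max.ms`, so it could be applied selectively to non-compacted topics via `kafka-configs.sh --alter --add-config`.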

This really only matters when the data produced is untrusted. eventgate-analytics-external and eventgate-logging-external accept events from external producers. Our code does the right thing, but there is nothing stopping someone from manually POSTing an event with a whacky meta.dt, which will be used for the Kafka timestamp. After we do T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas, we should probably modify EventGate so that it always sets meta.dt itself, rather than accepting the producer's value if it is present.
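A minimal sketch of that proposed EventGate change, assuming the service stamps `meta.dt` itself at receive time (the helper and the `client_dt` field name are assumptions for illustration, pending the conventions from T267648):

```python
from datetime import datetime, timezone

def stamp_meta_dt(event):
    """Overwrite meta.dt with the server receive time (UTC, ISO 8601),
    ignoring whatever the producer supplied."""
    meta = event.setdefault('meta', {})
    # Hypothetical: keep the producer's claimed time in a separate field
    # rather than trusting it for the Kafka timestamp.
    if 'dt' in meta:
        meta.setdefault('client_dt', meta['dt'])
    meta['dt'] = datetime.now(timezone.utc).isoformat(timespec='milliseconds')
    return event
```

With this, a manually POSTed event carrying a whacky `meta.dt` would still get a sane Kafka timestamp, since the broker timestamp is derived from the server-assigned value.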

This would help mitigate the potential problem, but it doesn't stop bugs in our code from emitting bad timestamps. Setting `message.timestamp.difference.max.ms` would, but I'm not sure what to do if we start using compacted topics.

Event Timeline

odimitrijevic moved this task from Incoming to Operational Excellence on the Analytics board.

I'd say this is medium to low priority and is something that needs to be worked on in collaboration with maintainers of other Kafka clusters.

Milimetric lowered the priority of this task from High to Medium. May 17 2021, 9:19 PM

This happened today: somehow there were recentchange events with timestamps from around 2007 in the Kafka stream.