The Kafka client fetches messages as fast as possible, so execution concurrency is limited only by CPU. The problem is that most messages in the topic might not match the rule, so processing them is extremely fast. The commit function in the Kafka client implicitly commits the offset of the latest consumed message, which makes the whole commit feature pointless: by the time an actual execution of some message finishes (or crashes), it is almost guaranteed that newer non-matching messages have already been consumed and a newer offset has been committed.
Instead, we propose to implement a task queue. Internally, the task queue is an object mapping a message offset to a promise of that task's execution.
1. When a message is received, an executor task is pushed to the queue, keyed by the message offset. If the queue size exceeds the limit, the consumer is paused. If the message doesn't match the rule, we do not push it to the queue, but we also do not commit its offset afterwards.
2. When execution finishes and the appropriate action has been taken successfully (the update was run, the message didn't match, a retry was scheduled, or a fatal error was sent to the queue), the task is removed from the queue, the lowest offset remaining in the queue minus 1 is committed (unless it was already committed), and the consumer is resumed.
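The two steps above can be sketched as a small class. This is a minimal illustration, not the implementation: the `commit`, `pause`, and `resume` callbacks are hypothetical stand-ins for whatever the actual Kafka client library exposes, and the queue-size threshold and error handling are assumptions.

```typescript
// Sketch of the proposed task queue. The commit/pause/resume callbacks are
// hypothetical; wire them to the real Kafka client's API.
class OffsetTaskQueue {
  private tasks = new Map<number, Promise<void>>();
  private lastCommitted = -1;    // highest offset committed so far
  private highestFinished = -1;  // highest offset whose task completed

  constructor(
    private limit: number,
    private commit: (offset: number) => void,
    private pause: () => void,
    private resume: () => void,
  ) {}

  // Step 1: push an executor for a matching message, keyed by its offset.
  push(offset: number, run: () => Promise<void>): void {
    const task = run().then(
      () => this.finish(offset),
      // No appropriate action was taken: leave the offset in the queue so
      // nothing past it gets committed (the worker restarts from here).
      (err) => console.error(`task at offset ${offset} failed`, err),
    );
    this.tasks.set(offset, task);
    if (this.tasks.size >= this.limit) this.pause();
  }

  // Step 2: on success, commit (lowest in-flight offset - 1) and resume.
  private finish(offset: number): void {
    this.tasks.delete(offset);
    this.highestFinished = Math.max(this.highestFinished, offset);
    const remaining = Array.from(this.tasks.keys());
    // If the queue drained, everything up to highestFinished is done.
    const lowest =
      remaining.length > 0 ? Math.min(...remaining) : this.highestFinished + 1;
    const toCommit = lowest - 1;
    if (toCommit > this.lastCommitted) {
      this.lastCommitted = toCommit;
      this.commit(toCommit);
    }
    this.resume();
  }
}
```

For example, with tasks for offsets 5 and 6 in flight, if 6 finishes first, only offset 4 can be committed (5 is still running); once 5 finishes, the queue is empty and offset 6 is committed. Tracking the highest finished offset is needed for exactly this case, so that completed higher offsets are not lost when the queue drains out of order.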
If, however, the worker crashes without taking an appropriate action, after a restart it will continue from the committed offset, i.e. from just past the last offset up to which all tasks had completed. This implies some double processing, but since worker crashes are not really expected, it should be fine.