The Elasticsearch client responsible for building bulk requests can automatically flush a request based on several criteria, one of which is the request size.
Unfortunately, for scripted updates this size estimate can be far off, because only the script source is taken into account, not the script parameters. This caused the pipeline to fail when it tried to ship a request that was too large (>120 MB), way above the 5 MB default limit.
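To illustrate the mismatch, here is a minimal sketch (in Python, purely illustrative; the actual client is JVM-based) of a scripted-update payload where the script source is tiny but the params carry the bulk of the data, so a source-only estimate sees almost nothing:

```python
import json

# Hypothetical scripted-update payload: short script source, large params.
script_source = "ctx._source.data = params.data"
script_params = {"data": ["x" * 1000] * 100}  # ~100 KB of payload data

# An estimator that only counts the script source (as described above)
# sees a tiny request...
source_only_estimate = len(script_source.encode("utf-8"))

# ...while the bytes actually shipped include the serialized params.
actual_size = len(json.dumps(
    {"script": {"source": script_source, "params": script_params}}
).encode("utf-8"))

print(source_only_estimate, actual_size)
```

With payloads like this the estimator can be off by several orders of magnitude, which is how a request blows past the flush threshold before the client ever decides to flush.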
Possible workarounds:
- limit the request size even further, e.g. reducing the limit to 1 MB, or cap the number of actions per bulk request based on our own document size estimations.
- use the script source (which the estimation does account for) to carry the data instead of the script params. This was tried in MR !82, but it tripped the "too many dynamic script compilations" circuit breaker, which killed the update actions. We could still disable that breaker in Elasticsearch and tune the script cache size & retention down to something very low, but that might be too invasive for a workaround.
- @pfischer suggested that we could, with some adaptation of the Flink Elasticsearch connector, estimate the size on our side and manually flush the request when the size threshold is met.
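The last option could look roughly like the sketch below: buffer actions, measure the *actual* serialized size (params included), and flush once a threshold is crossed. This is illustrative Python, not connector code; `SizeAwareBulkBuffer`, `flush_bytes`, and the flush callback are all hypothetical names:

```python
import json

class SizeAwareBulkBuffer:
    """Buffer bulk actions and flush when the actual serialized size
    (script params included) would cross a threshold. A sketch of the
    client-side estimation idea above; not a real connector API."""

    def __init__(self, flush, flush_bytes=5 * 1024 * 1024):
        self.flush = flush            # callback receiving the buffered actions
        self.flush_bytes = flush_bytes
        self.buffer = []
        self.size = 0

    def add(self, action: dict):
        # Estimate by serializing the whole action, params and all.
        action_size = len(json.dumps(action).encode("utf-8"))
        if self.buffer and self.size + action_size > self.flush_bytes:
            self._do_flush()
        self.buffer.append(action)
        self.size += action_size

    def _do_flush(self):
        self.flush(self.buffer)
        self.buffer = []
        self.size = 0
```

The key design point is that the buffer flushes *before* appending an action that would exceed the threshold, so no single shipped batch can grow past the limit (oversized individual actions aside).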
AC:
- the pipeline can be configured in such a way that it does not fail because an update request is too large