
The elasticsearch client does not properly estimate the size of the bulk requests
Closed, ResolvedPublic3 Estimated Story Points

Description

The elasticsearch client responsible for building bulk requests can automatically flush a request based on a number of criteria, one of which is the size of the request.
Unfortunately, the size estimation can be very wrong for scripted updates: only the script source is taken into account, not the script parameters. This caused the pipeline to fail when it tried to ship a request that was far too large (>120 MB), well above the 5 MB default limit.
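To make the undercount concrete, here is a hedged sketch (not the actual elasticsearch client code; method names are illustrative) contrasting an estimator that only counts the script source with one that also accounts for the serialized parameters. With a large value in `params`, the naive estimate stays tiny while the real payload is over a megabyte:

```java
import java.util.Map;

public class BulkSizeEstimate {
    // Illustrative: an estimate that counts only the script source,
    // mirroring the reported behaviour (params ignored -> undercount).
    static long naiveEstimate(String scriptSource, Map<String, Object> params) {
        return scriptSource.getBytes().length;
    }

    // Illustrative: a rough estimate that also accounts for the params,
    // which dominate the payload for large scripted updates.
    static long roughActualSize(String scriptSource, Map<String, Object> params) {
        long size = scriptSource.getBytes().length;
        for (Map.Entry<String, Object> e : params.entrySet()) {
            size += e.getKey().getBytes().length
                  + String.valueOf(e.getValue()).getBytes().length;
        }
        return size;
    }

    public static void main(String[] args) {
        String source = "ctx._source.putAll(params.data)";
        Map<String, Object> params = Map.of("data", "x".repeat(1_000_000));
        System.out.println(naiveEstimate(source, params));   // a few dozen bytes
        System.out.println(roughActualSize(source, params)); // over 1 MB
    }
}
```

A flush threshold checked against the naive estimate never triggers, so the request keeps growing until it is rejected by the server.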

Possible workarounds:

  • limit the size of the request even further, reducing the limit to 1 MB, or possibly capping the number of actions per bulk request using document size estimations.
  • use the script source (which is properly accounted for by the estimation) to store the data instead of the script params. This was tried in MR !82, but it caused the "Too many dynamic script compilations" circuit breaker to kill the update actions. We could still disable this in elastic and tune the script cache size & retention down to something very low, but that might be too invasive for a workaround.
  • @pfischer suggested that we could find a way (with some adaptation of the flink elasticsearch connector) to estimate the size on our side and manually flush the request when the size threshold is met.
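The last workaround could look something like the following minimal sketch. This is not the Flink connector API; `SizeAwareBatcher` and its callback are hypothetical names, and the point is only the flush-before-threshold logic: track the serialized size of each action ourselves (params included, since each action is already fully serialized here) and flush before the limit would be exceeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of client-side size accounting with a manual flush.
public class SizeAwareBatcher {
    private final long maxBytes;
    private final Consumer<List<String>> flusher; // e.g. ships one bulk request
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    public SizeAwareBatcher(long maxBytes, Consumer<List<String>> flusher) {
        this.maxBytes = maxBytes;
        this.flusher = flusher;
    }

    // Add one fully serialized bulk action; flush first if it would
    // push the pending batch over the byte threshold.
    public void add(String serializedAction) {
        long size = serializedAction.getBytes().length;
        if (!pending.isEmpty() && pendingBytes + size > maxBytes) {
            flush();
        }
        pending.add(serializedAction);
        pendingBytes += size;
    }

    public void flush() {
        if (!pending.isEmpty()) {
            flusher.accept(new ArrayList<>(pending));
            pending.clear();
            pendingBytes = 0;
        }
    }
}
```

Because the size is computed from the serialized action, script params are counted by construction, which is exactly what the client's built-in estimate misses.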

AC:

  • the pipeline can be configured in such a way that it does not fail because an update request is too large

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.
Gehel set the point value for this task to 3.

Upstream elasticsearch bug

Upstream flink improvement request - RESOLVED, planned for release 3.1.0

Let's keep this ticket blocked as a reminder to check for (or ask for) a proper release, so that we no longer need the patched version.