As a maintainer of the search infrastructure, I want long-running maintenance tasks to be resilient to node restarts, so that they do not fail regularly.
The scroll API relies on non-persisted state held on the Elasticsearch nodes. This state is lost if a node restarts, which causes the underlying maintenance task to fail.
This problem currently affects:
- dump generation (T265056)
- title completion index rebuild
- ttmserver
- reindex? (might be solved upstream https://github.com/elastic/elasticsearch/issues/42612)
One solution is to move the state to the client performing the long-running task, using search_after with a sort on a stable field (the page id).
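A minimal sketch of what client-side state could look like, not an actual implementation. It assumes a hypothetical `search` callable that sends a request body to the index and returns the decoded JSON response, and a hypothetical sortable field named `page_id`. The only state is the `sort` value of the last hit seen, which the client keeps and passes back as `search_after`.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes a search request body, returns the decoded response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_pages(
    search: SearchFn,
    batch_size: int = 100,
    resume_after: Optional[List[Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit in page id order, keeping the cursor on the client."""
    cursor = resume_after  # sort values of the last hit already processed
    while True:
        body: Dict[str, Any] = {
            "size": batch_size,
            "query": {"match_all": {}},
            # "page_id" is an assumed field name; it must be stable and unique.
            "sort": [{"page_id": "asc"}],
        }
        if cursor is not None:
            body["search_after"] = cursor
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # no more documents after the cursor
        for hit in hits:
            yield hit
        # Each hit carries its sort values; the last one becomes the next cursor.
        cursor = hits[-1]["sort"]
```

Because the cursor lives in the client, a request that fails during a node restart can simply be retried with the same `search_after` value, or the whole task can be resumed later by passing the last processed value as `resume_after`.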
AC:
- the scroll API is no longer used by long-running tasks
- a node crash does not cause a long-running task to fail