[EPIC] Avoid using the elasticsearch scroll API
Open, Medium, Public

Assigned To

None

Authored By

	dcausse
	Jun 28 2021, 9:15 AM

Description

As a maintainer of the search infrastructure, I want long-running maintenance tasks to be resilient to node restarts so that they do not fail regularly.

The scroll API relies on non-persisted state held on the Elasticsearch nodes. If a node restarts, that state is lost and the underlying maintenance task fails.
This problem currently affects:

dump generation (T265056)
title completion index rebuild
ttmserver
reindex? (might be solved upstream https://github.com/elastic/elasticsearch/issues/42612)

One solution is to move the state to the client performing the long-running task, by using search_after with a sort on a stable field (the page id).
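
A minimal sketch of what client-held state could look like, not the actual implementation of any of the affected scripts. The `iter_pages` function, the `search` callable (a stand-in for whatever client sends the request body to the cluster), the `page_id` field name, and the use of Python's built-in `ConnectionError` are all assumptions made for illustration. The request and response shapes (`sort`, `search_after`, and the per-hit `sort` values) follow the Elasticsearch search API.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes a search request body, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_pages(
    search: SearchFn,
    batch_size: int = 500,
    sort_field: str = "page_id",  # assumed name of the stable, unique sort field
    resume_after: Optional[List[Any]] = None,
    max_retries: int = 3,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit in sort order, keeping the cursor on the client."""
    cursor = resume_after  # the only state needed to continue or resume
    while True:
        body: Dict[str, Any] = {
            "size": batch_size,
            "query": {"match_all": {}},
            "sort": [{sort_field: "asc"}],
        }
        if cursor is not None:
            body["search_after"] = cursor

        # No server-side context is involved, so a failed request can simply
        # be re-sent with the same body (a real script would add a backoff).
        for attempt in range(max_retries + 1):
            try:
                response = search(body)
                break
            except ConnectionError:  # assumption: the client surfaces this error
                if attempt == max_retries:
                    raise

        hits = response["hits"]["hits"]
        if not hits:
            return  # an empty batch means the end of the index was reached
        for hit in hits:
            yield hit
        # Each hit carries its sort values; the last one becomes the new cursor.
        cursor = hits[-1]["sort"]
```

Because the cursor is just the last seen sort value, a task that crashes outright could persist it and restart from that point by passing it as `resume_after`, instead of starting over.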

AC:

the scroll API is no longer used by long-running tasks
a node crash does not cause a long-running task to fail

Related Objects

Status	Assigned	Task
Open	None	T285652 [EPIC] Avoid using the elasticsearch scroll API
Resolved	EBernhardson	T265056 Make Cirrus Search dump script more resilient to failures (elasticsearch restarts)
Open	None	T228430 Improve resiliency of the reindexing process
Open	None	T193684 Reindex should retry requests for certain error classes
Open	None	T279598 Track down "empty error" during reindexing