tune elasticsearch recovery settings
Closed, Resolved · Public

Assigned To

Authored By

	Gehel
	Sep 21 2017, 12:59 PM

Description

During our last cluster restart, the cluster failed to recover. See incident documentation for details.

To prevent this from happening again, we should probably reduce recover_after_time (the Elasticsearch gateway.recover_after_time setting) to 5 minutes.
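For context, the effect of this setting can be illustrated with a simplified model of how Elasticsearch's documented gateway settings interact. This is only a sketch, not Elasticsearch's actual implementation: the class and function names are hypothetical, the node counts are made-up example values, and the elapsed time is assumed to be counted from the moment the minimum node count was reached.

```python
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    """Hypothetical container mirroring the documented gateway.* settings."""
    expected_nodes: int          # gateway.expected_nodes
    recover_after_nodes: int     # gateway.recover_after_nodes
    recover_after_time_s: float  # gateway.recover_after_time, in seconds


def should_start_recovery(settings: GatewaySettings,
                          joined_nodes: int,
                          elapsed_s: float) -> bool:
    """Simplified model: decide whether shard recovery may start."""
    # Never recover below the hard minimum, no matter how long we waited.
    if joined_nodes < settings.recover_after_nodes:
        return False
    # All expected nodes are present: recover immediately.
    if joined_nodes >= settings.expected_nodes:
        return True
    # Some nodes are still missing: recover only once the wait has expired.
    return elapsed_s >= settings.recover_after_time_s


# Example values only (assumed, not taken from the production cluster).
settings = GatewaySettings(expected_nodes=30,
                           recover_after_nodes=20,
                           recover_after_time_s=5 * 60)
```

Under this model, a cold restart in which a few nodes never come back starts recovering once the 5 minute wait expires, instead of staying blocked for a longer timeout.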

Details

	Subject	Repo	Branch	Lines +/-
	elasticsearch: only wait 5 minutes for all nodes in case of cold restart	operations/puppet	production	+4 -3


Event Timeline

Gehel created this task. Sep 21 2017, 12:59 PM

Restricted Application edited projects, added Discovery-ARCHIVED, Discovery-Search; removed Discovery-Search (Current work). Sep 21 2017, 12:59 PM

Restricted Application added a subscriber: Aklapper.

Gehel updated the task description. Sep 21 2017, 1:00 PM

debt assigned this task to Gehel. Sep 21 2017, 5:04 PM

debt triaged this task as High priority.

debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.

greg moved this task from Active investigation to Follow-up prevention on the Wikimedia-Incident board. Sep 21 2017, 5:11 PM

Change 380524 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: only wait 5 minutes for all nodes in case of cold restart

https://gerrit.wikimedia.org/r/380524

gerritbot added a project: Patch-For-Review. Sep 25 2017, 3:04 PM

Gehel moved this task from Incoming to Needs review on the Discovery-Search (Current work) board. Sep 25 2017, 3:08 PM

Change 380524 merged by Gehel:
[operations/puppet@production] elasticsearch: only wait 5 minutes for all nodes in case of cold restart

https://gerrit.wikimedia.org/r/380524

Gehel moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. Sep 26 2017, 5:17 PM

debt closed this task as Resolved. Oct 2 2017, 2:11 PM

Krinkle edited projects, added Sustainability (Incident Followup); removed Wikimedia-Incident. Apr 28 2020, 9:50 PM
