Improve robustness of es-tool
Closed, ResolvedPublic
Assigned To

Authored By

	Gehel
	Mar 3 2016, 8:34 PM

Description

While restarting the Elasticsearch cluster, I hit a few timeouts while stopping and starting replication. The settings change was actually applied despite the timeout. It should be possible to catch the timeout, check whether the settings change was applied, and then either fail gracefully or exit successfully.
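A minimal sketch of the behaviour described above, not the actual es-tool code. The names `SettingTimeout`, `apply_setting`, `put_setting` and `get_setting` are hypothetical; the two callables stand in for whatever HTTP calls write and read the cluster setting.

```python
class SettingTimeout(Exception):
    """Stand-in for the timeout error raised by the HTTP client (assumption)."""


def apply_setting(put_setting, get_setting, wanted):
    """Apply a setting; on timeout, check whether it was applied anyway.

    put_setting(value) and get_setting() are hypothetical callables that
    wrap the cluster settings API. Returns True if the setting ends up at
    the wanted value, False otherwise.
    """
    try:
        put_setting(wanted)
        return True
    except SettingTimeout:
        # The request timed out, but the cluster may still have applied
        # the change, so read the setting back before declaring failure.
        return get_setting() == wanted
```

A caller could then exit with status 0 when the function returns True and print a clear error message when it returns False, instead of surfacing a raw timeout traceback.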

Details

	Subject	Repo	Branch	Lines +/-
	Improve robustness of es-tool	operations/puppet	production	+28 -20

Event Timeline

Gehel created this task. Mar 3 2016, 8:34 PM

Restricted Application added a subscriber: Aklapper. Mar 3 2016, 8:34 PM

Gehel moved this task from Needs triage to Ops on the Discovery-ARCHIVED board. Mar 3 2016, 8:40 PM

I think we should increase the timeout rather than trying to catch the failure (although perhaps both?). Elasticsearch has a default 30s master timeout when none is specified, but it looks like the HTTP library es-tool was using timed out after 10s. Ideally we should put these two timeouts in lockstep.

The master timeout for Elasticsearch is not configurable as a cluster-wide setting. Instead, individual actions need to provide a master_timeout=2m query string parameter. Within CirrusSearch we have adjusted a few of the calls that commonly time out to provide a 2m (2 minute) timeout.
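One way to keep the two timeouts in lockstep, sketched under assumptions: the helper name `settings_request`, the base URL, and the 5 second margin are illustrative and not taken from es-tool. The idea is to derive the HTTP client timeout from the same value that is sent as `master_timeout`, so the client never gives up before Elasticsearch does.

```python
from urllib.parse import urlencode


def settings_request(base_url, master_timeout_s=120, margin_s=5):
    """Build a cluster settings URL and a matching client-side timeout.

    master_timeout_s is sent to Elasticsearch as the master_timeout query
    parameter; the client timeout is that value plus a small margin
    (margin_s is an arbitrary illustrative choice).
    """
    query = urlencode({"master_timeout": f"{master_timeout_s}s"})
    url = f"{base_url}/_cluster/settings?{query}"
    client_timeout = master_timeout_s + margin_s
    return url, client_timeout
```

The returned pair can be passed to whichever HTTP client is in use, for example as the URL and the `timeout` argument of the request call.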

Change 282472 had a related patch set uploaded (by Adedommelin):
Improve robustness of es-tool

https://gerrit.wikimedia.org/r/282472

Restricted Application added a project: Discovery-Search. Apr 9 2016, 1:27 PM

gerritbot added a project: Patch-For-Review. Apr 9 2016, 1:27 PM

adedommelin subscribed. Apr 9 2016, 4:09 PM

Gehel reassigned this task from Gehel to Nicko. Apr 13 2016, 1:19 PM

Change 282472 merged by Gehel:
Improve robustness of es-tool

https://gerrit.wikimedia.org/r/282472

Gehel added a project: Discovery-Search (Current work). Apr 28 2016, 11:11 AM

Gehel moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.

Removing from the Discovery backlog as it is already implemented.

Deskana closed this task as Resolved. May 11 2016, 10:39 PM