I see some logs from crons on logstash servers indicating failing index optimization (see below for example).
A correction was already made post elasticsearch 2.3 upgrade.
Looking at crons on logstash1001, I see multiple crons that seem similar. This makes me think that a rename of the cron in puppet happened, without cleanup of the previous cron.
gehel@logstash1001:~$ sudo crontab -l | grep optimize # Puppet Name: apifeatureusage_optimize_index 40 1 * * * /usr/local/bin/logstash_optimize_index.sh http://10.2.2.30:9200 "apifeatureusage-$(date -d '-1day' +\%Y.\%m.\%d)" # Puppet Name: logstash_optimize_index 10 1 * * * /usr/local/bin/logstash_optimize_index.sh http://127.0.0.1:9200 # Puppet Name: logstash_optimize_index_apifeatureusage 40 1 * * * /usr/local/bin/logstash_optimize_index.sh 10.2.2.30:9200 "apifeatureusage-$(date -d '-1days' +\%Y.\%m.\%d)" # Puppet Name: logstash_optimize_index_logstash 20 1 * * * /usr/local/bin/logstash_optimize_index.sh 127.0.0.1:9200 "logstash-$(date -d '-1days' +\%Y.\%m.\%d)"
Those crons seem to be cluster wide (index optimization is done for the cluster, not for individual node) but they are deployed on all nodes. This should not cause any issue (multiple calls to optimize will block waiting for the previous one to complete), but is probably not necessary.
extract of email received from cron:
Subject: Cron <root@logstash1001> /usr/local/bin/logstash_optimize_index.sh 127.0.0.1:9200 "logstash-$(date -d '-1days' +%Y.%m.%d)"
To: root@logstash1001.eqiad.wmnet
Failed to optimize 127.0.0.1:9200/logstash-2016.07.19
{
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"indices" : {
"logstash-2016.07.19" : {
"shards" : {
"0" : [ {
"routing" : {
"state" : "STARTED",
"primary" : true,
"node" : "4UXNV6w_TPySDt73KkT-mQ"
},
"num_committed_segments" : 44,
"num_search_segments" : 44,
"segments" : {
[...]rom: Cron Daemon <root@logstash1002.eqiad.wmnet>
Date: Thu, Jul 21, 2016 at 9:58 AM
Subject: Cron <root@logstash1002> /usr/local/bin/logstash_optimize_index.sh http://127.0.0.1:9200
To: root@logstash1002.eqiad.wmnet
Failed to force a synced flush for http://127.0.0.1:9200/logstash-2016.07.20
HTTP/1.1 409 Conflict
Content-Type: application/json; charset=UTF-8
Content-Length: 379
{"_shards":{"total":3,"successful":2,"failed":1},"logstash-2016.07.20":{"total":3,"successful":2,"failed":1,"failures":[{"shard":0,"reason":"no commit id from pre-sync flush","routing":{"state":"STARTED","primary":false,"node":"lYF1WLrRRgmRjTGUViqzZA","relocating_node":null,"shard":0,"index":"logstash-2016.07.20","version":4,"allocation_id":{"id":"27FdfhXBR42ftwXUI0NuKQ"}}}]}}