/var/log full on deployment-elasticsearch* hosts
Closed, Resolved, Public

Description

This is causing Puppet failures, and probably also service degradation in search. Someone should clean it up and also investigate why it suddenly filled up.
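Before cleaning up, a quick check of how full the filesystem holding /var/log is can confirm the symptom. The Python sketch below uses only the standard library's `shutil.disk_usage`. The function names and the 0.9 threshold are illustrative assumptions, not part of any existing Puppet or Labs check.

```python
import shutil


def disk_usage_ratio(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 0.9) -> bool:
    """True if the filesystem holding `path` is at least `threshold` full.

    The 0.9 default is an illustrative value, not an existing alert setting.
    """
    return disk_usage_ratio(path) >= threshold
```

On an affected host, `is_nearly_full("/var/log")` would be expected to return True. The ratio can differ slightly from the Use% column of `df`, which divides used space by (used + available) and so leaves out blocks reserved for root.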

Event Timeline

yuvipanda raised the priority of this task to Unbreak Now!
yuvipanda updated the task description.
yuvipanda subscribed.

apifeatureusage started spamming the log, making it grow to almost 2 GB.
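To see which files account for the growth, a recursive listing of the log directory sorted by size is usually enough. This is a minimal Python sketch; `largest_files` is a hypothetical helper, not an existing tool on these hosts, and it reports apparent file size rather than allocated blocks.

```python
import os
from typing import List, Tuple


def largest_files(root: str, n: int = 10) -> List[Tuple[int, str]]:
    """Return up to `n` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so a link is not counted
                # as if it were the (possibly huge) file it points to.
                size = os.lstat(path).st_size
            except OSError:
                continue  # file rotated away or deleted mid-walk
            sizes.append((size, path))
    sizes.sort(reverse=True)
    return sizes[:n]
```

For example, `largest_files("/var/log/elasticsearch", 5)` would list the five biggest files in the Elasticsearch log directory.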

[2015-01-27 20:58:45,951][WARN ][cluster.action.shard ] [deployment-elastic05] [apifeatureusage-2014.12.09][0] sending failed shard for [apifeatureusage-2014.12.09][0], node[bIp_JtFPQ5Wr2IsvksOYKA], [P], s[INITIALIZING], indexUUID [J3QlLykCTCq7XnLMbZgBgw], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[apifeatureusage-2014.12.09][0] failed to fetch index version after copying it over]; nested: IndexShardGatewayRecoveryException[[apifeatureusage-2014.12.09][0] shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.12.09/0/index),niofs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.12.09/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-01-27 20:58:46,172][WARN ][indices.cluster ] [deployment-elastic05] [apifeatureusage-2014.11.23][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [apifeatureusage-2014.11.23][0] failed to fetch index version after copying it over
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [apifeatureusage-2014.11.23][0] shard allocated for local recovery (post api), should exist, but doesn't, current files: []
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:131)
    ... 4 more
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in store(least_used[rate_limited(default(mmapfs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.11.23/0/index),niofs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.11.23/0/index)), type=MERGE, rate=20.0)]): files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:870)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:758)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
    ... 4 more
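The excerpt suggests the log is being filled by apifeatureusage shards that fail recovery repeatedly. The sketch below assumes the bracketed prefix seen in these lines, `[timestamp][LEVEL][logger] [node] [index][shard]`, and tallies WARN lines per index. Lines without an index prefix, such as stack-trace continuations, are skipped. The regular expression is inferred from this excerpt only, not from Elasticsearch's logging configuration.

```python
import re
from collections import Counter
from typing import Iterable

# Prefix inferred from the excerpt above (an assumption, not a spec):
# [timestamp][LEVEL ][logger ] [node] [index][shard] message...
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<logger>[^\]]*?)\s*\]\s+"
    r"\[(?P<node>[^\]]+)\]\s+"
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]"
)


def count_warnings_by_index(lines: Iterable[str]) -> Counter:
    """Count WARN lines per index name; non-matching lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "WARN":
            counts[match.group("index")] += 1
    return counts
```

Passing an open log file to `count_warnings_by_index` and printing `.most_common(5)` would show which indices dominate. On these hosts the file is presumably named after the cluster (for example /var/log/elasticsearch/beta-search.log), but treat that path as an assumption.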

root@deployment-elastic05:/var/log/elasticsearch# curl -s localhost:9200/_cat/shards | grep -v 'STARTED'
apifeatureusage-2015.01.29 0 r UNASSIGNED
apifeatureusage-2015.01.29 0 r UNASSIGNED
apifeatureusage-2014.12.02 0 p UNASSIGNED
apifeatureusage-2014.12.02 0 r UNASSIGNED
apifeatureusage-2014.12.02 0 r UNASSIGNED
apifeatureusage-2014.12.06 0 p INITIALIZING 10.68.16.38 deployment-elastic05
apifeatureusage-2014.12.06 0 r UNASSIGNED
apifeatureusage-2014.12.06 0 r UNASSIGNED
apifeatureusage-2014.12.09 0 p UNASSIGNED
apifeatureusage-2014.12.09 0 r UNASSIGNED
apifeatureusage-2014.12.09 0 r UNASSIGNED
apifeatureusage-2014.12.10 0 p INITIALIZING 10.68.17.82 deployment-elastic07
apifeatureusage-2014.12.10 0 r UNASSIGNED
apifeatureusage-2014.12.10 0 r UNASSIGNED
apifeatureusage-2014.12.18 0 p INITIALIZING 10.68.17.82 deployment-elastic07
apifeatureusage-2014.12.18 0 r UNASSIGNED
apifeatureusage-2014.12.18 0 r UNASSIGNED
apifeatureusage-2014.12.14 0 p UNASSIGNED
apifeatureusage-2014.12.14 0 r UNASSIGNED
apifeatureusage-2014.12.14 0 r UNASSIGNED
apifeatureusage-2014.12.20 0 p UNASSIGNED
apifeatureusage-2014.12.20 0 r UNASSIGNED
apifeatureusage-2014.12.20 0 r UNASSIGNED
apifeatureusage-2014.11.23 0 p INITIALIZING 10.68.17.82 deployment-elastic07
apifeatureusage-2014.11.23 0 r UNASSIGNED
apifeatureusage-2014.11.23 0 r UNASSIGNED
apifeatureusage-2014.11.21 0 p UNASSIGNED
apifeatureusage-2014.11.21 0 r UNASSIGNED
apifeatureusage-2014.11.21 0 r UNASSIGNED
apifeatureusage-2014.12.24 0 p INITIALIZING 10.68.16.38 deployment-elastic05
apifeatureusage-2014.12.24 0 r UNASSIGNED
apifeatureusage-2014.12.24 0 r UNASSIGNED
apifeatureusage-2014.11.20 0 p UNASSIGNED
apifeatureusage-2014.11.20 0 r UNASSIGNED
apifeatureusage-2014.11.20 0 r UNASSIGNED

This leads to a red cluster status, obviously.
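The red status comes from the primary (`p`) rows that are UNASSIGNED or INITIALIZING. Unassigned replicas alone would only make the cluster yellow. The sketch below parses the leading `_cat/shards` columns (index, shard, prirep, state) and applies a simplified version of that rule, counting a shard as active only when it is STARTED or RELOCATING. It is an approximation for reading this output, not Elasticsearch's own health computation.

```python
from collections import Counter
from typing import List, NamedTuple

# States treated as "active" in this simplified model.
ACTIVE_STATES = {"STARTED", "RELOCATING"}


class Shard(NamedTuple):
    index: str
    shard: int
    prirep: str  # "p" = primary, "r" = replica
    state: str   # STARTED, RELOCATING, INITIALIZING or UNASSIGNED


def parse_cat_shards(text: str) -> List[Shard]:
    """Parse `_cat/shards` output, keeping only the first four columns.

    Later columns (docs, store, ip, node) are blank for unassigned shards,
    so only index/shard/prirep/state are reliably present.
    """
    shards = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a shard row (e.g. a prompt).
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        index, shard, prirep, state = fields[:4]
        shards.append(Shard(index, int(shard), prirep, state))
    return shards


def count_states(shards: List[Shard]) -> Counter:
    """Number of shard copies in each state."""
    return Counter(s.state for s in shards)


def estimate_health(shards: List[Shard]) -> str:
    """Red if any primary is inactive, yellow if only replicas are, else green."""
    if any(s.prirep == "p" and s.state not in ACTIVE_STATES for s in shards):
        return "red"
    if any(s.state not in ACTIVE_STATES for s in shards):
        return "yellow"
    return "green"
```

Given the output of `curl -s localhost:9200/_cat/shards`, `count_states` summarizes the listing and `estimate_health` flags red as soon as a single primary is unassigned or still initializing, as is the case for several of the apifeatureusage indices above.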

I cleaned up /var/log/elasticsearch a bit on the deployment-elastic06 and deployment-elastic07 instances at 9:00 UTC.
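For manual cleanup, it can help to list candidates before deleting anything. This sketch lists files under a directory that have not been modified within a given number of days. The helper name and the age threshold are hypothetical, and it deliberately only reports paths.

```python
import os
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def stale_files(root: str, max_age_days: float, now: Optional[float] = None) -> List[str]:
    """Return paths under `root` not modified in the last `max_age_days` days.

    Only reports candidates; it never deletes anything.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished while walking
    return sorted(stale)
```

One caveat: deleting a log file that Elasticsearch still holds open does not free its space until the process closes the file. The file currently being written is therefore better truncated than removed.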

Is it worth reimaging these hosts with the new, bigger /var?

hashar added a subscriber: Manybubbles.

> Is it worth reimaging these hosts with the new, bigger /var?

Most probably :-]

demon claimed this task.

Reimaged.