Is causing puppet failure, and probably also service degradation in search. Someone should clean it up and also investigate why it suddenly filled up.
Description
Related Objects
Event Timeline
yuvipanda subscribed.
Restricted Application added a subscriber: Aklapper. · 2015-02-02 12:46:41 (UTC+0)
apifeatureusage started spamming the log, which made it grow to almost 2G.
[2015-01-27 20:58:45,951][WARN ][cluster.action.shard ] [deployment-elastic05] [apifeatureusage-2014.12.09][0] sending failed shard for [apifeatureusage-2014.12.09][0], node[bIp_JtFPQ5Wr2IsvksOYKA], [P], s[INITIALIZING], indexUUID [J3QlLykCTCq7XnLMbZgBgw], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[apifeatureusage-2014.12.09][0] failed to fetch index version after copying it over]; nested: IndexShardGatewayRecoveryException[[apifeatureusage-2014.12.09][0] shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.12.09/0/index),niofs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.12.09/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-01-27 20:58:46,172][WARN ][indices.cluster ] [deployment-elastic05] [apifeatureusage-2014.11.23][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [apifeatureusage-2014.11.23][0] failed to fetch index version after copying it over
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [apifeatureusage-2014.11.23][0] shard allocated for local recovery (post api), should exist, but doesn't, current files: []
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:131)
    ... 4 more
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in store(least_used[rate_limited(default(mmapfs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.11.23/0/index),niofs(/var/lib/elasticsearch/beta-search/nodes/0/indices/apifeatureusage-2014.11.23/0/index)), type=MERGE, rate=20.0)]): files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:870)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:758)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
    ... 4 more
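To see which indices are responsible for the log growth, one option is to tally WARN lines per index family. The following is only a rough sketch: the function name `count_by_index_family` and the regular expressions are my own, and they assume the log line layout seen in the excerpt above (timestamp, level, logger, node, then index and shard in brackets).

```python
import re
from collections import Counter

# Assumed layout, based on the excerpt above:
# [timestamp][LEVEL ][logger ] [node] [index][shard] message
LINE_RE = re.compile(
    r"^\[[^\]]+\]"          # timestamp
    r"\[(\w+)\s*\]"         # log level, possibly padded with spaces
    r"\[[^\]]+\]\s+"        # logger name
    r"\[[^\]]+\]\s+"        # node name
    r"\[([^\]]+)\]\[\d+\]"  # index name and shard number
)
# Daily indices end in -YYYY.MM.DD; strip that to group them together.
DATE_SUFFIX_RE = re.compile(r"-\d{4}\.\d{2}\.\d{2}$")


def count_by_index_family(lines, level="WARN"):
    """Count log lines of the given level, grouped by index name without its date suffix."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1) == level:
            counts[DATE_SUFFIX_RE.sub("", match.group(2))] += 1
    return counts


sample = [
    "[2015-01-27 20:58:45,951][WARN ][cluster.action.shard ] [deployment-elastic05] "
    "[apifeatureusage-2014.12.09][0] sending failed shard for [apifeatureusage-2014.12.09][0]",
    "[2015-01-27 20:58:46,172][WARN ][indices.cluster ] [deployment-elastic05] "
    "[apifeatureusage-2014.11.23][0] failed to start shard",
    "    at java.lang.Thread.run(Thread.java:745)",  # stack trace lines are skipped
]
print(count_by_index_family(sample))
```

Feeding it the real log file (for example `open(path)` on whichever file under /var/log/elasticsearch has grown) should show whether apifeatureusage dominates the warnings.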
root@deployment-elastic05:/var/log/elasticsearch# curl -s localhost:9200/_cat/shards | grep -v 'STARTED'
apifeatureusage-2015.01.29 0 r UNASSIGNED
apifeatureusage-2015.01.29 0 r UNASSIGNED
apifeatureusage-2014.12.02 0 p UNASSIGNED
apifeatureusage-2014.12.02 0 r UNASSIGNED
apifeatureusage-2014.12.02 0 r UNASSIGNED
apifeatureusage-2014.12.06 0 p INITIALIZING 10.68.16.38 deployment-elastic05
apifeatureusage-2014.12.06 0 r UNASSIGNED
apifeatureusage-2014.12.06 0 r UNASSIGNED
apifeatureusage-2014.12.09 0 p UNASSIGNED
apifeatureusage-2014.12.09 0 r UNASSIGNED
apifeatureusage-2014.12.09 0 r UNASSIGNED
apifeatureusage-2014.12.10 0 p INITIALIZING 10.68.17.82 deployment-elastic07
apifeatureusage-2014.12.10 0 r UNASSIGNED
apifeatureusage-2014.12.10 0 r UNASSIGNED
apifeatureusage-2014.12.18 0 p INITIALIZING 10.68.17.82 deployment-elastic07
apifeatureusage-2014.12.18 0 r UNASSIGNED
apifeatureusage-2014.12.18 0 r UNASSIGNED
apifeatureusage-2014.12.14 0 p UNASSIGNED
apifeatureusage-2014.12.14 0 r UNASSIGNED
apifeatureusage-2014.12.14 0 r UNASSIGNED
apifeatureusage-2014.12.20 0 p UNASSIGNED
apifeatureusage-2014.12.20 0 r UNASSIGNED
apifeatureusage-2014.12.20 0 r UNASSIGNED
apifeatureusage-2014.11.23 0 p INITIALIZING 10.68.17.82 deployment-elastic07
apifeatureusage-2014.11.23 0 r UNASSIGNED
apifeatureusage-2014.11.23 0 r UNASSIGNED
apifeatureusage-2014.11.21 0 p UNASSIGNED
apifeatureusage-2014.11.21 0 r UNASSIGNED
apifeatureusage-2014.11.21 0 r UNASSIGNED
apifeatureusage-2014.12.24 0 p INITIALIZING 10.68.16.38 deployment-elastic05
apifeatureusage-2014.12.24 0 r UNASSIGNED
apifeatureusage-2014.12.24 0 r UNASSIGNED
apifeatureusage-2014.11.20 0 p UNASSIGNED
apifeatureusage-2014.11.20 0 r UNASSIGNED
apifeatureusage-2014.11.20 0 r UNASSIGNED
Which leads to red status, obvs.
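For reference, the red status follows from the primaries in that list: Elasticsearch reports red when at least one primary shard is not active, while missing replicas alone only give yellow. Here is a small sketch that picks out the offending indices from `_cat/shards` text. The helper name `indices_with_inactive_primary` is hypothetical, and it assumes the first four whitespace-separated columns are index, shard, prirep and state, as in the output above.

```python
def indices_with_inactive_primary(cat_shards_output):
    """Return sorted index names whose primary shard is neither STARTED nor RELOCATING."""
    red = set()
    for line in cat_shards_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, _shard, prirep, state = fields[:4]
        # "p" marks a primary; STARTED and RELOCATING both count as active.
        if prirep == "p" and state not in ("STARTED", "RELOCATING"):
            red.add(index)
    return sorted(red)


sample_output = """\
apifeatureusage-2015.01.29 0 r UNASSIGNED
apifeatureusage-2014.12.02 0 p UNASSIGNED
apifeatureusage-2014.12.02 0 r UNASSIGNED
apifeatureusage-2014.12.06 0 p INITIALIZING 10.68.16.38 deployment-elastic05
"""
print(indices_with_inactive_primary(sample_output))
```

With the sample taken from the listing above, apifeatureusage-2015.01.29 is left out because only its replicas are unassigned, which on its own would make the cluster yellow rather than red.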
I cleaned up /var/log/elasticsearch a bit on the deployment-elastic06 and deployment-elastic07 instances at 9:00am UTC.
In T88280#1016111, @greg wrote: Is it worth reimaging these hosts with the new bigger var?
Most probably :-]
greg moved this task from To Triage to Done on the Beta-Cluster-Infrastructure board. · 2015-03-05 16:45:54 (UTC+0)
• Phabricator_maintenance removed a subscriber: yuvipanda. · 2017-06-07 18:55:15 (UTC+0)
Restricted Application added projects: Discovery-ARCHIVED, Discovery-Search, Release-Engineering-Team (Kanban). · 2017-06-07 18:55:15 (UTC+0)
• Phabricator_maintenance edited projects, added RelEng-Archive-FY201718-Q1; removed Release-Engineering-Team (Kanban). · 2017-09-26 23:48:12 (UTC+0)