This morning elastic1008 ran out of disk space on /; I suspect the index_indexing_slowlog is to blame.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
elasticsearch: disable indexing slow log at TRACE level | operations/puppet | production | +1 -1
Event Timeline
I ran curl -XPUT localhost:9200/_cluster/settings/ -d '{"transient": { "index.indexing.slowlog.threshold.index.trace" : -1 } }' to disable TRACE logging, because the line:
index.indexing.slowlog: INFO, index_indexing_slow_log_file
seems to be ignored: TRACE entries are still logged (a bug similar to https://github.com/elastic/elasticsearch/issues/7461).
We will have to change the threshold to -1 in puppet to make the change persistent.
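For reference, the persistent fix presumably ends up as a per-index threshold in the puppet-managed elasticsearch.yml; a sketch of what that line would look like (the exact file and template managed by puppet are assumptions here):

```yaml
# elasticsearch.yml (assumed to be templated by puppet)
# -1 disables the TRACE indexing slowlog threshold entirely,
# so nothing is ever slow enough to be logged at that level.
index.indexing.slowlog.threshold.index.trace: -1
```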
Change 249973 had a related patch set uploaded (by DCausse):
elasticsearch: disable indexing slow log at TRACE level
Change 249973 merged by Giuseppe Lavagetto:
elasticsearch: disable indexing slow log at TRACE level
The cluster-wide setting is ineffective at runtime; I tried to set it per index instead with:
curl -XPUT localhost:9200/wikidatawiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
I've set it for a few indices:
curl -XPUT localhost:9200/enwiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/enwiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/dewiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/dewiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/wikidatawiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/wikidatawiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
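Rather than typing one curl per index, the same commands could be generated from a list; a small sketch (the helper function name and index list are illustrative, not something that exists in puppet):

```shell
# gen_disable_cmds prints, for each index name given, the curl command
# that sets the TRACE indexing slowlog threshold to -1 (disabled).
gen_disable_cmds() {
  for idx in "$@"; do
    printf "curl -XPUT localhost:9200/%s/_settings -d '{\"index.indexing.slowlog.threshold.index.trace\": -1}'\n" "$idx"
  done
}

# Example with the indices touched in this task; pipe to `sh` to run them.
gen_disable_cmds enwiki_general enwiki_content dewiki_content dewiki_general \
  wikidatawiki_general wikidatawiki_content
```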
Seems to work; I've also disabled the threshold for the commonswiki, ruwiki, itwiki and frwiki indices.
thanks david. is this a signal that things are slowing down over the long term and this is representative? i.e. two problems? :)
i'm not sure if this belongs in the todo column, or the done column? Should i adjust puppet so on next reboot all indexes have their slowlog set to a lower level?
The change in puppet has been done by Joe.
I'd like to investigate why we had slow index times during that period. It seems to be resolved now and I'm not sure why...
We should maybe check when wikidata was reindexed to add the geo field... elastic1008 is a very busy node.
This morning elastic1016 / elastic1012 / elastic1026 ran out of disk space. I've moved the old logs out of the way in /var/lib/elasticsearch/logs and compressed them; the problem appears to be a very big indexing slow log:
root@elastic1026:/var/log/elasticsearch# ls -lah *
-rw-rw-r-- 1 elasticsearch elasticsearch  12K Dec 12 09:30 elasticsearch_hot_threads.log
-rw-r--r-- 1 elasticsearch elasticsearch 1.3M Dec 12 09:24 production-search-eqiad.log
-rw-r--r-- 1 elasticsearch elasticsearch 8.1G Dec 12 09:30 production-search-eqiad_index_indexing_slowlog.log
-rw-r--r-- 1 elasticsearch elasticsearch 202M Dec 12 09:30 production-search-eqiad_index_search_slowlog.log
root@elastic1026:/var/log/elasticsearch# df -h production-search-eqiad_index_indexing_slowlog.log
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/84dd5e63-6438-4bcb-9b9f-9b02791c1e28   28G   11G   16G  40% /
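The "move the log out of the way and compress it" step could be scripted; a minimal sketch, assuming a copy-then-truncate approach (the helper name is hypothetical, and note that simply deleting or renaming the file would not free space while Elasticsearch still holds the open file descriptor):

```shell
# rotate_slowlog: archive a huge slowlog and truncate the live file in place.
rotate_slowlog() {
  log="$1"
  [ -f "$log" ] || return 1
  archive="${log}.$(date +%Y%m%d)"
  cp "$log" "$archive"   # copy first: the ES process keeps writing to this inode
  : > "$log"             # truncate in place so the disk space is actually freed
  gzip -f "$archive"     # compress the archived copy
}

# Example with the path from this task (assumed; do not run blindly):
# rotate_slowlog /var/log/elasticsearch/production-search-eqiad_index_indexing_slowlog.log
```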
Thanks. Joe submitted a puppet patch a few months ago, but we haven't restarted the nodes yet.
The workaround today is to disable the slow log on a per-index basis. This time it looks like urwiki_content was the culprit; I will check again tomorrow morning and disable the slow log on other indices if needed.
curl -XPUT localhost:9200/urwiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'