Elasticsearch index indexing slow log generates too much data
Closed, Resolved · Public

Description

This morning elastic1008 ran out of disk space on /, I think it's due to the index_indexing_slowlog.
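
For reference, a quick check along these lines (a sketch; the path matches the listing in a later comment) confirms which log file is consuming the space:

# largest files under the Elasticsearch log directory
du -h /var/log/elasticsearch/* | sort -h | tail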

Event Timeline

dcausse raised the priority of this task from to Needs Triage.
dcausse updated the task description. (Show Details)
dcausse subscribed.

I ran curl -XPUT localhost:9200/_cluster/settings/ -d '{"transient": { "index.indexing.slowlog.threshold.index.trace" : -1 } }' to disable TRACE logging, because the line:

index.indexing.slowlog: INFO, index_indexing_slow_log_file

seems to be ignored; TRACE entries are still logged (a bug similar to https://github.com/elastic/elasticsearch/issues/7461).
We will have to change the threshold to -1 in puppet to make the change persistent.
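
For the record, the persistent version of this would presumably be a single line in the elasticsearch.yml template that puppet manages (a sketch; same key as in the curl command above):

index.indexing.slowlog.threshold.index.trace: -1

Whether the transient cluster setting was accepted can be checked by reading the settings back:

curl -s 'localhost:9200/_cluster/settings?pretty'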

Change 249973 had a related patch set uploaded (by DCausse):
elasticsearch: disable indexing slow log at TRACE level

https://gerrit.wikimedia.org/r/249973

Change 249973 merged by Giuseppe Lavagetto:
elasticsearch: disable indexing slow log at TRACE level

https://gerrit.wikimedia.org/r/249973

The cluster-wide setting is ineffective at runtime, so I tried setting it per index with:

curl -XPUT localhost:9200/wikidatawiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'

I've set it for a few indices:

curl -XPUT localhost:9200/enwiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/enwiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/dewiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/dewiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/wikidatawiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/wikidatawiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'

This seems to work; I've also disabled the threshold for the commonswiki, ruwiki, itwiki and frwiki indices.
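
If the override needs to go out to every index rather than a hand-picked list, a loop along these lines should do it (a sketch, assuming the node answers on localhost:9200 and the _cat/indices API is available):

# apply the same per-index override to every index; index names come from _cat/indices
for idx in $(curl -s 'localhost:9200/_cat/indices?h=index'); do
  curl -XPUT "localhost:9200/${idx}/_settings" -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
done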

thanks david. is this a signal that things are slowing down over the long term and this is representative? i.e. two problems? :)

i'm not sure if this belongs in the todo column or the done column. Should i adjust puppet so on next reboot all indexes have their slowlog set to a lower level?

chasemp set Security to None.

good question, what is the long term idea? This dropped off my radar too.

The change in puppet has been done by Joe.
I'd like to investigate why we had slow index times during that period. It seems to be resolved now and I'm not sure why...
We should maybe check when wikidata was reindexed to add the geo field... elastic1008 is a very busy node.
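
One hypothetical way to check that, assuming the cluster version exposes index.creation_date in the index settings (reported as milliseconds since the epoch):

curl -s 'localhost:9200/wikidatawiki_content/_settings?pretty' | grep creation_date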

what was that change? link for posterity? :)

Click "Show older changes" :)

https://gerrit.wikimedia.org/r/249973

thanks david

this is why I like 'ref' more than 'bug' :)

Deskana claimed this task.
fgiunchedi subscribed.

This morning elastic1016 / elastic1012 / elastic1026 ran out of disk space. I've moved the old logs out of the way in /var/lib/elasticsearch/logs and compressed them; the problem appears to be a very big indexing slow log:

root@elastic1026:/var/log/elasticsearch# ls -lah *
-rw-rw-r-- 1 elasticsearch elasticsearch  12K Dec 12 09:30 elasticsearch_hot_threads.log
-rw-r--r-- 1 elasticsearch elasticsearch 1.3M Dec 12 09:24 production-search-eqiad.log
-rw-r--r-- 1 elasticsearch elasticsearch 8.1G Dec 12 09:30 production-search-eqiad_index_indexing_slowlog.log
-rw-r--r-- 1 elasticsearch elasticsearch 202M Dec 12 09:30 production-search-eqiad_index_search_slowlog.log
root@elastic1026:/var/log/elasticsearch# df -h production-search-eqiad_index_indexing_slowlog.log 
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/84dd5e63-6438-4bcb-9b9f-9b02791c1e28   28G   11G   16G  40% /
root@elastic1026:/var/log/elasticsearch#
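
For posterity, a copy-then-truncate variant of that cleanup looks roughly like this (a sketch; filenames taken from the listing above, and truncating rather than moving avoids invalidating the file handle Elasticsearch keeps open):

cd /var/log/elasticsearch
# keep a compressed copy, then empty the live file in place
cp production-search-eqiad_index_indexing_slowlog.log production-search-eqiad_index_indexing_slowlog.log.1
: > production-search-eqiad_index_indexing_slowlog.log
gzip production-search-eqiad_index_indexing_slowlog.log.1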

Thanks, Joe submitted a puppet patch a few months ago but we haven't restarted the nodes yet.
The workaround today is to disable the slow log on a per-index basis. This time it looks like urwiki_content was the culprit; I will check it tomorrow morning and disable the slow log on other indices if needed.

curl -XPUT localhost:9200/urwiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
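
To confirm the culprit without reading the log by hand, something like this should work (a rough sketch; the bracketed-token count assumes the default slowlog line format, where the index name appears in square brackets):

# how many slowlog entries mention the suspected index
grep -c 'urwiki_content' /var/log/elasticsearch/production-search-eqiad_index_indexing_slowlog.log
# rough overview of the most frequent bracketed tokens (index names among them)
grep -o '\[[a-z0-9_]\+\]' /var/log/elasticsearch/production-search-eqiad_index_indexing_slowlog.log | sort | uniq -c | sort -rn | head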

debt subscribed.

resolving this one (it was still open but in the resolved column)