Elasticsearch index indexing slow log generates too much data
Closed, Resolved · Public

Description

This morning elastic1008 ran out of disk space on /, I think it's due to the index_indexing_slowlog.
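
For reference, a quick check along these lines (a sketch; the path matches the listing in a later comment) confirms which log file is consuming the space:

# largest files under the Elasticsearch log directory
du -h /var/log/elasticsearch/* | sort -h | tail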

Event Timeline

dcausse raised the priority of this task from to Needs Triage.
dcausse updated the task description. (Show Details)
dcausse subscribed.

I ran curl -XPUT localhost:9200/_cluster/settings/ -d '{"transient": { "index.indexing.slowlog.threshold.index.trace" : -1 } }' to disable TRACE logging, because the line:

index.indexing.slowlog: INFO, index_indexing_slow_log_file

seems to be ignored; TRACE entries are still logged (a bug similar to https://github.com/elastic/elasticsearch/issues/7461).
We will have to change the threshold to -1 in puppet to make the change persistent.
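
For the record, the persistent version of this would presumably be a single line in the elasticsearch.yml template that puppet manages (a sketch; same key as in the curl command above):

index.indexing.slowlog.threshold.index.trace: -1

Whether the transient cluster setting was accepted can be checked by reading the settings back:

curl -s 'localhost:9200/_cluster/settings?pretty'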

Change 249973 had a related patch set uploaded (by DCausse):
elasticsearch: disable indexing slow log at TRACE level

https://gerrit.wikimedia.org/r/249973

Change 249973 merged by Giuseppe Lavagetto:
elasticsearch: disable indexing slow log at TRACE level

https://gerrit.wikimedia.org/r/249973

The cluster-wide setting is ineffective at runtime, so I tried setting it per index with:

curl -XPUT localhost:9200/wikidatawiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'

I've set it for a few indices:

curl -XPUT localhost:9200/enwiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/enwiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/dewiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/dewiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/wikidatawiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
curl -XPUT localhost:9200/wikidatawiki_content/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'

This seems to work; I've also disabled the threshold for the commonswiki, ruwiki, itwiki and frwiki indices.
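
If the override needs to go out to every index rather than a hand-picked list, a loop along these lines should do it (a sketch, assuming the node answers on localhost:9200 and the _cat/indices API is available):

# apply the same per-index override to every index; index names come from _cat/indices
for idx in $(curl -s 'localhost:9200/_cat/indices?h=index'); do
  curl -XPUT "localhost:9200/${idx}/_settings" -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
done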

thanks david. is this a signal that things are slowing down over the long term and this is representative? i.e. two problems? :)

i'm not sure if this belongs in the todo column or the done column. Should i adjust puppet so on next reboot all indexes have their slowlog set to a lower level?

chasemp set Security to None.

good question, what is the long term idea? This dropped off my radar too.

The change in puppet has been done by Joe.
I'd like to investigate why we had slow index times during that period. It seems to be resolved now and I'm not sure why...
We should maybe check when wikidata was reindexed to add the geo field... elastic1008 is a very busy node.
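
One hypothetical way to check that, assuming the cluster version exposes index.creation_date in the index settings (reported as milliseconds since the epoch):

curl -s 'localhost:9200/wikidatawiki_content/_settings?pretty' | grep creation_date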

what was that change? link for posterity? :)

Click "Show older changes" :)

https://gerrit.wikimedia.org/r/249973

thanks david

this is why I like 'ref' more than 'bug' :)

Deskana claimed this task.
fgiunchedi subscribed.

This morning elastic1016 / elastic1012 / elastic1026 ran out of disk space. I've moved the old logs out of the way in /var/lib/elasticsearch/logs and compressed them; the problem appears to be a very big indexing slow log:

root@elastic1026:/var/log/elasticsearch# ls -lah *
-rw-rw-r-- 1 elasticsearch elasticsearch  12K Dec 12 09:30 elasticsearch_hot_threads.log
-rw-r--r-- 1 elasticsearch elasticsearch 1.3M Dec 12 09:24 production-search-eqiad.log
-rw-r--r-- 1 elasticsearch elasticsearch 8.1G Dec 12 09:30 production-search-eqiad_index_indexing_slowlog.log
-rw-r--r-- 1 elasticsearch elasticsearch 202M Dec 12 09:30 production-search-eqiad_index_search_slowlog.log
root@elastic1026:/var/log/elasticsearch# df -h production-search-eqiad_index_indexing_slowlog.log 
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/84dd5e63-6438-4bcb-9b9f-9b02791c1e28   28G   11G   16G  40% /
root@elastic1026:/var/log/elasticsearch#
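
For posterity, a copy-then-truncate variant of that cleanup looks roughly like this (a sketch; filenames taken from the listing above, and truncating rather than moving avoids invalidating the file handle Elasticsearch keeps open):

cd /var/log/elasticsearch
# keep a compressed copy, then empty the live file in place
cp production-search-eqiad_index_indexing_slowlog.log production-search-eqiad_index_indexing_slowlog.log.1
: > production-search-eqiad_index_indexing_slowlog.log
gzip production-search-eqiad_index_indexing_slowlog.log.1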

Thanks, Joe submitted a puppet patch a few months ago but we haven't restarted the nodes yet.
The workaround today is to disable the slow log on a per-index basis. This time it looks like urwiki_content was the culprit; I will check it tomorrow morning and disable the slow log on other indices if needed.

curl -XPUT localhost:9200/urwiki_general/_settings -d '{"index.indexing.slowlog.threshold.index.trace": -1}'
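
To confirm the culprit without reading the log by hand, something like this should work (a rough sketch; the bracketed-token count assumes the default slowlog line format, where the index name appears in square brackets):

# how many slowlog entries mention the suspected index
grep -c 'urwiki_content' /var/log/elasticsearch/production-search-eqiad_index_indexing_slowlog.log
# rough overview of the most frequent bracketed tokens (index names among them)
grep -o '\[[a-z0-9_]\+\]' /var/log/elasticsearch/production-search-eqiad_index_indexing_slowlog.log | sort | uniq -c | sort -rn | head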

debt subscribed.

resolving this one (it was still open but in the resolved column)