Elasticsearch not starting on Jessie hosts
Closed, Resolved, Public

Assigned To

Authored By

	bd808
	Apr 30 2015, 2:05 PM

Description

The new logstash100[4-6] servers are our first Jessie-based Elasticsearch hosts. The package-provided startup scripts and systemd don't seem to be getting along.

@demon ran this systemctl command:

P575 Starting elasticsearch with log level debug

root@logstash1004 /var/log/elasticsearch# SYSTEMD_LOG_LEVEL=debug systemctl start elasticsearch
Calling manager for StartUnit on elasticsearch.service, replace
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StartUnit cookie=1 reply_cookie=0 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=GetUnit cookie=2 reply_cookie=0 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1/unit/elasticsearch_2eservice interface=org.freedesktop.DBus.Properties member=Get cookie=3 reply_cookie=0 error=n/a
Adding /org/freedesktop/systemd1/job/6996 to the set
Got message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobNew cookie=2 reply_cookie=0 error=n/a
Got D-Bus request: org.freedesktop.systemd1.Manager.JobNew() on /org/freedesktop/systemd1
Got message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=3 reply_cookie=0 error=n/a
Got D-Bus request: org.freedesktop.systemd1.Manager.JobRemoved() on /org/freedesktop/systemd1
Got result done/Success for job elasticsearch.service

This seems to show that systemd thinks Elasticsearch has started. ps, however, shows no running java processes, and there is no log output in /var/log/elasticsearch.
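
One way to cross-check systemd's view against the process table is sketched below. The `es_diag` helper name is made up for illustration, the unit name is taken from the paste above, and looking for a process named `java` is an assumption about how the package launches the JVM.

```shell
# Sketch: compare what systemd believes with what is actually running.
# es_diag is a hypothetical helper; pass a unit name or accept the default.
es_diag() {
    unit="${1:-elasticsearch.service}"

    # systemd's own summary of the unit, including the last exit status.
    systemctl status "$unit" --no-pager

    # The properties systemd uses to decide that the unit is "started".
    systemctl show -p ActiveState -p SubState -p MainPID "$unit"

    # Anything the start script wrote to the journal instead of a log file.
    journalctl -u "$unit" --no-pager -n 50

    # Assumption: the Elasticsearch JVM shows up as a process named "java".
    pgrep -l java || echo "no java process found"
}
```

Running `es_diag` as root right after the `systemctl start` should show whether the unit is considered active while no JVM is alive.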

Details

	Subject	Repo	Branch	Lines +/-
	logstash: Seed Elasticsearch cluster host	operations/puppet	production	+8 -0


Related Objects

Status	Assigned	Task
Resolved	bd808	T69817 Monitor for anomalies/spikes in read failures of memcached
Resolved	bd808	T100735 Have Logstash report per-channel log message rate to Graphite
Resolved	bd808	T99735 Upgrade Logstash to 1.5.3
Resolved	bd808	T97545 reinstall logstash1001-1003
Resolved	Anomie	T1272 Deploy ApiFeatureUsage extension on WMF wikis
Resolved	bd808	T87521 Convert JobRunner.php to PSR-3 logging and add levels
Resolved	bd808	T88732 Decouple logging infrastructure failures from MediaWiki logging
Resolved	bd808	T96692 Rack and Setup (3) Logstash Servers
Resolved	bd808	T97645 Elasticsearch not starting on Jessie hosts

Event Timeline

bd808 created this task. Apr 30 2015, 2:05 PM

bd808 claimed this task.

bd808 raised the priority of this task to High.

bd808 updated the task description.

bd808 added projects: Wikimedia-Logstash, acl*sre-team.

bd808 added subscribers: RobH, bd808, • Manybubbles and 2 others.

Fun!

@Joe volunteered to help me fix this up. The first problem he found was that the version of Elasticsearch we got via apt-get was ancient (1.0.3+dfsg-5). We need either 1.3.6 to match the existing Logstash cluster, or to jump all of the hosts to the latest version (1.6.x?).

Ouch yeah that'd do it.

Come to think of it, I must've installed from upstream's apt when I did this... my experience might be less useful.

Nice!

1.5.2 is the most current version of Elasticsearch, and 1.6.x has some compelling work to make rolling restarts faster, so I've mostly been waiting on that. Yeah, we're pretty out of date, though. https://github.com/elastic/elasticsearch/issues/10032 is the issue that _should_ make rolling restarts faster.

Wow. "seal an index" would be perfect for Logstash purposes. We create a new index each day and it gets compacted and basically made read-only by a cron job after the next day starts.

I'll probably try just taking the upstream 1.3.6 deb and see if it "just works" on jessie. If not I can take the plunge and try the latest stable. We don't use any fancy features at all for Logstash input or Kibana output so it shouldn't be too scary.

Installing http://apt.wikimedia.org/wikimedia/pool/thirdparty/e/elasticsearch/elasticsearch_1.3.6_all.deb works. I did this on logstash1004 by downloading the deb with curl and installing with dpkg.
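
The manual install described above could be scripted roughly as follows. This is only a sketch: the `es_deb_url` and `install_es_deb` helper names are hypothetical, the pool URL is copied from the comment above, and the idea that other versions live under the same path and naming scheme is an assumption.

```shell
# Pool location copied from the 1.3.6 URL above.
# Assumption: other versions follow the same file naming scheme.
ES_POOL='http://apt.wikimedia.org/wikimedia/pool/thirdparty/e/elasticsearch'

# Print the download URL for a given Elasticsearch version.
es_deb_url() {
    printf '%s/elasticsearch_%s_all.deb\n' "$ES_POOL" "$1"
}

# Download the deb with curl, install it with dpkg, and report the result.
install_es_deb() {
    version="$1"
    deb="/tmp/elasticsearch_${version}_all.deb"
    curl -fsSL -o "$deb" "$(es_deb_url "$version")" || return 1
    dpkg -i "$deb" || return 1
    # Confirm which version dpkg now considers installed.
    dpkg-query -W -f='${Version}\n' elasticsearch
}
```

`install_es_deb 1.3.6`, run as root, would repeat the steps taken on logstash1004.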

The node isn't finding the existing cluster by multicast discovery. I'm not sure yet if this is caused by a firewall problem somewhere or by some other misconfiguration.

Before starting elasticsearch on logstash1004 I configured the cluster to ignore the new hosts for shard allocation:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : ["10.64.0.162","10.64.16.185","10.64.48.109"]
    }
}'
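
For repeat use, the request body can be built from a list of addresses. The sketch below uses a hypothetical `exclude_ips_body` helper and emits the comma-separated string form of the setting rather than the JSON array used above. Calling it with no arguments produces an empty value, which is the usual way to clear the exclusion again.

```shell
# Build the JSON body for the transient exclude._ip setting.
# exclude_ips_body is a hypothetical helper; each argument is one IP address.
exclude_ips_body() {
    # Join all arguments with commas inside a subshell so IFS is not leaked.
    ips=$(IFS=,; printf '%s' "$*")
    printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}\n' "$ips"
}

# Example usage against a live cluster (left commented out in this sketch):
# curl -XPUT 'localhost:9200/_cluster/settings' -d "$(exclude_ips_body 10.64.0.162 10.64.16.185 10.64.48.109)"
# curl -s 'localhost:9200/_cluster/settings?pretty'    # check what is currently set
# curl -XPUT 'localhost:9200/_cluster/settings' -d "$(exclude_ips_body)"    # clear the exclusion
```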

I made a local modification to /etc/elasticsearch/elasticsearch.yml to enable logstash1004 to find the existing cluster:

discovery.zen.ping.unicast.hosts: 10.64.32.137

This won't survive the next Puppet run, but it sufficed to get the node to join the cluster. It can be set permanently via the $::elasticsearch::unicast_hosts Puppet parameter. That may be needed to allow cluster creation if the new hosts are unable to exchange multicast pings for cluster discovery.
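
A quick way to confirm that a node really joined is to look for it in the `_cat/nodes` output. The sketch below uses a hypothetical `node_listed` helper that reads node names on stdin. It assumes each node's name is its hostname, which depends on how `node.name` is configured.

```shell
# Succeed if the node name given as $1 appears on stdin (one name per line).
# node_listed is a hypothetical helper; trailing padding is stripped before comparing.
node_listed() {
    awk -v want="$1" '
        { sub(/[ \t]+$/, ""); if ($0 == want) found = 1 }
        END { exit !found }
    '
}

# Example usage against a live cluster (left commented out in this sketch):
# curl -s 'localhost:9200/_cat/nodes?h=name' | node_listed logstash1004 && echo "joined"
# curl -s 'localhost:9200/_cluster/health?pretty'    # number_of_nodes should go up by one
```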

Change 208576 had a related patch set uploaded (by BryanDavis):
logstash: Seed Elasticsearch cluster host

https://gerrit.wikimedia.org/r/208576

gerritbot added a project: Patch-For-Review. May 4 2015, 12:15 AM

Change 208576 merged by Gage:
logstash: Seed Elasticsearch cluster host

https://gerrit.wikimedia.org/r/208576

• Gage mentioned this in rOPUPebaf2ab7cc42: logstash: Seed Elasticsearch cluster host. May 4 2015, 4:40 PM

bd808 closed this task as Resolved. May 4 2015, 6:01 PM

bd808 mentioned this in T98042: Update Wikimedia apt repo to include debs for Elasticsearch & Logstash on jessie.

bd808 added a project: User-bd808.

bd808 moved this task from To Do to Done on the User-bd808 board.

bd808 moved this task from Done to Archive on the User-bd808 board. May 12 2015, 6:23 AM

bd808 mentioned this in T98620: Degraded RAID-1 arrays on new logstash hosts: [UU__]. May 13 2015, 5:13 PM