Elasticsearch not starting on Jessie hosts
Closed, Resolved (Public)

Description

The new logstash100[4-6] servers are our first Jessie-based Elasticsearch hosts. The package-provided startup scripts and systemd don't seem to be getting along.

@demon ran this systemctl command:

root@logstash1004 /var/log/elasticsearch# SYSTEMD_LOG_LEVEL=debug systemctl start elasticsearch
Calling manager for StartUnit on elasticsearch.service, replace
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StartUnit cookie=1 reply_cookie=0 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=GetUnit cookie=2 reply_cookie=0 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1/unit/elasticsearch_2eservice interface=org.freedesktop.DBus.Properties member=Get cookie=3 reply_cookie=0 error=n/a
Adding /org/freedesktop/systemd1/job/6996 to the set
Got message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobNew cookie=2 reply_cookie=0 error=n/a
Got D-Bus request: org.freedesktop.systemd1.Manager.JobNew() on /org/freedesktop/systemd1
Got message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=3 reply_cookie=0 error=n/a
Got D-Bus request: org.freedesktop.systemd1.Manager.JobRemoved() on /org/freedesktop/systemd1
Got result done/Success for job elasticsearch.service

This seems to show that systemd thinks Elasticsearch is started. However, ps shows no running java processes, and there is no log output in /var/log/elasticsearch.
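A rough sketch of how systemd's view of the unit could be cross-checked against what is actually running on the host. The `es_status_report` function name is made up for illustration, and it assumes the unit is named `elasticsearch` as in the systemctl command above.

```shell
# Hypothetical helper (not from the task): compare what systemd reports
# for the unit with what is actually running on the host.
es_status_report() {
    # What systemd believes about the unit.
    systemctl status elasticsearch --no-pager || true
    # Anything the unit wrote to the journal (last 50 lines).
    journalctl -u elasticsearch --no-pager -n 50 || true
    # Is there really a java process?
    pgrep -l java || echo "no java process found"
}
```

Calling `es_status_report` as root on the affected host prints all three views in one go.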

Event Timeline

bd808 claimed this task.
bd808 raised the priority of this task from to High.
bd808 updated the task description.
bd808 added subscribers: RobH, bd808, Manybubbles and 2 others.

@Joe volunteered to help me fix this up. The first problem he found was that the version of Elasticsearch we got via apt-get was ancient (1.0.3+dfsg-5). We need either 1.3.6, to match the existing Logstash cluster, or to jump every node to the latest version (1.6.x?).
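A minimal sketch of the version check described here. The `version_lt` helper is hypothetical and relies on GNU `sort -V`, which only approximates dpkg's own ordering rules; the two version strings are the ones quoted in this comment.

```shell
# Hypothetical helper: succeeds when version $1 sorts strictly before $2.
# Uses GNU sort's -V (version sort), an approximation of dpkg's ordering.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

required="1.3.6"          # version run by the existing Logstash cluster
candidate="1.0.3+dfsg-5"  # version apt-get installed on Jessie

if version_lt "$candidate" "$required"; then
    echo "elasticsearch $candidate is older than $required"
fi
```

On a real host the candidate could instead be read with `dpkg-query -W -f='${Version}' elasticsearch`, and `apt-cache policy elasticsearch` shows which repository the package would come from.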

Ouch yeah that'd do it.

Come to think of it, I must've installed from upstream's apt when I did this... my experience might be less useful.

Nice!

1.5.2 is the most current version of Elasticsearch, and 1.6.x has some compelling work to make rolling restarts faster, so I've mostly been waiting on that. Yeah, we're pretty out of date, though. https://github.com/elastic/elasticsearch/issues/10032 is the issue that _should_ make rolling restarts faster.

Wow. "seal an index" would be perfect for Logstash purposes. We create a new index each day and it gets compacted and basically made read-only by a cron job after the next day starts.

I'll probably just take the upstream 1.3.6 deb and see if it "just works" on jessie. If not I can take the plunge and try the latest stable. We don't use any fancy features at all for Logstash input or Kibana output so it shouldn't be too scary.

Installing http://apt.wikimedia.org/wikimedia/pool/thirdparty/e/elasticsearch/elasticsearch_1.3.6_all.deb works. I did this on logstash1004 by downloading the deb with curl and installing with dpkg.
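Roughly, the download-and-install step could be scripted as below. The URL is the one given above; the `install_es_deb` function name and the `/tmp` download location are assumptions for illustration, and the function needs to run as root.

```shell
# URL from the comment above; function name and /tmp path are hypothetical.
deb_url="http://apt.wikimedia.org/wikimedia/pool/thirdparty/e/elasticsearch/elasticsearch_1.3.6_all.deb"
deb_file="${deb_url##*/}"   # strip everything up to the last slash

install_es_deb() {
    # -f: fail on HTTP errors, -sS: quiet but still show errors,
    # -L: follow redirects, -o: write to the given file.
    curl -fsSL -o "/tmp/$deb_file" "$deb_url" &&
        dpkg -i "/tmp/$deb_file"
}
```

Nothing is downloaded until `install_es_deb` is actually called.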

The node isn't finding the existing cluster by multicast discovery. I'm not sure yet if this is caused by a firewall problem somewhere or some other misconfiguration.

Before starting elasticsearch on logstash1004 I configured the cluster to ignore the new hosts for shard allocation:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : ["10.64.0.162","10.64.16.185","10.64.48.109"]
    }
}'
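For checking that the exclusion took effect, and for lifting it later, something along these lines might work. The helper names are hypothetical. The sketch passes the IPs as a single comma-separated string and assumes that setting the value to an empty string clears the filter, which is how I understand the Elasticsearch 1.x cluster settings API to behave; treat both as assumptions to verify.

```shell
# Hypothetical helpers; localhost:9200 matches the command above.
es="localhost:9200"

# Build the request body for a transient exclude-by-IP setting.
# $1 is a comma-separated IP list; an empty string is assumed to clear it.
exclude_body() {
    printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}' "$1"
}

# Show the settings currently applied to the cluster.
show_cluster_settings() {
    curl -s -XGET "$es/_cluster/settings?pretty"
}

# Lift the exclusion once the new hosts should receive shards.
clear_exclusion() {
    curl -s -XPUT "$es/_cluster/settings" -d "$(exclude_body "")"
}
```

`show_cluster_settings` can be run right after the PUT above; `clear_exclusion` is only for when the new hosts are ready to hold data.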

I made a local modification to /etc/elasticsearch/elasticsearch.yml to enable logstash1004 to find the existing cluster:

discovery.zen.ping.unicast.hosts: 10.64.32.137

This won't survive the next Puppet run, but it sufficed to get the node to join the cluster. The setting can be made permanent via the $::elasticsearch::unicast_hosts Puppet parameter. It may also be needed to allow cluster creation if the new hosts are unable to exchange multicast pings for cluster discovery.
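A quick way to confirm that the node really joined the existing cluster is to ask it how many nodes it can see. The sketch below uses the cluster health and `_cat/nodes` endpoints; the helper names are hypothetical, and the `sed` expression is a simple extraction for this one field, not a general JSON parser.

```shell
es="localhost:9200"

# Read cluster-health JSON on stdin and print the number_of_nodes value.
node_count() {
    sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Ask the local node how many nodes it sees in its cluster.
cluster_node_count() {
    curl -s "$es/_cluster/health" | node_count
}

# List the nodes the local node knows about, with a header row.
list_nodes() {
    curl -s "$es/_cat/nodes?v"
}
```

If `cluster_node_count` on logstash1004 reports only one node, the unicast seed host is probably not being reached.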

Change 208576 had a related patch set uploaded (by BryanDavis):
logstash: Seed Elasticsearch cluster host

https://gerrit.wikimedia.org/r/208576

Change 208576 merged by Gage:
logstash: Seed Elasticsearch cluster host

https://gerrit.wikimedia.org/r/208576