Description

These new servers are now racked and installed. They need the appropriate Puppet configuration deployed to turn them into a two-node Elasticsearch cluster. Port 80 should be made accessible from the labs network (this may need a separate ticket for the labs admins). We previously put together the elasticsearch::proxy module in Puppet, which exposes a limited portion of the Elasticsearch API on port 80; port 9200 should not be reachable from labs. A sketch of the intended exposure follows the details table below.

Details
| Status | Assigned | Task |
|---|---|---|
| Resolved | Deskana | T139575 EPIC: Plan to enable BM25 on fulltext search |
| Resolved | Gehel | T137256 Setup two node elasticsearch cluster on relforge1001-1002 |
| Resolved | Gehel | T141085 deploy elasticsearch/plugins to relforge1001-1002 servers |
| Declined | Gehel | T142098 Setup load balancing for elasticsearch service on relforge servers |
| Resolved | Gehel | T142211 Enable access to relforge clusters from virtual machines running on labs |
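To make the intended exposure concrete, here is a minimal, hypothetical Puppet sketch: the role name, the srange value, and the ferm variable are assumptions for illustration, not the actual code in operations/puppet. Only the proxied port 80 is opened to labs; 9200 stays closed.

```puppet
# Hypothetical role sketch for relforge1001-1002 (names are illustrative).
class role::elasticsearch::relforge {
    # Full Elasticsearch node setup (cluster name, discovery, data paths)
    # comes from the existing elasticsearch module.
    include ::elasticsearch

    # elasticsearch::proxy exposes a restricted subset of the
    # Elasticsearch HTTP API on port 80.
    include ::elasticsearch::proxy

    # Open only the proxied port 80 to the labs network; port 9200 is
    # deliberately left unreachable from labs. '$LABS_NETWORKS' is assumed
    # to be a ferm variable covering the labs address ranges.
    ferm::service { 'relforge-http-proxy':
        proto  => 'tcp',
        port   => '80',
        srange => '$LABS_NETWORKS',
    }
}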
Event Timeline
Change 299865 had a related patch set uploaded (by Gehel):
WIP - configure new relevance forge servers
Change 300241 had a related patch set uploaded (by Gehel):
Adding rack information for new relforge servers
Mentioned in SAL [2016-07-21T09:47:51Z] <gehel> reinstalling and configuring relforge1001/1002 - T137256
Change 300286 had a related patch set uploaded (by Gehel):
New partition scheme for relforge (elasticsearch) servers
Change 300286 merged by Gehel:
Changed partition scheme for relforge (elasticsearch) servers
Deployment of Elasticsearch plugins is done by Trebuchet, which requires connecting to the Redis instance on tin. This is not allowed by the current ferm rules.
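For illustration, a hedged sketch of the kind of ferm rule that would unblock this on the host running the Trebuchet Redis; the define name is hypothetical, and 6379 is the standard Redis port, assumed here.

```puppet
# Hypothetical sketch: let the Trebuchet minions on the new relforge
# hosts reach Redis on tin (standard Redis port 6379 assumed).
ferm::service { 'trebuchet-redis-relforge':
    proto  => 'tcp',
    port   => '6379',
    srange => '@resolve((relforge1001.eqiad.wmnet relforge1002.eqiad.wmnet))',
}
```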
Installation done, but Elasticsearch master election is failing. The firewall seems to be open (at least port 9300 is reachable between relforge1001 and relforge1002).
Investigating... A sketch of the discovery settings the two nodes need to find each other is below, followed by a log extract.
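For reference, a minimal sketch of the discovery configuration a two-node cluster needs to elect a master, written as a Puppet class instantiation; the parameter names mirror common elasticsearch-module conventions and are assumptions, not the module's confirmed API.

```puppet
# Hedged sketch of the zen unicast discovery settings the module is
# expected to render into elasticsearch.yml.
class { '::elasticsearch':
    cluster_name         => 'relforge-eqiad',    # assumed cluster name
    # Both nodes must appear in the unicast host list for zen discovery,
    # since multicast is disabled.
    unicast_hosts        => [
        'relforge1001.eqiad.wmnet',
        'relforge1002.eqiad.wmnet',
    ],
    # With two master-eligible nodes, quorum is 2: both nodes must be up
    # and able to reach each other on port 9300 for an election to succeed.
    minimum_master_nodes => 2,
}
```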
Log extract:
[2016-08-02 17:44:43,292][DEBUG][action.admin.cluster.state] [relforge1001] no known master node, scheduling a retry
[2016-08-02 17:45:05,237][WARN ][discovery.zen.ping.unicast] [relforge1001] failed to send ping to [{#zen_unicast_2#}{10.64.37.21}{relforge1002.eqiad.wmnet/10.64.37.21:9300}]
SendRequestTransportException[[][relforge1002.eqiad.wmnet/10.64.37.21:9300][internal:discovery/zen/unicast]]; nested: NodeNotConnectedException[[][relforge1002.eqiad.wmnet/10.64.37.21:9300] Node not connected];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:886)
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:350)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$4800(ZenDiscovery.java:91)
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: NodeNotConnectedException[[][relforge1002.eqiad.wmnet/10.64.37.21:9300] Node not connected]
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:1132)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:819)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:329)
... 12 more
[2016-08-02 17:45:13,294][DEBUG][action.admin.cluster.state] [relforge1001] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2016-08-02 17:45:13,296][WARN ][rest.suppressed ] /_cluster/state/master_node Params: {metric=master_node}
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:226)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:804)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-08-02 17:45:27,096][DEBUG][action.admin.cluster.health] [relforge1001] no known master node, scheduling a retry
[2016-08-02 17:45:31,120][DEBUG][action.admin.cluster.health] [relforge1001] no known master node, scheduling a retry
[2016-08-02 17:45:43,323][DEBUG][action.admin.cluster.state] [relforge1001] no known master node, scheduling a retry
Elasticsearch is up and running, and the cluster is green. I'll keep this task open a bit, as we are probably missing a few minor tweaks to make it work perfectly...