08:37 <icinga-wm> PROBLEM - eventlogging-service-eventbus endpoints health on kafka2002 is CRITICAL: /v1/events (Produce a valid test event) is CRITICAL: Test Produce a valid test event returned the unexpected status 500 (expecting: 201)
Kafka seems ok, but from the eventlogging logs I can see:
Jul 26 07:05:45 kafka2002 eventlogging-service-eventbus[107460]: (MainThread) 500 POST /v1/events (10.192.16.169) 2.24ms Jul 26 07:09:44 kafka2002 eventlogging-service-eventbus[107455]: (MainThread) Unable to receive data from Kafka Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/kafka/conn.py", line 103, in _read_bytes raise socket.error("Not enough data to read message -- did server kill socket?") error: Not enough data to read message -- did server kill socket? Jul 26 07:09:44 kafka2002 eventlogging-service-eventbus[107455]: (MainThread) ConnectionError attempting to receive a response to request 4 from server BrokerMetadata(nodeId=2001, host='kafka2001.codfw.wmnet', port=9092): Kafka @ kafka2001.codfw.wmnet:9092 went away Jul 26 07:09:44 kafka2002 eventlogging-service-eventbus[107455]: (MainThread) Unable to send messages Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/kafka/producer/base.py", line 390, in _send_messages fail_on_error=self.sync_fail_on_error File "/usr/lib/python2.7/dist-packages/kafka/client.py", line 480, in send_produce_request (not fail_on_error or not self._raise_on_response_error(resp))] File "/usr/lib/python2.7/dist-packages/kafka/client.py", line 247, in _raise_on_response_error raise resp FailedPayloadsError Jul 26 07:09:44 kafka2002 eventlogging-service-eventbus[107455]: (MainThread) Uncaught exception POST /v1/events (10.192.16.169) HTTPServerRequest(protocol='http', host='10.192.16.169', method='POST', uri='/v1/events', version='HTTP/1.1', remote_ip='10.192.16.169', headers={'Content-Length': '105', 'Content-Type': 'application/json', Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1415, in _execute result = yield result File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 870, in run value = future.result() File "/usr/lib/python2.7/dist-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 876, in run yielded = self.gen.throw(*exc_info) File "/srv/deployment/eventlogging/eventbus/eventlogging/service.py", line 292, in post self.application.handle_events, events File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 870, in run value = future.result() File "/usr/lib/python2.7/dist-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 505, in Task func(*args, callback=_argument_adapter(set_result), **kwargs) File "/srv/deployment/eventlogging/eventbus/eventlogging/service.py", line 192, in handle_events self.process_event(event) File "/srv/deployment/eventlogging/eventbus/eventlogging/service.py", line 175, in process_event self.send(event) File "/srv/deployment/eventlogging/eventbus/eventlogging/service.py", line 127, in send w.send(event) File "/srv/deployment/eventlogging/eventbus/eventlogging/handlers.py", line 253, in kafka_writer kafka_producer.send_messages(message_topic, message) File "/usr/lib/python2.7/dist-packages/kafka/producer/simple.py", line 54, in send_messages topic, partition, *msg File "/usr/lib/python2.7/dist-packages/kafka/producer/base.py", line 349, in send_messages return self._send_messages(topic, partition, *msg) File "/usr/lib/python2.7/dist-packages/kafka/producer/base.py", line 390, in _send_messages fail_on_error=self.sync_fail_on_error File "/usr/lib/python2.7/dist-packages/kafka/client.py", line 480, in send_produce_request (not fail_on_error or not self._raise_on_response_error(resp))] File "/usr/lib/python2.7/dist-packages/kafka/client.py", line 247, in _raise_on_response_error raise resp FailedPayloadsError Jul 26 07:09:44 kafka2002 eventlogging-service-eventbus[107455]: (MainThread) 500 POST /v1/events (10.192.16.169) 3.15ms Jul 26 07:11:41 kafka2002 eventlogging-service-eventbus[107460]: (MainThread) Writer kafka:///kafka2002.codfw.wmnet:9092,kafka2001.codfw.wmnet:9092?async=False&topic=codfw.{meta[topic]} has stopped: . Attempting to restart.