
Setup Kafka cluster, producers and consumers for logging pipeline
Closed, Resolved (Public)

Description

This task tracks setting up a Kafka cluster dedicated to logging. Given the considerations in T205873 (Investigate Kafka main cluster usage for logging pipeline), the cluster will initially coexist with Logstash on the Logstash hosts and move to dedicated hardware in the future.

Outline of steps:

  • Puppetize the Kafka brokers in a dedicated profile
  • Set up TLS certificates for the brokers
  • Deploy Kafka brokers to the Logstash Elasticsearch data hosts
  • Decide on the granularity of topics (one single topic? one per cluster?) and the number of partitions
  • Set up the Logstash input for Kafka logging (i.e. consumers) with TLS
  • Set up the rsyslog output for Kafka logging (i.e. producers) with TLS and make it opt-in
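The topic and partition decision above can be illustrated with a minimal sketch of key-based partitioning, assuming each message is keyed by its source host (an assumption; the outline leaves keying open). Within one consumer group each partition is read by at most one consumer, so the partition count caps how many Logstash consumers can read a topic in parallel, while a fixed key keeps one host's messages ordered within a single partition. `zlib.crc32` is a stand-in hash and `NUM_PARTITIONS` is a hypothetical value; Kafka's Java client hashes keys with murmur2, so real assignments will differ.

```python
import zlib

# Hypothetical partition count; not the value chosen for the logging cluster.
NUM_PARTITIONS = 3


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition as hash(key) mod partition count.

    zlib.crc32 is only a stand-in hash for illustration.
    """
    return zlib.crc32(key) % num_partitions


# Messages that share a key (here, the sending host) always land in the same
# partition, which preserves per-host ordering inside that partition.
hosts = [b"logstash1007", b"logstash1008", b"logstash1009"]
assignment = {h.decode(): partition_for(h) for h in hosts}
print(assignment)
```

The trade-off is that more partitions allow more parallel consumers, but ordering is only guaranteed per partition, not across the whole topic.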

Details

Repo               Branch      Lines +/-
operations/puppet  production  +8 -2
operations/puppet  production  +2 -2
operations/puppet  production  +4 -2
operations/puppet  production  +7 -0
operations/puppet  production  +1 -0
operations/puppet  production  +3 -0
operations/puppet  production  +4 -0
operations/puppet  production  +1 -0
operations/puppet  production  +42 -0
operations/puppet  production  +22 -1
operations/puppet  production  +20 -0
operations/puppet  production  +6 -0
operations/puppet  production  +15 -10
operations/puppet  production  +24 -11
operations/puppet  production  +9 -1
operations/puppet  production  +112 -0
operations/puppet  production  +340 -317
operations/puppet  production  +7 -0
operations/puppet  production  +2 -2
operations/puppet  production  +32 -0
operations/puppet  production  +1 -0
operations/puppet  production  +1 -1
operations/puppet  production  +1 -1
operations/puppet  production  +37 -0
operations/puppet  production  +7 -3
operations/puppet  production  +37 -0
operations/puppet  production  +40 -5
operations/puppet  production  +7 -7
operations/puppet  production  +1 -0

Event Timeline


Change 465164 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] logstash: add ipv6 to elasticsearch

https://gerrit.wikimedia.org/r/465164

Change 465165 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] logstash: move to /srv/elasticsearch

https://gerrit.wikimedia.org/r/465165

Change 465166 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] WIP: new Kafka cluster logging-main

https://gerrit.wikimedia.org/r/465166

Change 465167 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] site: enable logging Kafka on Logstash nodes

https://gerrit.wikimedia.org/r/465167

Change 465164 abandoned by Filippo Giunchedi:
logstash: add ipv6 to elasticsearch

Reason:
See latest comment

https://gerrit.wikimedia.org/r/465164

Mentioned in SAL (#wikimedia-operations) [2018-10-16T13:55:17Z] <godog> depool logstash1007 to change elasticsearch data dir - T206454

Change 465165 merged by Filippo Giunchedi:
[operations/puppet@production] logstash: move to /srv/elasticsearch

https://gerrit.wikimedia.org/r/465165

Mentioned in SAL (#wikimedia-operations) [2018-10-16T14:06:20Z] <godog> depool in turn logstash1008 and logstash1009 to change elasticsearch data dir - T206454

Mentioned in SAL (#wikimedia-operations) [2018-10-16T14:28:04Z] <godog> roll-restart elasticsearch on logstash100[456] to change elasticsearch data dir - T206454

Change 465166 merged by Herron:
[operations/puppet@production] New Kafka cluster logging-eqiad

https://gerrit.wikimedia.org/r/465166

Mentioned in SAL (#wikimedia-operations) [2018-10-18T16:57:48Z] <herron> enabling kafka on logstash elasticsearch cluster T206454

Change 465167 merged by Herron:
[operations/puppet@production] site: enable logging Kafka on Logstash nodes

https://gerrit.wikimedia.org/r/465167

Mentioned in SAL (#wikimedia-operations) [2018-10-18T17:19:23Z] <herron> aborted enabling kafka on logstash elasticsearch cluster due to puppet errors. reverted change T206454

Sadly, Puppet threw these errors after merging https://gerrit.wikimedia.org/r/465167, so I reverted it with https://gerrit.wikimedia.org/r/468362/

Error: /Stage[main]/Confluent::Kafka::Common/Service[confluent-kafka]/enable: change from false to mask failed: Could not set 'mask' on enable: undefined method `mask' for Service[confluent-kafka](provider=debian):Puppet::Type::Service::ProviderDebian at /etc/puppet/modules/confluent/manifests/kafka/common.pp:80

Error: /Stage[main]/Confluent::Kafka::Common/Service[confluent-kafka-connect]/enable: change from false to mask failed: Could not set 'mask' on enable: undefined method `mask' for Service[confluent-kafka-connect](provider=debian):Puppet::Type::Service::ProviderDebian at /etc/puppet/modules/confluent/manifests/kafka/common.pp:80

Error: /Stage[main]/Confluent::Kafka::Common/Service[confluent-zookeeper]/enable: change from false to mask failed: Could not set 'mask' on enable: undefined method `mask' for Service[confluent-zookeeper](provider=debian):Puppet::Type::Service::ProviderDebian at /etc/puppet/modules/confluent/manifests/kafka/common.pp:80

I wonder if this has anything to do with Jessie vs. Stretch? FWIW, the Logstash Elasticsearch hosts are still running Jessie.

Change 468498 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] confluent::kafka::common: dont use enable => 'mask' on jessie hosts

https://gerrit.wikimedia.org/r/468498

Change 468498 merged by Herron:
[operations/puppet@production] confluent::kafka::common: force provider => 'systemd' for services

https://gerrit.wikimedia.org/r/468498

Change 468983 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] site: enable logging Kafka on Logstash nodes

https://gerrit.wikimedia.org/r/468983

Change 468983 merged by Herron:
[operations/puppet@production] site: enable logging Kafka on Logstash nodes

https://gerrit.wikimedia.org/r/468983

Three issues were observed during today's aborted deploy of logging Kafka (https://gerrit.wikimedia.org/r/468498):

  • Need a proper import of confluent-kafka-2.11 1.1.0-1 for Jessie.
  • Need to address ERROR [KafkaServer id=1004] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: SSL key store is specified, but key store password is not specified.
  • Need to modify puppet structure to ensure that kafka logging role is not applied to logstash frontend hosts.

Patch has been reverted pending solutions to these.
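The key store error listed above fires when a broker's TLS configuration names a key store but supplies no password for it. Below is a minimal sketch of rendering and checking the relevant broker properties. `ssl.keystore.location`, `ssl.keystore.password` and `ssl.key.password` are standard Kafka broker settings; the helper names, the key store path and the password value are hypothetical stand-ins (in practice the password would come from private hieradata rather than code).

```python
from typing import Dict


def render_ssl_properties(keystore_path: str, ssl_password: str) -> Dict[str, str]:
    """Build the TLS-related broker properties (hypothetical helper)."""
    return {
        "ssl.keystore.location": keystore_path,
        "ssl.keystore.password": ssl_password,
        # Assumption: the private key uses the same password as the store.
        "ssl.key.password": ssl_password,
    }


def check_keystore(props: Dict[str, str]) -> None:
    """Reject a config matching the condition in the broker's startup error."""
    if props.get("ssl.keystore.location") and not props.get("ssl.keystore.password"):
        raise ValueError(
            "SSL key store is specified, but key store password is not specified."
        )


def to_server_properties(props: Dict[str, str]) -> str:
    """Render properties as key=value lines, as in server.properties."""
    return "\n".join(f"{k}={v}" for k, v in sorted(props.items()))


# Hypothetical path and secret, for illustration only.
props = render_ssl_properties("/etc/kafka/ssl/broker.keystore.jks", "example-secret")
check_keystore(props)
print(to_server_properties(props))
```

Running the same check at Puppet compile time, rather than discovering the problem at broker startup, is the design point the sketch is meant to show.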

Change 469122 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] confluent::kafka::common support thirdparty/confluent on jessie

https://gerrit.wikimedia.org/r/469122

Change 469123 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] aptrepo: add thirdparty/confluent component for jessie

https://gerrit.wikimedia.org/r/469123

Change 469124 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: set logging kafka package version to 1.1.0-1

https://gerrit.wikimedia.org/r/469124

Mentioned in SAL (#wikimedia-operations) [2018-10-23T14:47:50Z] <herron> added confluent-kafka-2.11 1.1.0-1 package to jessie-wikimedia/thirdparty T206454

Change 469122 abandoned by Herron:
confluent::kafka::common support thirdparty/confluent on jessie

Reason:
going with existing thirdparty component and relying on differences in package names (-2.11 vs -2.11.7) for separation between kafka and logstash clusters

https://gerrit.wikimedia.org/r/469122

Change 469123 abandoned by Herron:
aptrepo: add thirdparty/confluent component for jessie

Reason:
ok, will use existing thirdparty component instead

https://gerrit.wikimedia.org/r/469123

Change 469124 abandoned by Herron:
logstash: set logging kafka package version to 1.1.0-1

https://gerrit.wikimedia.org/r/469124

Change 469246 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: create es/kafka combined role and assign to es data nodes

https://gerrit.wikimedia.org/r/469246

  • Need a proper import of confluent-kafka-2.11 1.1.0-1 for Jessie.

This is done

  • Need to address ERROR [KafkaServer id=1004] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: SSL key store is specified, but key store password is not specified.

AFAICT, to address this we'll set the password via profile::kafka::broker::ssl_password in the appropriate hieradata YAML on puppetmaster:/srv/private. However, the exact location depends on the outcome of the following item.

  • Need to modify puppet structure to ensure that kafka logging role is not applied to logstash frontend hosts.

https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/469246/ is one approach to handling this. It looks good from the perspective of PCC (the Puppet catalog compiler) but needs review and discussion; please see the comments in Gerrit.

Change 469246 merged by Herron:
[operations/puppet@production] logstash: apply role::kafka::logging to logstash es data nodes

https://gerrit.wikimedia.org/r/469246

Kafka is now running on the Logstash Elasticsearch data hosts, and the related Icinga service checks are green.

Change 469612 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: change burrow port for kafka logging

https://gerrit.wikimedia.org/r/469612

Change 469613 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: add Burrow metrics for kafka-logging

https://gerrit.wikimedia.org/r/469613

Change 469612 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: change burrow port for kafka logging

https://gerrit.wikimedia.org/r/469612

Change 469613 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: add Burrow metrics for kafka-logging

https://gerrit.wikimedia.org/r/469613
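Burrow monitors consumer lag on the new cluster. As a rough sketch of the quantity involved, per-partition lag is the partition's log-end offset minus the consumer group's committed offset; Burrow itself evaluates how these offsets move over a window rather than alerting on a single value. The offsets below are made-up numbers, and treating a partition with no committed offset as fully lagging is an assumption of this sketch.

```python
from typing import Dict


def consumer_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset.

    A partition with no committed offset is treated as committed at 0
    (an assumption for this sketch); negative values are clamped to 0.
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in log_end.items()}


# Made-up offsets for a three-partition topic.
lag = consumer_lag({0: 1500, 1: 1200, 2: 900}, {0: 1500, 1: 1100})
print(lag)                # {0: 0, 1: 100, 2: 900}
print(sum(lag.values()))  # 1000
```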

Change 469945 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] create rsyslog::ship_logfile - simplified logstash shipper via kafka

https://gerrit.wikimedia.org/r/469945

Change 470452 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] role::logstash::collector: migrate to profile::logstash::collector

https://gerrit.wikimedia.org/r/470452

Change 470454 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: add generic kafka input config

https://gerrit.wikimedia.org/r/470454

Change 470452 merged by Herron:
[operations/puppet@production] role::logstash::collector: migrate to profile::logstash::collector

https://gerrit.wikimedia.org/r/470452

Change 471114 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] WIP: profile::rsyslog::logstash_shipper: generalized logstash shipper

https://gerrit.wikimedia.org/r/471114

Change 471114 merged by Herron:
[operations/puppet@production] profile::rsyslog::kafka_shipper: generalized log pipeline/elk shipper

https://gerrit.wikimedia.org/r/471114

Change 472694 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] kafka_shipper: pin librdkafka1 to stretch-backports on stretch

https://gerrit.wikimedia.org/r/472694

Change 472694 abandoned by Herron:
kafka_shipper: pin librdkafka1 to stretch-backports on stretch

Reason:
ok, fair enough! I've rebuilt librdkafka from stretch-backports as librdkafka_0.11.6-1~bpo9 1 wikimedia1. Will follow up outside gerrit.

https://gerrit.wikimedia.org/r/472694

Change 473137 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] WIP: logstash::input::kafka add support for SSL/TLS options

https://gerrit.wikimedia.org/r/473137

Change 473138 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] WIP: logstash::input::kafka: add topics_prefix support

https://gerrit.wikimedia.org/r/473138

Change 473137 merged by Herron:
[operations/puppet@production] logstash::input::kafka add support for SSL/TLS options

https://gerrit.wikimedia.org/r/473137

Change 473138 abandoned by Herron:
logstash::input::kafka: add topics_pattern support

https://gerrit.wikimedia.org/r/473138

Change 473588 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash::input::kafka: add topics_pattern support

https://gerrit.wikimedia.org/r/473588

Change 473588 merged by Herron:
[operations/puppet@production] logstash::input::kafka: add topics_pattern support

https://gerrit.wikimedia.org/r/473588
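With `topics_pattern`, the Logstash Kafka input subscribes to every topic whose name matches a regular expression instead of a fixed list, so newly created matching topics can be picked up without changing the input config. Below is a minimal sketch of that selection, assuming the regex must match the whole topic name (as with the Java consumer's pattern subscription). The topic names are hypothetical, and Python's `re` can differ from Java regex in edge cases.

```python
import re
from typing import Iterable, List


def subscribed_topics(pattern: str, topics: Iterable[str]) -> List[str]:
    """Return the topics whose full name matches the regex, sorted."""
    rx = re.compile(pattern)
    return sorted(t for t in topics if rx.fullmatch(t))


# Hypothetical topic names on the logging cluster.
topics = [
    "rsyslog-info",
    "rsyslog-warning",
    "rsyslog-err",
    "eventlogging-client-side",
    "__consumer_offsets",
]
print(subscribed_topics(r"rsyslog-.*", topics))
# ['rsyslog-err', 'rsyslog-info', 'rsyslog-warning']
```

Note that a bare `rsyslog` pattern matches nothing here, because the whole name must match, not just a prefix.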

Change 470454 merged by Herron:
[operations/puppet@production] logstash: add rsyslog-shipper kafka input config

https://gerrit.wikimedia.org/r/470454

Change 473607 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: add rsyslog-shipper kafka input config

https://gerrit.wikimedia.org/r/473607

Change 473607 merged by Herron:
[operations/puppet@production] logstash: add rsyslog-shipper kafka input config

https://gerrit.wikimedia.org/r/473607

Change 469945 abandoned by Herron:
create rsyslog::ship_logfile - simplified logstash shipper via kafka

Reason:
abandoning in favor of Id751995020c4d505aa917ad13a2f66f5b668078c

https://gerrit.wikimedia.org/r/469945

Change 473800 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: set rsyslog-shipper input type to syslog

https://gerrit.wikimedia.org/r/473800

Change 473800 merged by Herron:
[operations/puppet@production] logstash: set rsyslog-shipper input type to syslog

https://gerrit.wikimedia.org/r/473800

Change 473998 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] profile::rsyslog::kafka_shipper: use eqiad logging kafka brokers

https://gerrit.wikimedia.org/r/473998

Change 473998 merged by Herron:
[operations/puppet@production] profile::rsyslog::kafka_shipper: use eqiad logging kafka brokers

https://gerrit.wikimedia.org/r/473998

Change 474021 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash::input::kafka: add codec param

https://gerrit.wikimedia.org/r/474021

Change 474021 merged by Herron:
[operations/puppet@production] logstash::input::kafka: add codec param

https://gerrit.wikimedia.org/r/474021

Change 474026 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] profile::logstash::collector: set kafka shipper input codec to json

https://gerrit.wikimedia.org/r/474026

Change 474026 merged by Herron:
[operations/puppet@production] profile::logstash::collector: set kafka shipper input codec to json

https://gerrit.wikimedia.org/r/474026

Change 474317 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] kafka_shipper: use mmrm1stspace to remove leading space in msg field

https://gerrit.wikimedia.org/r/474317

Change 474319 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] kafka_shipper: update syslog json template

https://gerrit.wikimedia.org/r/474319

Change 474317 merged by Herron:
[operations/puppet@production] kafka_shipper: use mmrm1stspace to remove leading space in msg field

https://gerrit.wikimedia.org/r/474317

Change 474319 merged by Herron:
[operations/puppet@production] kafka_shipper: update syslog json template

https://gerrit.wikimedia.org/r/474319
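The shipper changes above make rsyslog render each message as JSON, with mmrm1stspace removing the single leading space that traditional syslog parsing typically leaves at the start of `msg`, and Logstash decodes each Kafka message value with the json codec. Below is a minimal Python model of that round trip; the field names and the sample message are hypothetical, and the real rsyslog template may use different fields.

```python
import json


def rm_first_space(msg: str) -> str:
    """Drop one leading space, if present (models mmrm1stspace)."""
    return msg[1:] if msg.startswith(" ") else msg


def render(host: str, program: str, severity: str, msg: str) -> str:
    """Producer side: serialize one event as JSON (hypothetical fields)."""
    return json.dumps({
        "host": host,
        "program": program,
        "severity": severity,
        "message": rm_first_space(msg),
    })


def decode(payload: str) -> dict:
    """Consumer side: what a json codec does with each message value."""
    return json.loads(payload)


event = decode(render("logstash1007", "sshd", "info", " Accepted publickey for root"))
print(event["message"])  # Accepted publickey for root
```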

Change 474683 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] profile: introduce jmx_exporter_port to logstash::collector

https://gerrit.wikimedia.org/r/474683

Change 474683 merged by Filippo Giunchedi:
[operations/puppet@production] profile: introduce jmx_exporter_port to logstash::collector

https://gerrit.wikimedia.org/r/474683

herron claimed this task.

Logs have been successfully shipped via the new pipeline, which consists of rsyslog -> Kafka -> Logstash.

Resolving this pipeline setup task and moving on to T205852 (onboarding producers to the new pipeline).

Change 717311 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] rsyslog: expand output lookup table docs

https://gerrit.wikimedia.org/r/717311

Change 717311 merged by Ema:

[operations/puppet@production] rsyslog: expand output lookup table docs

https://gerrit.wikimedia.org/r/717311