
Understand Kafka ACLs and figure out what ACLs we want for production topics
Closed, Resolved | Public | 8 Estimated Story Points

Event Timeline

Ottomata created this task. (Jun 7 2017, 2:42 PM)
Nuria moved this task from Incoming to Dashiki on the Analytics board. (Jun 12 2017, 3:30 PM)
elukey moved this task from Backlog to In Progress on the User-Elukey board. (Jun 19 2017, 10:08 AM)
Nuria edited projects, added Analytics-Kanban; removed Analytics. (Jul 6 2017, 4:50 PM)
Nuria added a comment. (Jul 6 2017, 4:53 PM)

Figuring out the configuration (access control list) for the clients, without establishing any policies. Mostly about testing in an appropriate Kafka environment.

Nuria set the point value for this task to 8. (Jul 6 2017, 4:54 PM)
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board. (Jul 19 2017, 3:02 PM)
elukey moved this task from In Progress to Next Up on the Analytics-Kanban board.

So judging from https://issues.apache.org/jira/browse/KAFKA-3532 it seems that the following sentence in the docs implies that we'd need to implement a custom class to parse the TLS user name if we want to use shorter names:

By default, the TLS user name will be of the form “CN=host1.example.com,OU=,O=Confluent,L=London,ST=London,C=GB”. One can change that by setting a customized PrincipalBuilder in server.properties like the following:

principal.builder.class=CustomizedPrincipalBuilderClass

https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/
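To make concrete what such a custom class would have to do, here is a minimal sketch of the name-shortening step only. It is written in Python for illustration; a real principal builder would be a Java class loaded by the broker, and the function name short_name_from_dn is hypothetical. The parsing is deliberately naive and assumes no escaped commas inside attribute values.

```python
def short_name_from_dn(dn: str) -> str:
    """Return the CN value of a distinguished name, or the DN unchanged.

    Illustration only: the split on "," does not handle escaped commas.
    """
    for part in dn.split(","):
        key, sep, value = part.strip().partition("=")
        if sep and key.upper() == "CN":
            return value
    # No CN attribute found: fall back to the full string.
    return dn


if __name__ == "__main__":
    print(short_name_from_dn("CN=client1,OU=Services,O=WMF,C=US"))
```

Without something along these lines on the broker side, ACLs have to be written against the full DN string.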

After a lot of tests here's the list of the most important things that I found:

  • kafka.security.auth.SimpleAclAuthorizer (set via authorizer.class.name) is the only authorizer shipped with Kafka. The username retrieved from a TLS client certificate is something like CN=client1,OU=Services,O=WMF,C=US, so every ACL must list the complete string.
  • The PLAINTEXT listener (not requiring TLS auth or encryption) works fine alongside the SSL one, but a producer or consumer that connects through it is logged in as ANONYMOUS. To avoid having its requests blocked, the ANONYMOUS user needs ACLs of its own.
  • The kafka acls command already offers the --consumer and --producer options to automatically add the minimum set of permissions for a consumer/producer client (related to a topic).
  • The * wildcard can be used in the kafka acls command only on its own, meaning "all"; it has no regex-like or prefix meaning.
  • Zookeeper is not authenticated, so the kafka acls command can add, delete and list ACLs without any auth either. This is fine in our current setup thanks to the ferm rules, but it must be kept in mind.
  • There are two sets of ACLs: one for clients accessing topics, the other for brokers changing topic metadata. The ACLs for the latter can be added using the --cluster option. This is essential to make everything work, otherwise brokers will not be able to do anything to a topic.
  • When adding an ACL with kafka acls the full zookeeper path needs to be specified, like --zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics. Specifying only the zookeeper host name will not work as expected (ACLs not respected).
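The two matching rules from the list (full DN required, * only meaning "all") can be sketched as a toy model. This is Python written for illustration from the behaviour observed in these tests, not Kafka's implementation; the function names are hypothetical.

```python
def principal_matches(acl_principal: str, client_principal: str) -> bool:
    # Principals are compared as whole strings, so an ACL written for a
    # shortened name does not apply to a client identified by its full DN.
    return acl_principal == client_principal


def resource_name_matches(acl_name: str, requested_name: str) -> bool:
    # "*" on its own means every resource; inside a longer name it is
    # treated as an ordinary character, not as a glob or regex.
    return acl_name == "*" or acl_name == requested_name


if __name__ == "__main__":
    full_dn = "User:CN=client1,OU=Services,O=WMF,C=US"
    print(principal_matches(full_dn, full_dn))                        # full DN listed
    print(principal_matches("User:client1", full_dn))                 # short name only
    print(resource_name_matches("*", "webrequest_text"))              # bare wildcard
    print(resource_name_matches("webrequest*", "webrequest_text"))    # prefix attempt
```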

Nice! We'll move all this stuff to a wiki page one day. Thanks luca!

full zookeeper path needs to be specified

This is true of all kafka commands that need a zookeeper arg. Our kafka wrapper .sh script expects $ZOOKEEPER_URL to be set to this. If it isn't being passed to kafka acls already, then we should fix the script to do that properly.

Change 368199 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] confluent::kafka.sh: fix kafka-acls command autocompletion

https://gerrit.wikimedia.org/r/368199

Change 368199 merged by Elukey:
[operations/puppet@production] confluent::kafka.sh: fix kafka-acls command autocompletion

https://gerrit.wikimedia.org/r/368199

elukey moved this task from Next Up to In Progress on the Analytics-Kanban board. (Jul 31 2017, 9:07 AM)

This is the result of adding ACLs for user test to produce/consume to the elukey2 topic with built-in cli tools:

elukey@kafka3-1:~$ kafka acls --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --add --producer --allow-principal 'User:test' --topic elukey2
kafka-acls --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --add --producer --allow-principal User:test --topic elukey2
Adding ACLs for resource `Topic:elukey2`:
 	User:test has Allow permission for operations: Write from hosts: *
	User:test has Allow permission for operations: Describe from hosts: *

Adding ACLs for resource `Cluster:kafka-cluster`:
 	User:test has Allow permission for operations: Create from hosts: *

elukey@kafka3-1:~$ kafka acls  --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership2-analytics --allow-principal "User:test" --topic elukey --consumer --add --group=*
kafka-acls --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership2-analytics --allow-principal User:test --topic elukey --consumer --add --group=*
Adding ACLs for resource `Topic:elukey`:
 	User:test has Allow permission for operations: Read from hosts: *
	User:test has Allow permission for operations: Describe from hosts: *

Adding ACLs for resource `Group:*`:
 	User:test has Allow permission for operations: Read from hosts: *

Using these pre-canned commands we don't need to list all the ACLs explicitly. A couple of notes:

  1. We are assuming that each producer (for example Varnishkafka) will have a specific certificate deployed to all the hosts that need to use it (using unix perms so that only selected daemons can read/use it). For example, Varnishkafka webrequest upload will have a client TLS certificate deployed only to the cache upload hosts, with permissions that allow only the varnishkafka daemon to read it. Same thing for the consumers.
  2. By default there are no restrictions on the client host that issues a produce/consume request. I think this is a good compromise between flexibility and security: if we follow the auth scheme outlined in 1), keeping Kafka ACLs up to date with IP changes as well would be a bit of a hassle in my opinion.
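For reference, the expansion that --producer and --consumer performed in the output above can be summarised as a small lookup table. This is a Python sketch built only from that output; the names CONVENIENCE_ACLS and expand are hypothetical, and other Kafka versions may expand the options differently.

```python
# (resource type, operation) pairs granted by each convenience option,
# as seen in the kafka acls output in this task.
CONVENIENCE_ACLS = {
    "--producer": [("Topic", "Write"), ("Topic", "Describe"), ("Cluster", "Create")],
    "--consumer": [("Topic", "Read"), ("Topic", "Describe"), ("Group", "Read")],
}


def expand(option: str, principal: str, topic: str, group: str = "*") -> list:
    """List the individual Allow rules that one convenience option adds."""
    names = {"Topic": topic, "Group": group, "Cluster": "kafka-cluster"}
    return [
        f"{rtype}:{names[rtype]} {principal} Allow {operation}"
        for rtype, operation in CONVENIENCE_ACLS[option]
    ]


if __name__ == "__main__":
    for rule in expand("--producer", "User:test", "elukey2"):
        print(rule)
    for rule in expand("--consumer", "User:test", "elukey"):
        print(rule)
```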

As explained before we also need to explicitly set ACLs for cluster operations between brokers, and again I'd use the standard ones provided by kafka acls:

elukey@kafka3-1:~$ kafka acls --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --cluster --add --allow-principal 'User:CN=kafka3-1,OU=Services,O=WMF,C=US'

kafka-acls --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --cluster --add --allow-principal User:CN=kafka3-1,OU=Services,O=WMF,C=US
Adding ACLs for resource `Cluster:kafka-cluster`:
 	User:CN=kafka3-1,OU=Services,O=WMF,C=US has Allow permission for operations: All from hosts: *

In this labs example I whitelisted the broker kafka3-1 for all the available operations. These rules are more static so we could think about restricting the host source IPs.

ema added a subscriber: ema. (Aug 2 2017, 8:57 AM)
elukey added a comment. (Aug 3 2017, 1:22 PM)

After checking the kafka-authorizer.log file I had to add the following rules to avoid deny errors:

kafka acls  --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --allow-principal "User:CN=kafka3-3,OU=Services,O=WMF,C=US" --topic __confluent.support.metrics --operation Read --operation Describe --add
kafka acls  --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --allow-principal "User:CN=kafka3-2,OU=Services,O=WMF,C=US" --topic __confluent.support.metrics --operation Read --operation Describe --add
kafka acls  --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --allow-principal "User:CN=kafka3-3,OU=Services,O=WMF,C=US" --topic __confluent.support.metrics --operation Read --operation Describe --add


kafka acls  --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --allow-principal "User:CN=kafka3-3,OU=Services,O=WMF,C=US" --topic __consumer_offsets --operation Read --operation Describe --add
kafka acls  --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --allow-principal "User:CN=kafka3-2,OU=Services,O=WMF,C=US" --topic __consumer_offsets --operation Read --operation Describe --add
kafka acls  --authorizer-properties zookeeper.connect=zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics --allow-principal "User:CN=kafka3-3,OU=Services,O=WMF,C=US" --topic __consumer_offsets --operation Read --operation Describe --add

I didn't expect that to be needed but apparently the brokers can act as "consumers" for the internal topics that kafka maintains for offsets and metrics.
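Since the same rule has to be repeated for every broker and every internal topic, the commands could be generated rather than typed by hand. The sketch below is Python that only prints the command lines and does not run them. The broker list (kafka3-1 to kafka3-3) is an assumption based on the host names that appear in this task, and the function name is hypothetical.

```python
import shlex

ZK = "zk1-1.analytics.eqiad.wmflabs:2181/kafka/mothership3-analytics"
# Assumed broker names for the labs cluster; adjust to the real list.
BROKERS = ["kafka3-1", "kafka3-2", "kafka3-3"]
INTERNAL_TOPICS = ["__confluent.support.metrics", "__consumer_offsets"]


def broker_read_acl_commands(zk, brokers, topics):
    """Build one kafka-acls command line per (topic, broker) pair."""
    commands = []
    for topic in topics:
        for broker in brokers:
            argv = [
                "kafka-acls",
                "--authorizer-properties", f"zookeeper.connect={zk}",
                "--allow-principal", f"User:CN={broker},OU=Services,O=WMF,C=US",
                "--topic", topic,
                "--operation", "Read",
                "--operation", "Describe",
                "--add",
            ]
            # shlex.quote adds quotes only where the shell would need them.
            commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


if __name__ == "__main__":
    for command in broker_read_acl_commands(ZK, BROKERS, INTERNAL_TOPICS):
        print(command)
```

Generating the list this way makes it harder to skip a broker or repeat one by accident.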

elukey moved this task from In Progress to Done on the Analytics-Kanban board. (Aug 3 2017, 1:23 PM)
elukey moved this task from In Progress to Done on the User-Elukey board. (Aug 4 2017, 8:16 AM)
Nuria closed this task as Resolved. (Aug 8 2017, 7:54 PM)

As explained before we also need to explicitly set ACLs for cluster operations between brokers

I was able to get around this by setting super.users in server.properties to the DN of the certificate we use for broker authentication.

@elukey, with SSL and auth enabled, and log4j.logger.kafka.authorizer.logger=DEBUG, I get an insanely verbose kafka-authorizer.log, especially with mirror maker running. In my labs test at the moment, with 2 brokers and 1 mirror maker consuming from this cluster, I'm seeing around 200 lines per second being written to kafka-authorizer.log.

As explained before we also need to explicitly set ACLs for cluster operations between brokers

I was able to get around this by setting super.users in server.properties to the DN of the certificate we use for broker authentication.

If this is the standard practice, then it is fine with me, but I thought it was better to grant each user/broker the minimum permissions needed rather than use super.users. It is true that the brokers already have all the power they need to do damage, but for some reason the idea of using super.users for the brokers does not fully convince me. Anyhow, this might be a paranoid thought; if you feel strongly about it, please proceed.
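For comparison with the per-broker ACL approach, here is a sketch of what the super.users value would look like. It is a small Python helper (hypothetical name super_users_line) that renders the server.properties line. Entries are separated by semicolons because the DNs themselves contain commas; the broker DNs used in the example are assumptions based on the labs principals seen earlier in this task.

```python
def super_users_line(dns):
    """Render a super.users line for server.properties from certificate DNs."""
    principals = [f"User:{dn}" for dn in dns]
    # Semicolon-separated, since each DN already contains commas.
    return "super.users=" + ";".join(principals)


if __name__ == "__main__":
    print(super_users_line([
        "CN=kafka3-1,OU=Services,O=WMF,C=US",
        "CN=kafka3-2,OU=Services,O=WMF,C=US",
    ]))
```

If all brokers share one certificate, as the earlier comment suggests, the list collapses to a single entry.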

@elukey, with SSL and auth enabled, and log4j.logger.kafka.authorizer.logger=DEBUG, I get an insanely verbose kafka-authorizer.log, especially with mirror maker running. In my labs test at the moment, with 2 brokers and 1 mirror maker consuming from this cluster, I'm seeing around 200 lines per second being written to kafka-authorizer.log.

Not sure why they set up the logging this way. It is really frustrating that DEBUG is the only level that produces a decent client/access log, yet it is so verbose. Feel free to raise the log level to cut the verbosity, even if that might make debugging issues in production cumbersome when we port clients over (at least in my experience while playing with it in labs). Maybe there is a way to configure log4j to filter only what we need and remove the garbage from kafka-authorizer.log?

Change 394438 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Improvements for Kafka SSL

https://gerrit.wikimedia.org/r/394438

Change 394438 merged by Ottomata:
[operations/puppet@production] Improvements for Kafka SSL

https://gerrit.wikimedia.org/r/394438

Change 395568 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use super.users instead of kafka-acls exec to authenticate broker principals

https://gerrit.wikimedia.org/r/395568

Change 395568 merged by Ottomata:
[operations/puppet@production] Use super.users instead of kafka-acls exec to authenticate broker principals

https://gerrit.wikimedia.org/r/395568

Change 395586 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Grant Create (topic) and Describe on kafka cluster resource for User:Anonymous

https://gerrit.wikimedia.org/r/395586

Change 395586 merged by Ottomata:
[operations/puppet@production] Grant Create (topic) and Describe on kafka cluster resource for User:Anonymous

https://gerrit.wikimedia.org/r/395586

Change 399700 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set ssl.cipher.suites and ssl.enabled.protocols for Kafka jumbo and varnishkafka (canary)

https://gerrit.wikimedia.org/r/399700

I also just played with ACLs to find a good combination that restricts producing to webrequest to the varnishkafka principal but still allows reads. The following seemed to work:

Current ACLs for resource `Group:*`:
 	User:ANONYMOUS has Allow permission for operations: Read from hosts: *

Current ACLs for resource `Topic:*`:
 	User:ANONYMOUS has Allow permission for operations: Describe from hosts: *
	User:ANONYMOUS has Allow permission for operations: Read from hosts: *

Current ACLs for resource `Topic:webrequest`:
 	User:CN=varnishkafka_test has Allow permission for operations: Write from hosts: *
	User:CN=varnishkafka_test has Allow permission for operations: Describe from hosts: *

Current ACLs for resource `Cluster:kafka-cluster`:
 	User:ANONYMOUS has Allow permission for operations: Describe from hosts: *
	User:CN=varnishkafka_test has Allow permission for operations: Create from hosts: *
	User:ANONYMOUS has Allow permission for operations: Create from hosts: *

(User:CN=varnishkafka_test is my varnishkafka test ssl principal)

I got to this by running these kafka-acls commands:

kafka-acls --authorizer-properties zookeeper.connect=$KAFKA_ZOOKEEPER_URL --add --allow-principal User:ANONYMOUS --consumer --group '*' --topic '*'
kafka-acls --authorizer-properties zookeeper.connect=$KAFKA_ZOOKEEPER_URL --add --allow-principal User:CN=varnishkafka_test --producer --topic webrequest

Today I also found that we needed

kafka acls --add --deny-principal User:ANONYMOUS --operation Write --topic webrequest_canary_test

To keep ANONYMOUS from producing to the webrequest topic

and

kafka acls --add  --allow-principal User:ANONYMOUS --cluster --operation DescribeConfigs

To get ANONYMOUS consumers able to read the cluster metadata.

kafka acls --list on jumbo currently says:

Current ACLs for resource `Group:*`:
 	User:ANONYMOUS has Allow permission for operations: Read from hosts: *

Current ACLs for resource `Topic:*`:
 	User:ANONYMOUS has Allow permission for operations: Describe from hosts: *
	User:ANONYMOUS has Allow permission for operations: Read from hosts: *
	User:ANONYMOUS has Allow permission for operations: Write from hosts: *

Current ACLs for resource `Topic:webrequest_canary_test`:
 	User:CN=varnishkafka has Allow permission for operations: Write from hosts: *
	User:CN=varnishkafka has Allow permission for operations: Describe from hosts: *
	User:ANONYMOUS has Deny permission for operations: Write from hosts: *

Current ACLs for resource `Cluster:kafka-cluster`:
 	User:ANONYMOUS has Allow permission for operations: Describe from hosts: *
	User:ANONYMOUS has Allow permission for operations: Create from hosts: *
	User:ANONYMOUS has Allow permission for operations: DescribeConfigs from hosts: *
	User:CN=varnishkafka has Allow permission for operations: Create from hosts: *
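The listing relies on a Deny rule taking precedence over a matching Allow rule: ANONYMOUS may write to every topic through Topic:*, except webrequest_canary_test. A toy Python model of that evaluation order, using the topic rules from the listing above, is sketched below. It illustrates the observed behaviour and is not Kafka's code; the names Acl, applies and authorize are hypothetical, and details such as one operation implying another are left out.

```python
from typing import NamedTuple


class Acl(NamedTuple):
    principal: str
    permission: str  # "Allow" or "Deny"
    operation: str
    resource: str    # "<Type>:<name>", where name "*" means all


# Topic rules copied from the kafka acls --list output on jumbo.
JUMBO_TOPIC_ACLS = [
    Acl("User:ANONYMOUS", "Allow", "Describe", "Topic:*"),
    Acl("User:ANONYMOUS", "Allow", "Read", "Topic:*"),
    Acl("User:ANONYMOUS", "Allow", "Write", "Topic:*"),
    Acl("User:CN=varnishkafka", "Allow", "Write", "Topic:webrequest_canary_test"),
    Acl("User:CN=varnishkafka", "Allow", "Describe", "Topic:webrequest_canary_test"),
    Acl("User:ANONYMOUS", "Deny", "Write", "Topic:webrequest_canary_test"),
]


def applies(acl, principal, operation, resource):
    rtype, _, rname = resource.partition(":")
    atype, _, aname = acl.resource.partition(":")
    return (
        acl.principal == principal
        and acl.operation == operation
        and atype == rtype
        and aname in ("*", rname)
    )


def authorize(acls, principal, operation, resource):
    relevant = [a for a in acls if applies(a, principal, operation, resource)]
    # Any matching Deny wins, even if an Allow also matches.
    if any(a.permission == "Deny" for a in relevant):
        return False
    return any(a.permission == "Allow" for a in relevant)


if __name__ == "__main__":
    topic = "Topic:webrequest_canary_test"
    print(authorize(JUMBO_TOPIC_ACLS, "User:ANONYMOUS", "Write", topic))
    print(authorize(JUMBO_TOPIC_ACLS, "User:CN=varnishkafka", "Write", topic))
    print(authorize(JUMBO_TOPIC_ACLS, "User:ANONYMOUS", "Read", topic))
```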

I also tested whether wildcards inside topic names work, e.g. webrequest*. They do not :(

Change 401621 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Dont' expand wildcards in kafka acls command

https://gerrit.wikimedia.org/r/401621

Change 401621 merged by Ottomata:
[operations/puppet@production] Dont' expand wildcards in kafka acls command

https://gerrit.wikimedia.org/r/401621

Change 399700 merged by Ottomata:
[operations/puppet@production] Set cipher.suites and ssl.enabled.protocols for jumbo and varnishkafka (canary)

https://gerrit.wikimedia.org/r/399700