Running a zookeeper and kafka cluster with Kubernetes on AWS

I have recently been working with Russ Miles on coding microservices that follow the principles he has laid out in the Antifragile Software book. As a simple demo we decided on using Kafka as a simple event store to implement an event sourcing pattern.

The main focus of this article is to talk about how to deploy a kafka cluster and manage its lifecycle via Docker containers and Kubernetes on AWS.

Initially, I wanted to quickly see how to get one instance of kafka to be available from outside the AWS world so that I could interact with it. My first move is always to look at the main Docker image repository for official or popular images. Interestingly, as of this writing, there is no official Kafka image. The most popular is wurstmeister/kafka, which is what I decided to use.

However, this was not enough. Indeed, Kafka relies on Zookeeper to work. Spotify offers an image that bundles both services together, but I don’t think that’s a good idea in production, so I decided to forego it. For Zookeeper, I didn’t use the most popular image because its documentation didn’t indicate any way to pass parameters to the service. Instead, I went with digitalwonderland/zookeeper, which supports some basic parameters like setting the broker id.

Setting up one instance of both services is rather straightforward and can be controlled by a simple Kubernetes replication controller like:
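A minimal sketch of such a replication controller, assuming the two images mentioned above (names, labels and the advertised-host placeholder are illustrative):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: kafka-broker
spec:
  replicas: 1
  selector:
    app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: zookeeper
        image: digitalwonderland/zookeeper
        ports:
        - containerPort: 2181
      - name: kafka
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        # fill in with the load-balancer domain once the service exists
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: "[AWS_LB_DNS]"
        # kafka talks to the zookeeper container in the same pod
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: localhost:2181
```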

This could be exposed using the following Kubernetes service:
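A sketch of that service, exposing both ports behind a single AWS load-balancer (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
spec:
  type: LoadBalancer
  selector:
    app: kafka
  ports:
  - name: kafka
    port: 9092
    targetPort: 9092
  - name: zookeeper
    port: 2181
    targetPort: 2181
```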

Notice the LoadBalancer type is used here because we need to create an AWS load-balancer to access those services from the outside world. Kubernetes is clever enough to achieve this for us.

In the replication controller specification, we can see the requirement to let Kafka advertise its hostname. To make it work, this must be the actual domain of the AWS load-balancer. This means that you must create the Kubernetes service first (which is good policy anyway) and then, once it is done, write down its domain into the replication controller spec as the value of the KAFKA_ADVERTISED_HOST_NAME environment variable.

This is all good, but this is not a cluster. It is merely a single instance for development purposes. Even though Kubernetes promises to look after your pods, it’s not a bad idea to run a cluster of both zookeeper and kafka services. This wasn’t as trivial as I expected.


The reason is mostly due to the way clusters are configured. Indeed, in Zookeeper’s case, each instance must be statically identified within the cluster. This means you cannot just increase the number of pod instances in the replication controller, because they would all have the same broker identifier. This will change in Zookeeper 3.5. Kafka doesn’t show this limitation anymore; indeed, it will happily create a broker id for you if none is provided explicitly (though this requires Kafka 0.9+).

What this means is that we now have two specifications. One for Zookeeper and one for Kafka.

Let’s start with the simpler one, kafka:
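A hedged reconstruction of that spec, assuming the wurstmeister/kafka image and illustrative names (the KAFKA_CREATE_TOPICS value matches the “mytopic” topic with two partitions and one replica created later in the article; the advertised host must be filled in with the load-balancer domain):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: kafka-controller
spec:
  replicas: 1
  selector:
    app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: "[AWS_LB_DNS_or_YOUR_DNS_POINTING_AT_IT]"
        # the three zookeeper services defined below
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zoo1:2181,zoo2:2181,zoo3:2181
        # topic:partitions:replicas
        - name: KAFKA_CREATE_TOPICS
          value: mytopic:2:1
```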

Nothing really odd here: we create a single kafka broker, connected to our zookeeper cluster, which is defined below:
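A sketch of the first of the three replication controllers, assuming the digitalwonderland/zookeeper image and its ZOOKEEPER_ID and ZOOKEEPER_SERVER_N parameters (the other two controllers are identical except for their name, label and id):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: zookeeper-controller-1
spec:
  replicas: 1
  selector:
    app: zookeeper-1
  template:
    metadata:
      labels:
        app: zookeeper-1
    spec:
      containers:
      - name: zookeeper
        image: digitalwonderland/zookeeper
        ports:
        - containerPort: 2181   # client connections
        - containerPort: 2888   # follower connections
        - containerPort: 3888   # leader election
        env:
        # static identity of this instance within the ensemble
        - name: ZOOKEEPER_ID
          value: "1"
        # the full list of peers, one per service defined below
        - name: ZOOKEEPER_SERVER_1
          value: zoo1
        - name: ZOOKEEPER_SERVER_2
          value: zoo2
        - name: ZOOKEEPER_SERVER_3
          value: zoo3
```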

As you can see, unfortunately we cannot rely on a single replication controller with three instances as we do with kafka brokers. Instead, we run three distinct replication controllers, so that we can specify the zookeeper id of each instance, as well as the list of all brokers in the pool.

This is a bit of an annoyance because we therefore rely on three distinct services too:
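A sketch of the first of those services, routing to the first controller’s pod (zoo2 and zoo3 are analogous, with the selector pointing at zookeeper-2 and zookeeper-3):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zoo1
spec:
  selector:
    app: zookeeper-1
  ports:
  - name: client
    port: 2181
  - name: followers
    port: 2888
  - name: election
    port: 3888
```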

Doing so means traffic is routed to each zookeeper instance using its service name (internally to the kubernetes network, that is).

Finally, we have our Kafka service:
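A sketch of that service, exposing the single broker through an AWS load-balancer:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
spec:
  type: LoadBalancer
  selector:
    app: kafka
  ports:
  - name: kafka
    port: 9092
    targetPort: 9092
```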

That one is simple because we only have a single kafka application to expose.

Now running these in order is the easy part. First, let’s start with the zookeeper cluster:
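Assuming the specs above were saved under illustrative filenames, that could be:

```shell
# services first, so the zookeeper instances can resolve each other
$ kubectl create -f zoo1-service.yaml
$ kubectl create -f zoo2-service.yaml
$ kubectl create -f zoo3-service.yaml
# then the three replication controllers
$ kubectl create -f zookeeper-cluster.yaml
```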

Once the cluster is up, you can check they are all happy bunnies via:
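One way to do that is to ask each instance for its status with Zookeeper’s `stat` four-letter command (assuming `nc` is available in the container; pod names carry a random suffix, so look them up first):

```shell
$ kubectl get pods -l app=zookeeper-1
$ kubectl exec <zookeeper-pod> -- bash -c "echo stat | nc localhost 2181" | grep Mode
# one instance reports "Mode: leader", the other two "Mode: follower"
```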

One of them should be LEADING, the other two ought to be FOLLOWERS.

Now, you can start your Kafka broker, first its service:
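Again assuming an illustrative filename:

```shell
$ kubectl create -f kafka-service.yaml
# wait for the ELB, then grab its DNS name
$ kubectl describe service kafka-service | grep "LoadBalancer Ingress"
```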

On AWS, you will need to wait for the actual EC2 load balancer to be created. Once that’s done, take its DNS name and edit the kafka-cluster.yaml spec to set it to KAFKA_ADVERTISED_HOST_NAME.

Obviously, if you have set up a DNS route to your load-balancer, simply use that domain instead of the load-balancer’s. In that case, you can set its value once and for all in the spec.

Then, run the following command:
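Assuming the broker spec was saved as kafka-cluster.yaml:

```shell
$ kubectl create -f kafka-cluster.yaml
```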

This will start the broker and automatically create the topic “mytopic” with one replica and two partitions.

At this stage, you should be able to connect to the broker and produce and consume messages. You might want to try kafkacat as a simple tool to play with your broker.

For instance, listing the topic on your broker:
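Using kafkacat’s metadata listing mode:

```shell
$ kafkacat -b [AWS_LB_DNS_or_YOUR_DNS_POINTING_AT_IT]:9092 -L
```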

You can also produce messages:
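For example, piping a line into kafkacat’s producer mode:

```shell
$ echo "hello from kubernetes" | kafkacat -b [AWS_LB_DNS_or_YOUR_DNS_POINTING_AT_IT]:9092 -P -t mytopic
```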

Consuming messages is as simple as:
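```shell
$ kafkacat -b [AWS_LB_DNS_or_YOUR_DNS_POINTING_AT_IT]:9092 -C -t mytopic
```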

At this stage, we still don’t have a kafka cluster. One might expect that running something like this should be enough:
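For instance, scaling the replication controller (the controller name is illustrative):

```shell
$ kubectl scale rc kafka-controller --replicas=3
```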

But unfortunately, this will only create new kafka instances, it will not automatically start replicating data to the new brokers. This has to be done out of band as I will explain in a follow-up article.

As a conclusion, I would say that using existing images is not ideal because they don’t always provide the level of integration you’d hope for. What I would rather do is build specific images that initially converse with an external configuration server to retrieve some information they need to run. This would likely make things a little smoother. Though in the case of zookeeper, I am looking forward to its next release supporting dynamic cluster scaling.

18 thoughts on “Running a zookeeper and kafka cluster with Kubernetes on AWS”

  1. I followed your instructions, but I cannot consume messages; this is the error when running kafkacat -b localhost:9092 -C -t mytopic:
    % ERROR: Topic test error: Broker: Leader not available

    1. This normally means your advertised hostname is not set correctly. Once I got that right, it all came together.

  2. I face problem executing this command “$ kafkacat -b [AWS_LB_DNS_or_YOUR_DNS_POINTING_AT_IT]:9092 -L”.

    I used the LoadBalancer Ingress URL from the kubectl service description for AWS_LB_DNS. I am running k8s on AWS.
    But when I try using kafkacat, I get the following error:

    %3|1467839779.839|FAIL|rdkafka#producer-1| Connect to ipv4# failed: Connection refused
    %3|1467839780.855|FAIL|rdkafka#producer-1| Connect to ipv4# failed: Connection refused
    %3|1467839781.873|FAIL|rdkafka#producer-1| Connect to ipv4# failed: Connection refused
    %3|1467839782.889|FAIL|rdkafka#producer-1| Connect to ipv4# failed: Connection refused
    %3|1467839783.913|FAIL|rdkafka#producer-1| Connect to ipv4# failed: Connection refused
    % ERROR: Failed to acquire metadata: Local: Broker transport failure

    Please advise on how I can resolve this.

  3. You mentioned “But unfortunately, this will only create new kafka instances, it will not automatically start replicating data to the new brokers. This has to be done out of band as I will explain in a follow-up article.”

    Any progress on the follow-up article? If not, do you mind giving an outline of how you did this?


    1. Re-stating what John Ramey said, it would be good to know how to dynamically scale Kafka cluster with data replication.


  4. Hi,

    first I read the article and implemented your solution, only to soon realise that it is completely wrong.

    Kafka cannot run behind a load balancer! Zookeeper, however, can. I see that you only run one broker and have a roadmap to work on multiple brokers. That can’t be done with an ELB because, as a kafka consumer or producer, you need to connect to the leader of the cluster. It doesn’t work unless you set up an ELB for each broker to expose the service itself. I solved that issue with a k8s statefulset on the kafka layer, exposing brokers via a headless service, aka broker-1.svcname.whatever, broker-2.svcname.whatever, etc. But again, accessing them from outside the cluster is not trivial, as it requires either one ELB per pod (not a good solution) or port mapping in kubernetes to the EC2 instance, exposed via Route53. Again, not a very nice solution.

    I hope that if you figure out an elegant way, you let us know. Looking forward to a follow-up post.

    1. We just hit the same issue (“accessing them out of the cluster”). We were all excited to move Kafka to Kubernetes, but I’m not sure how we’ll get around this. Any ideas?

  5. I built a zookeeper and kafka cluster,
    but when I close it, I get an error:
    connect to ipv4.# failed.
    How can I fix it?

  6. I am following the single instance method for development purposes, but my kafka pod is not running because it cannot connect to zookeeper.

    This is my kafka-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.1.0 (36652f6)
      creationTimestamp: null
      labels:
        app: kafka
      name: zook
    spec:
      ports:
      - name: zookeeper-port
        port: 2181
        targetPort: 2181
        protocol: TCP
      selector:
        app: kafka
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: kafka-service
      labels:
        app: kafka
    spec:
      ports:
      - port: 9092
        name: kafka-port
        targetPort: 9092
        protocol: TCP
      selector:
        app: kafka
      type: LoadBalancer

    And below is my kafka-replicationController.yaml
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: kafka-controller
    spec:
      replicas: 1
      selector:
        app: kafka
      template:
        metadata:
          labels:
            app: kafka
        spec:
          containers:
          - name: kafka
            image: wurstmeister/kafka
            ports:
            - containerPort: 9092
            env:
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zook:2181
          - name: zookeeper
            image: digitalwonderland/zookeeper
            ports:
            - containerPort: 2181

    I have filed the same issue on stackoverflow here as:

    Please let me know what the issue is and how to solve it.
