Running a zookeeper and kafka cluster with Kubernetes on AWS

I have been recently working with Russ Miles on coding microservices that follow principles he has laid out in the Antifragile Software book. As a simple demo we decided on using Kafka as a simple event store to implement an event sourcing pattern.

The main focus of this article is to talk about how to deploy a kafka cluster and manage its lifecycle via Docker containers and Kubernetes on AWS.

Initially, I wanted to quickly see how to get one instance of kafka to be available from outside the AWS world so that I could interact with it. My first move is always to look at the main Docker image repository for official or popular images. Interestingly, as of this writing, there is no official Kafka image. The most popular is wurstmeister/kafka which is what I decided to use.

However, this was not enough. Indeed, Kafka relies on Zookeeper to work. Spotify offers an image with both services in a single image, I don’t think that’s a good idea in production so I decided to forego it. For Zookeeper, I didn’t use the most popular because its documentation wasn’t indicating any possibility to pass on parameters to the service. Instead, I went with digitalwonderland/zookeeper which was supporting some basic parameters like setting the broker id.

Setting one instance of both services is rather straightforward and can be controlled by a simple Kubernetes replication controller like:

This could be exposed using the following Kubernetes service:

Notice the LoadBalancer type is used here because we need to create a AWS load-balancer to access those services from the outside world. Kubernetes scripts are clever enough to achieve this for us.

In the replication controller specification, we can see the requirement to let Kafka advertize its hostname. To make it work, this must be the actual domain of the AWS load-balancer. This means that you must create the Kubernetes service first (which is good policy anyway) and then, once it is done, write down its domain into the replication controller spec as the value of the KAFKA_ADVERTISED_HOST_NAME environment variable.

This is all good but this is not a cluster. It is merely a straight instance for development purpose. Even though Kubernetes promises you to look after your pods, it’s not a bad idea to run a cluster of both zookeeper and kafka services. This wasn’t as trivial as I expected.


The reason is mostly due to the way clusters are configured. Indeed, in Zookeeper’s case, each instance must be statically identified within the cluster. This means you cannot just increase the number of pod instances in the replication controller, because they would all have the same broker identifier. This will change in Zookeeper 3.5. Kafka doesn’t show this limitation anymore, indeed it will happily create a broker id for you if none is provided explicitely (though, this requires Kafka 0.9+).

What this means is that we now have two specifications. One for Zookeeper and one for Kafka.

Let’s start with the simpler one, kafka:

Nothing really odd here, we create a single kafka broker, connected our zookeeper cluster, which is defined below:

As you can see, unfortunately we cannot rely on a single replication controller with three instances as we do with kafka brokers. Instead, we run three distinct replication controllers, so that we can specify the zookeeper id of each instance, as well as the list of all brokers in the pool.

This is a bit of an annoyance because we therefore rely on three distinct services too:

Doing so means traffic is routed accordingly to each zookeeper instance using their service name (internally to the kubernetes network that is).

Finally, we have our Kafka service:

That one is simple because we only have a single kafka application to expose.

Now running these in order is the easy part. First, let’s start with the zookeeper cluster:

Once the cluster is up, you can check they are all happy bunnies via:

One of them should be LEADING, the other two ought to be FOLLOWERS.

Now, you can start your Kafa broker, first its service:

On AWS, you will need to wait for the actual EC2 load balancer to be created. Once that’s done, take its DNS name and edit the kafka-cluster.yaml spec to set it to KAFKA_ADVERTISED_HOST_NAME.

Obviously, if you have setup a DNS route to your load-balancer, simply use that domain instead of the load-balancer’s. In that case, you can set its value once for good in the spec.

Then, run the following command:

This will start the broker and automatically create the topic “mytopic” with one replica and two partitions.

At this stage, you should be able to connect to the broker and produce and consume messages. You might wantt o try kafkacat as a simple tool to play with your broker.

For instance, listing the topic on your broker:

You can also produce messages:

Consuming messages is as simple as:

At this stage, we still don’t have a kafka cluster. One might expect that running something like this should be enough:

But unfortunately, this will only create new kafka instances, it will not automatically start replicating data to the new brokers. This has to be done out of band as I will explain in a follow-up article.

As a conclusion, I would say that using existing images is not ideal because they don’t always provide the level of integration you’d hope for. What I would rather do is build specific images that initially converse with an external configuration server to retrieve some information they need to run. This would likely make things a little more smooth. Though in the case of zookeeper, I am looking forward for its next release to support dynamic cluster scaling.