ws4py is eager for a new maintainer

Years ago, I got really interested in the WebSocket protocol that eventually landed as RFC 6455. It was the first time I had spent so much time trying to participate in elaborating a protocol. Obviously, my role was totally minor, but the whole experience was both frustrating and exhilarating.

I thought the protocol was small enough that it would make a good scope for a Python project: ws4py. Aside from implementing the core protocol, what I was interested in was twofold. First, relying on Python generators all the way down as a control-flow mechanism. Second, decoupling the socket layer from the interface so I could more easily write tests through fake WebSocket objects. Indeed, as with any network protocol, the devil is in the details and WebSocket is no different. There are a few corner cases that would have been hard to test with a real socket but were trivial against an interface.
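To give a flavour of that second point, here is a rough sketch of the idea (this is not ws4py’s actual API, just the shape of it): the protocol object only ever talks to something that looks like a socket, so a fake one is enough for tests.

```python
class FakeSocket:
    """Socket stand-in feeding pre-recorded bytes to the protocol layer."""
    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.sent = []

    def recv(self, bufsize):
        return self.chunks.pop(0) if self.chunks else b""

    def sendall(self, data):
        self.sent.append(data)


class EchoProtocol:
    """Toy protocol object: it only ever sees the socket-like interface above."""
    def __init__(self, sock):
        self.sock = sock

    def once(self):
        data = self.sock.recv(4096)
        if not data:
            return False
        self.sock.sendall(data)
        return True


# Corner cases become trivial to exercise: no real socket, no network.
proto = EchoProtocol(FakeSocket([b"ping", b""]))
assert proto.once() is True
assert proto.once() is False
assert proto.sock.sent == [b"ping"]
```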

Did I succeed? Partly. In hindsight, my design was not perfect and I made a couple of mistakes:

  • I relied too much on an OOP design for the high-level interface. I quickly realised I could, and should, have used a more functional approach. After all, WebSockets are merely event handlers, and a functional approach would have made more sense. With that said, Python is not a functional language; it has a few functional features, but the ecosystem is not entirely driven that way, so, at the time, that might have made adoption of the library more difficult.
  • I sort of wrote my own internal event loop abstraction on top of select, epoll… One goal I had set for myself was not to rely on any external dependency for the library; at the very least, I wanted to make sure the library could be used as-is for quick and dirty tests. That’s a bad idea for something as complex as proper event looping. Eventually, I did provide support for gevent and asyncio (called tulip back then). Still, there is a chunk of the code that could be made simpler and more robust if we changed that.
  • Though I did decouple the socket from the interface and provided a nicer way of testing the protocol, the socket interface still leaks here and there, making the code not as tight as it could be.

I started that project years ago and I haven’t really paid attention to it for the last two years. Some folks using the library have been getting restless and, no doubt, frustrated by my lack of commitment to it. To be honest, I’ve lost interest in that project; I’ve moved on to other puzzles that have piqued my curiosity and I don’t have the energy for ws4py any longer.

Is the project still relevant anyway? Well, GitHub tells me it is starred by almost 800 individuals. That’s not massive but it’s decent for sure, the highest count for any of my projects. Also, it’s the only WebSocket library that runs with CherryPy.

Is WebSocket still relevant? That’s not for me to say. Some people claim it’s dying a slow death due to HTTP/2. Honestly, I have no idea. WebSocket, much like the Atom Publishing Protocol (another protocol I cared for), didn’t do as well as their authors may have expected initially.

Anyhow, I think I should pass the baton to someone else who is motivated to take the project on. I’m not sure how this will happen, but I would like to think we can be as transparent as possible about it. Please use the comments below or the mailing-list to discuss your interest.

If by the end of February no one has shown any interest, I will officially deprecate the project so people can gradually move on. The project will stay on GitHub but it’ll be clear that no further changes or releases will be made.

It’s sad to let go of a project you cared for, but it’s only fair to the community to be transparent when you’ve lost the energy for it. Long live ws4py!

Ubuntu on a Dell XPS 15 9550

I have enjoyed using a Dell XPS 15 9550 for a few months now. Fantastic machine. When I got it, I tried to install Ubuntu 15.10, which utterly failed. Basically, I couldn’t even boot the installer. I gave up and started monitoring threads on the subject.

Today is the release of Ubuntu 16.10 and I thought I’d give it a try once more. Well, good on me as it worked flawlessly! For those interested, here’s what I did:

  1. Updated, from Windows, the BIOS to 1.2.14 as well as the drivers for the Dell WD15 docking station (since I own one)
  2. Disabled the Secure Boot in the BIOS and rebooted
  3. Switched the storage from RAID to AHCI in the Windows drivers (really, make a recovery disk just in case), following this link. Windows didn’t reboot after I changed the driver type from Device Manager, but rebooting in safe mode and then rebooting normally fixed that automatically somehow (see this comment). You have to switch to AHCI or Ubuntu will not see your storage during installation
  4. Next, booted from my USB key and ran the installation process

That’s it!

Once installed and restarted, I could boot into both Linux and Windows fine. Once logged into my Ubuntu session, I installed the Nvidia drivers (367.57) via the Ubuntu driver utility, then Cinnamon, because it handles HiDPI automatically and I dislike Unity with a passion.

Everything has been working fine so far (natively without any tweaking):

  • HiDPI on the 4K screen
  • Keyboard backlight
  • Touchscreen
  • Bluetooth
  • Wifi
  • Dual screen (HDMI)
  • Logitech C920

Things I haven’t tried yet or am not sure about:

  • Hibernating
  • Sleep mode
  • the docking station (it was recognised but I haven’t used it properly yet)


There is hope!

Let’s talk about microservices

Well well… let’s talk about microservices. I have been keen on following what has been going on with microservices since the idea’s inception, but I have never been their most vocal champion. I don’t believe in screaming about microservices like a fanboy; it is mostly counter-productive.

With that being said, I see more and more people talking about them, but in many instances, it is approached like many other technologies or ideas in the past: “Oh look, it’s trendy, I must be doing it!” or “I am your manager, other managers tell me we should do microservices. So team, let’s do it!”.

This post will try to clarify a few things about why microservices may help you and some guidelines about building them.

Why do we need microservices in the first place?

Why indeed? Let’s be clear: microservices are not magical entities. If you believe switching to a microservices architecture will solve your problems, well, 99% of the time you will be proved wrong. Let me state it plainly:

All other things being equal, microservices will not benefit your project.

This does not start well, does it? Still, it is critical that we are honest about it.

Maybe it is the wrong question. A better one would be: “Why does our software fail to keep up with the business’s needs?”

Indeed, this is the real issue here: the feeling that your software eventually hinders your capacity to move the business[1] forward. It’s a shame because it eventually leads to some sort of bitterness and grief within the company.

Obviously, hearing that microservices are small components that make you faster is attractive, but they can only help you if you understand that what truly matters is the shift towards designing for change.

In other words, we need microservices because we want to adapt our software more frequently to keep up with inevitable changes and the disruptive aspect of innovation. Inevitable doesn’t mean we should feel defeated, quite the contrary! Indeed, the more we welcome and design for change, the more confident we will be in releasing fast and lean software that responds appropriately to expectations.

What is a microservice by the way?

Russ Miles often plays the game of finding the wackiest definition of microservices; this one by Micah Blalock is certainly worth a read. Truth be told, though, there is no strict definition of microservices and it’s a freaking good thing. The minute you have an industry-approved definition, you get ready for a feast much like the industry had with SOAP, for instance.

Still, we can suggest some features that apply well to microservices:

They should strive to be simple and comprehensible. You can’t change something you don’t understand.

They are single-purpose components. In other words, they are small not because you count the number of lines but because they do one thing only. Usually, that one thing means your code-base for one microservice will remain small.

They should be treated like cattle, not pets. Do not care for your individual microservice instances, care for your system’s general status. Kill and replace microservices that are unhealthy. This is quite a change when you come from a monolithic application, because you tend to care for its status much more.

A related one is that microservices ought to be independent. This will make it so much simpler to comprehend how to evolve your system.

A second related principle: microservices should not share state. Decoupling your code while keeping coupling at the data level is a recipe for failure.

They should be keen on following Postel’s law: “Be liberal in what you accept but conservative in what you send”. The idea here is to play nicely with the change your system will constantly endure while reducing the coupling imposed by one microservice onto others.
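As a small, hedged illustration of that principle (the field names are made up), a handler can be tolerant about what arrives while only ever emitting the fields it has documented:

```python
def handle_order_event(event):
    # Liberal on input: tolerate unknown fields and default the missing ones.
    order_id = event.get("order_id")
    currency = event.get("currency", "EUR")
    if order_id is None:
        return None  # unusable event: drop it rather than crash the service

    # Conservative on output: emit only the documented fields.
    return {"order_id": order_id, "currency": currency, "status": "accepted"}


print(handle_order_event({"order_id": 42, "internal_flag": True}))
```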

They ought to be easy to roll out. Let’s recall that we want to keep up with a fast pace of change. We can’t achieve this if the release cycle is slow and complicated.

Honestly, there isn’t much more to it.

What matters is more the way they allow an antifragile system to be built than how they are individually implemented. In other words, don’t over-think your microservices. Just make sure they help you design for change in a fast and confident manner.


How to build microservices?

At this stage, you should realise I do not believe there is one true way to implement microservices. However, there are definitely tools and approaches that will help along the way. I will enumerate a few of them, but please explore the field for what works for you. You could also do a lot worse than reading Sam Newman’s book “Building Microservices”.

First, you need to decide what your microservices are. Russ Miles describes a rather useful mechanism to scope your microservices using the Life Preserver diagram tool. You may even design around the CQRS pattern to further refine your microservices’ responsibilities.

Then, make a decision regarding the integration style you’re going to apply to your system. You don’t have to pick only one. Most of the time, you’ll hear about REST (or sort-of, anyway) but you could also bet on a message-oriented approach. If you use REST, you might want to use StopLight.io to design your API and even export it in the Swagger 2 format.

Recall that you never want to share state across your microservices, at least not through a single common database. What matters are the events flowing through the system. You may therefore rely on event sourcing and reconstruct local state rather than care for consistency based on strong coupling. Finding the best event store for your requirements and constraints may still require some testing, but Kafka has gained some points.
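As a minimal sketch of that idea, independent of any particular event store (the account events below are made up for illustration), local state is just a fold over the event stream and can always be rebuilt by replaying it:

```python
from functools import reduce

def apply_event(balances, event):
    """Pure step: current state plus one event yields the next state."""
    updated = dict(balances)
    updated[event["account"]] = updated.get(event["account"], 0) + event["amount"]
    return updated

events = [
    {"account": "alice", "amount": 100},
    {"account": "alice", "amount": -30},
    {"account": "bob", "amount": 50},
]

# Replaying the stream from scratch always yields the same local state.
print(reduce(apply_event, events, {}))  # {'alice': 70, 'bob': 50}
```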

Microservices are polyglot by nature. There is no need for you to keep implementing them using one single language. As usual, choose the right tool for the job.

Please, do make sure you test your microservices.

Probably package up your microservices into portable images. This will reduce the complexity of dependency management and make them much simpler to distribute. You will likely benefit from Docker or Rocket.

Obviously, if you package up your microservices with Docker/Rocket, you will gain from managing their high-demand life-cycle through a service orchestrator. The field is quite busy, but the major players are Google’s Kubernetes, Mesos/Marathon, Docker Swarm, Amazon ECS, CoreOS Fleet and Rancher. It is really worth taking the time to set up a proper test for them, because their documentation alone will not be enough to distinguish them.

You will then need a configuration service. There are various possibilities here but you may rely on a distributed filesystem or services like Consul or Kinto.

Make sure your microservices can be discovered! Rely on a service discovery tool such as Consul or etcd.

Ensure your microservices play nicely with external loggers like LogStash as well as monitoring tools like DataDog or Prometheus. Honestly, in both areas, there are tons of products so make time to review some of them.

You will likely want to set up some sort of proxy in front of your services (for instance to rate-limit or secure requests, or to load-balance between microservice instances) using something like Kong or HAProxy.

From a security perspective, consider general best practices of web applications and make sure your team ponders on the right questions.

There are many related topics obviously but I will conclude with two critical aspects of building microservices properly:

  • As a developer, you own your microservices. This means that you can’t stop caring for them once you have pushed the code out. Your job is to follow them all the way to production. Talk to your ops. Actually, don’t just talk to them, make sure you work *with* them!
  • Stress your system! Remember you are building an antifragile system. You want to be confident you can test the plastic nature of your live system. Stress it!

That’s a lot to think about! Is there a simpler way?

Everything I have listed before should be on your shopping list when you build microservices. If you feel like it’s quite overwhelming, you will be happy to hear that platforms are starting to emerge to take care of the entire operational aspect. For instance, AWS Lambda, Google Functions and (as it appears) the newest player in the field: Atomist. Make sure to review them.

Happy coding!

[1] Note that I am using “business” in a very broad sense here, don’t get hung up on it too much.

An asynchronous CherryPy server based on asyncio

CherryPy is a minimalist web application server written in Python. Hundreds of people have relied on it for more than fourteen years now. Recently, I’ve gained interest in the native asynchronous support Python has acquired through the implementation of PEP-3156 and PEP-0492. Basically, Python now supports coroutines natively behind a friendly API. In addition, thanks to the built-in asyncio module, you can naturally develop concurrent applications.
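For readers new to that syntax, here is a tiny, self-contained example of a native coroutine driven by asyncio, nothing CherryPy-specific yet:

```python
import asyncio

async def fetch_status():
    await asyncio.sleep(0.1)  # stand-in for a real I/O operation
    return "ok"

loop = asyncio.get_event_loop()
print(loop.run_until_complete(fetch_status()))
loop.close()
```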

CherryPy has always used a multi-threaded engine to support concurrent web applications. This may sound surprising, but it is still working very well in 2016. Yet, I love a puzzle and I was interested in making CherryPy run as a set of coroutines rather than a bunch of threads. So, a couple of days ago, I set myself the task of making it happen.

And so here it is, CherryPy on asyncio!

This is a branch on my fork, not an official part of the CherryPy project yet.

Now, you can run code like this:
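Here is a minimal sketch of such an application; the handler is my own illustration and the exact code on the branch may differ, but the spirit is an exposed coroutine served by an otherwise ordinary CherryPy application:

```python
import cherrypy
import cherrypy.async  # the branch's module: it patches CherryPy's internals
                       # (note: only valid syntax on Python 3.5/3.6, before
                       # `async` became a reserved keyword)

class Root:
    @cherrypy.expose
    async def index(self):
        return "Hello, coroutine world!"

if __name__ == "__main__":
    cherrypy.quickstart(Root(), "/")
```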

The only differences are:

  • importing cherrypy.async
  • turning page-handlers into coroutines

That is all.

The idea is that the cherrypy.async module patches all the internals of CherryPy for you and turns it into an async-aware server.

Note that the code currently runs only on Python 3.5+ as we use the async/await keywords.

Has it been easy?

Turning a project that was not designed for coroutines into one is not that complicated, thanks to the simple interface provided by async/await. However, anytime an I/O operation is performed, it is necessary to transform a call like:

to

This mundane change is sometimes buried inside a larger function; in that case, you have to override and copy/paste the whole function to change that one line. Mind you, this can’t be avoided because you must also re-declare the function as a coroutine anyway by prefixing it with async.
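A purely illustrative example of that ripple effect, outside of CherryPy’s actual code base:

```python
import asyncio

class Connection:
    def __init__(self, reader):
        self.reader = reader

    # The original, synchronous-style method might have been:
    #
    #     def read_line(self):
    #         return self.reader.readline()
    #
    # To port it, the whole method is copied and re-declared as a coroutine,
    # just so that the single I/O call can be awaited.
    async def read_line(self):
        return await self.reader.readline()

async def main():
    reader = asyncio.StreamReader()
    reader.feed_data(b"GET / HTTP/1.1\r\n")
    reader.feed_eof()
    print(await Connection(reader).read_line())

asyncio.get_event_loop().run_until_complete(main())
```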

As usual, the difficulty lies in the entanglement of your code. The simpler your code is to comprehend, the simpler and faster it will be to change it with confidence.

What was changed?

Mostly the HTTP server, the machinery of CherryPy: the internal engine/bus, the dispatcher, the request handling.

If we can find a way to re-organise the existing code, only a few lines would actually need to change.

Note, I made the decision not to make this server WSGI aware because I find that rather counter-intuitive with an async-based server somehow.

Is it production ready?

Not at all. It hasn’t really been tested (this will require re-writing many tests so that they play along with coroutines).

It is also, for reasons yet unknown, much slower than the multithreaded version. Profiling will need to be performed.

Still, if you feel like testing it:

I don’t know if this code will go further than this, but maybe it will interest the community enough to move it forward. This would make CherryPy more suitable for HTTP/2 and WebSockets.

dalamb – An AWS Lambda clone (sort of)

I was going through the AWS Lambda main presentation page yesterday and I started to wonder if it would be possible to create your own stack with the same feature set.

First off, AWS Lambda is a stunning programming model: a cross between stateless distributed functional programming at scale (say that fast ten times if you dare) and zero-administration requirements. In other words, you write a function and AWS Lambda runs it at the appropriate scale for you.

What’s more, AWS Lambda can react to events, following the event-sourcing pattern. You don’t keep any internal state; AWS Lambda executes your function with the event data.
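In Python terms, such a function boils down to a plain, stateless handler receiving the event payload; the snippet below is generic and not tied to any particular AWS trigger:

```python
def handler(event, context):
    """Stateless: everything the function needs arrives with the event."""
    records = event.get("Records", [])
    return {"processed": len(records)}

# Locally we can just call it; on AWS Lambda the platform invokes it for us.
print(handler({"Records": [{"eventSource": "aws:s3"}]}, context=None))
```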

Our job, as software developers, has just become almost too simple. The good news is that we can now focus on the value we are aiming for rather than the plumbing and machinery underneath. A whole new world.

With that said, AWS Lambda is brilliant but, well, it is deeply tied to the AWS services and infrastructure. That’s sort of the point obviously. However, could we create a similar stack and run it ourselves? Because, why not!

Let’s therefore introduce “dalamb – An AWS Lambda clone (sort of)”.

dalamb properties and features

What would dalamb’s properties be? How could we design such a beast? Interestingly, we have most of the pieces at hand, at least from a technological and architectural perspective.

Let’s rewind a bit: AWS Lambda gives you the following main features:

  • extend AWS services with custom logic
  • perform funky operations triggered by AWS events
  • expose your function as a web service (insert mention of RESTful URLs here)

All of this is backed by the powerful AWS infrastructure ensuring:

  • fault-tolerance
  • automatic scalability
  • a rich security model
  • pricing with precision

These features and properties tell us the stories dalamb should pursue:

  • The event-sourcing story: how to map functions to events
    • The REST story: this one is just a specific case of the event sourcing story where the events are triggered by requests to a given endpoint
  • The delivery story: can we package up any function in any language?
  • The operational story: deploy, update, scale, keep available…

It’s all about events

The heart of the AWS Lambda platform is its event-sourcing feature. Events are everywhere; they actually define the system itself.

As developers, what we build when we forget about events are fragile, stateful monoliths that quickly fail under their own mass and their incapacity to accept inevitable changes. Building discrete functions that react to events means you reduce friction between changes and pressures on your system while, as a developer, keeping a comprehensible view of the system.

Events have basic properties:

  • self-contained: the information enclosed within must not depend on moving external resources
  • immutable: events never change once they have been emitted
  • descriptive: events do not command, they describe facts
  • sequenced: event streams ordering must be supported
  • structured: so that rules and patterns can be matched against them

dalamb is not a re-write of the entire set of AWS services. It focuses on mapping discrete, simple functions to events flowing through the system. For this reason, dalamb will be designed around event-sourcing patterns.

The goal will be to make it simple for external resources to send events and straightforward for functions to be matched to these events following simple rules.
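As a minimal sketch of what that mapping could look like from the developer’s side (entirely hypothetical, since dalamb is not designed yet):

```python
# Hypothetical registry mapping event types to subscribed functions.
HANDLERS = {}

def on_event(event_type):
    def register(func):
        HANDLERS.setdefault(event_type, []).append(func)
        return func
    return register

@on_event("image.uploaded")
def make_thumbnail(event):
    return {"thumbnail_of": event["path"]}

def dispatch(event):
    """Route an incoming event to every function subscribed to its type."""
    return [func(event) for func in HANDLERS.get(event["type"], [])]

print(dispatch({"type": "image.uploaded", "path": "/photos/cat.png"}))
```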

dalamb will likely start using Kafka as a simple event-store backend.

Change is the nature of any live system

A live system is constantly changing due to internal and external stressors. Your functions are one of those stressors. In order for the system not to collapse onto itself, a certain level of automatic elasticity and draining is required.

Draining means that parts of the system can be considered wasteful and the system will find a way to clean them up. Elasticity means that the system must be able to absorb any load stress in a way that keeps the overall system functional.

AWS Lambda owns the responsibility of managing the system’s constantly changing nature. So will dalamb. However, since dalamb will not own the infrastructure itself, it will be bound, initially at least, by the fixed size of the underlying infrastructure. Within those boundaries, though, dalamb will ensure the system keeps its promises and stays alive and well.

dalamb will likely take advantage of Mesos for resource sharing and Kubernetes or Marathon for managing the lifecycle of each function instance.

Geared towards developers

dalamb is meaningless if developers cannot express themselves through it. To achieve this, dalamb will not make any assumption regarding the actual code environment that is executed.

Indeed, in spite of being a powerful platform, AWS Lambda is currently limited by its selective targets. Oddly enough, if you are a Go or Ruby developer (for instance), you must call your code as a subprocess of the executed function.

dalamb takes the bet that the doors should be open to all developers through the same interface. To achieve this, dalamb will expect functions to be packaged up into container images, under the control of the developers. What dalamb should provide is the entry-point definition for a container to be executed appropriately on the platform as a dalamb function.

Containers will be short-lived and their state will not be kept around for longer than the time required to run the function.

Developers will store their functions on platforms like GitHub or Bitbucket and dalamb will simply react to events from those platforms to release new versions of a function to production.

What’s next?

dalamb is by no means properly defined or even designed yet. It is a story about developers achieving great value by relying on developer-oriented platforms. Whether it is Heroku or AWS Lambda, dealing with the complexity of this world has never been so accessible.

Customizing Kubernetes AWS deployment settings

Kubernetes provides a nice script to quickly deploy a small cluster on various infrastructure backends. On AWS, the documentation is a tad short on detailing how to customize the installation.

By default, Kubernetes is deployed in the Oregon region (availability zone us-west-2a) with a cluster size of four minions driven by a single master. The EC2 instances for the minions and their master have a fairly limited size (t2.micro).

Kubernetes relies on Ubuntu Vivid by default and favors Docker as the container provider. The container storage backend is AUFS, which is the common option on Ubuntu.

From a network perspective, Kubernetes uses the well-known 10.0.0.0/16 subnet for the services while keeping the 10.244.0.0/16 subnet for the cluster. Interestingly, it automatically adds the 10.0.0.0/8 subnet as an insecure registry so that you can more easily deploy Docker images hosted on a private registry.

Kubernetes relies on adding EC2 EBS volumes to extend the instances’ default storage. It will, in fact, use that storage to host container images. The default settings provision a 32GB general-purpose SSD volume for each minion. The master receives a smaller volume but doesn’t require much in the first place.

Finally, the script enables, by default, node and cluster logging (through Elasticsearch and Kibana) as well as cluster monitoring (via InfluxDB). Once the deployment process has completed, the script kindly tells you where to access those services.

All of these settings are kept inside the cluster/aws/config-default.sh resource found in the Kubernetes release. By overriding them before running the deployment script provided by the Kubernetes team, you can easily customize your Kubernetes cluster on AWS.

For instance, this is my development setup:

I usually leave the other settings as-is because they suit my requirements. I encourage you to review the config-default.sh script to change other values.

Save your settings in a shell script sourced from your ~/.bashrc script and run the installation command given in the Kubernetes documentation.

Running a zookeeper and kafka cluster with Kubernetes on AWS

I have been recently working with Russ Miles on coding microservices that follow principles he has laid out in the Antifragile Software book. As a simple demo we decided on using Kafka as a simple event store to implement an event sourcing pattern.

The main focus of this article is to talk about how to deploy a kafka cluster and manage its lifecycle via Docker containers and Kubernetes on AWS.

Initially, I wanted to quickly see how to get one instance of kafka to be available from outside the AWS world so that I could interact with it. My first move is always to look at the main Docker image repository for official or popular images. Interestingly, as of this writing, there is no official Kafka image. The most popular is wurstmeister/kafka which is what I decided to use.

However, this was not enough. Indeed, Kafka relies on Zookeeper to work. Spotify offers both services in a single image; I don’t think that’s a good idea in production, so I decided to forego it. For Zookeeper, I didn’t use the most popular image because its documentation didn’t indicate any possibility of passing parameters to the service. Instead, I went with digitalwonderland/zookeeper, which supports some basic parameters like setting the broker id.

Setting up one instance of each service is rather straightforward and can be controlled by a simple Kubernetes replication controller like:

This could be exposed using the following Kubernetes service:

Notice the LoadBalancer type is used here because we need to create an AWS load-balancer to access those services from the outside world. The Kubernetes scripts are clever enough to achieve this for us.

In the replication controller specification, we can see the requirement to let Kafka advertise its hostname. To make it work, this must be the actual domain of the AWS load-balancer. This means that you must create the Kubernetes service first (which is good policy anyway) and then, once it is done, write down its domain in the replication controller spec as the value of the KAFKA_ADVERTISED_HOST_NAME environment variable.

This is all good, but it is not a cluster. It is merely a single instance for development purposes. Even though Kubernetes promises to look after your pods, it’s not a bad idea to run a cluster of both the zookeeper and kafka services. This wasn’t as trivial as I expected.


The reason is mostly due to the way clusters are configured. Indeed, in Zookeeper’s case, each instance must be statically identified within the cluster. This means you cannot just increase the number of pod instances in the replication controller, because they would all have the same identifier. This will change in Zookeeper 3.5. Kafka doesn’t have this limitation; indeed, it will happily create a broker id for you if none is provided explicitly (though this requires Kafka 0.9+).

What this means is that we now have two specifications. One for Zookeeper and one for Kafka.

Let’s start with the simpler one, kafka:

Nothing really odd here: we create a single kafka broker, connected to our zookeeper cluster, which is defined below:

As you can see, unfortunately we cannot rely on a single replication controller with three instances as we do with kafka brokers. Instead, we run three distinct replication controllers, so that we can specify the zookeeper id of each instance, as well as the list of all brokers in the pool.

This is a bit of an annoyance because we therefore rely on three distinct services too:

Doing so means traffic is routed to each zookeeper instance via its service name (internal to the kubernetes network, that is).

Finally, we have our Kafka service:

That one is simple because we only have a single kafka application to expose.

Now running these in order is the easy part. First, let’s start with the zookeeper cluster:

Once the cluster is up, you can check they are all happy bunnies via:

One of them should be LEADING, the other two ought to be FOLLOWERS.

Now, you can start your Kafka broker, first its service:

On AWS, you will need to wait for the actual EC2 load balancer to be created. Once that’s done, take its DNS name and edit the kafka-cluster.yaml spec to set it to KAFKA_ADVERTISED_HOST_NAME.

Obviously, if you have set up a DNS route to your load-balancer, simply use that domain instead of the load-balancer’s. In that case, you can set its value once and for all in the spec.

Then, run the following command:

This will start the broker and automatically create the topic “mytopic” with one replica and two partitions.

At this stage, you should be able to connect to the broker and produce and consume messages. You might want to try kafkacat as a simple tool to play with your broker.

For instance, listing the topic on your broker:

You can also produce messages:

Consuming messages is as simple as:

At this stage, we still don’t have a kafka cluster. One might expect that running something like this should be enough:

But unfortunately, this will only create new kafka instances; it will not automatically start replicating data to the new brokers. This has to be done out of band, as I will explain in a follow-up article.

As a conclusion, I would say that using existing images is not ideal because they don’t always provide the level of integration you’d hope for. What I would rather do is build specific images that initially converse with an external configuration server to retrieve the information they need to run. This would likely make things a little smoother. Though, in the case of zookeeper, I am looking forward to its next release supporting dynamic cluster scaling.

Deploying a docker container of a CherryPy application onto a CoreOS cluster

Previously, I presented a simple web application distributed into several docker containers. In this article, I will be introducing the CoreOS platform as the backend for clustering a CherryPy application.

CoreOS quick overview

CoreOS is a Linux distribution designed to support distributed/clustering scenarios. I will not spend too much time explaining it here as their documentation already provides lots of information. Most specifically, review their architecture use-cases for a good overview of how CoreOS is articulated.

What matters to us in this article is that we can use CoreOS to manage a cluster of nodes that will host our application as docker containers. To achieve this, CoreOS relies on technologies such as systemd, etcd and fleet at its core.

Each CoreOS instance within the cluster runs a Linux kernel which executes systemd to manage processes within that instance. etcd is a distributed key/value store used across the cluster to enable service discovery and configuration synchronization. Fleet is used to manage services executed within your cluster. Those services are described in files called unit files.

Roughly speaking, you use a unit file to describe your service and specify which docker container to execute. Using fleet, you submit and load that service to the cluster before starting/stopping it at will. CoreOS will determine which host it will deploy it on (you can set up constraints that CoreOS will follow). Once loaded onto a node, the node’s systemd takes over to manage the service locally, and you can use fleet to query the status of that service from outside.

Setup your environment with Vagrant

Vagrant is a nifty tool to orchestrate small deployments on your development machine. For instance, here is a simple command to create a node with Ubuntu running on it:

Vagrant has a fairly rich command line you can script to generate a final image. However, Vagrant usually provisions virtual machines by following a description found within a simple text file (well actually it’s a ruby module) called a Vagrantfile. This is the path we will be following in this article.

Let’s get the code:

From there you can create the cluster as follows:

I am not using vagrant directly to create the cluster because there are a couple of other operations that must be carried out to let fleet talk to the CoreOS node properly. Namely:

  • Generate a new cluster id (via https://discovery.etcd.io/new)
  • Start an ssh agent to handle the node’s SSH identities to connect from the outside
  • Indicate where to locate the node’s ssh service (through a port mapped by Vagrant)
  • Create the cluster (this calls vagrant up internally)

Once completed, you should have a running CoreOS node that you can log into:

To destroy the cluster and terminate the node:

This also takes care of wiping out local resources that we don’t need any longer.

Before moving on, you will need to install the fleet tools.

Run your CherryPy application onto the cluster

If you have destroyed the cluster, re-create it and make sure you can speak to it through fleet as follows:

Bingo! This is the public address we statically set in the Vagrantfile associated with the node.

Let’s ensure we have no registered units yet:

Okay, all is good. Now, let’s push each of our units to the cluster:

As you can see, the unit files have been registered but they are not loaded onto the cluster yet.

Notice the naming convention used for webapp_app@.service; this is due to the fact that it will not be considered a service description itself but a template for a named service. We will see this in a minute. Refer to this extensive DigitalOcean article for more details regarding unit files.

Let’s now load each unit onto the cluster:

Here, we asked fleet to load the service onto an available node. Considering there is a single node, it wasn’t a difficult decision to make.

At that stage, your service is not started. It simply is attached to a node.

It is not compulsory to explicitly load a service before starting it. However, it gives you the opportunity to unload a service if a specific condition occurs (the service needs to be amended, the chosen host isn’t valid any longer…).

Now we can finally start it:

You can see what’s happening:

Or alternatively, you can request the service’s status:

Once the service is ready:

Starting a service from a unit template works the same way except you provide an identifier to the instance:

The reason I chose 1 as the identifier is so that the container’s name becomes notes1, as expected by the load-balancer container when linking it to the application’s container, as described in the previous article.

Start a second instance of that unit template:

That second instance starts immediately because the image is already there.

Finally, once both services are marked as “active”, you can start the load-balancer service as well:

At that stage, the complete application is up and running and you can go to http://localhost:7070/ to use it. Port 7070 is mapped to port 8091 by vagrant within our Vagrantfile.

No such thing as a free lunch

As I said earlier, we created a cluster of one node on purpose. Indeed, the way all our containers are able to dynamically know where to locate each other is through the linking mechanism. Though this works very well in simple scenarios like this one, it has a fundamental limit since you cannot link across different hosts. If we had multiple nodes, fleet would try distributing our services across all of them (unless we decided to constrain this within the unit files) and this would obviously break the links between them. This is why, in this particular example, we create a single-node cluster.

Docker provides a mechanism named ambassador to address this restriction, but we will not review it; instead, we will benefit from the flat sub-network topology provided by weave, as it seems to follow a more traditional path than docker’s linking approach. This will be the subject of my next article.

A more concrete example of a complete web application with CherryPy, PostgreSQL and haproxy

In the previous post, I described how to setup a docker image to host your CherryPy application. In this installment, I will present a complete – although simple – web application made of a database, two web application servers and a load-balancer.

Setup a database service

We are going to create a docker image to host our database instance, but because we are lazy and because it has been done already, we will be using an official image of PostgreSQL.

As you can see, we run the official, latest PostgreSQL image. By setting POSTGRES_USER and POSTGRES_PASSWORD, we make sure the container creates the corresponding account for us. We also set a name for this container; this will be useful when we link to it from another container, as we will see later on.

A word of warning, this image is not necessarily secure. I would advise you to consider this question prior to using it in production.

Now that the server is running, let’s create a database for our application. Run a new container which will execute the psql shell:

Once connected to the server, we create the “notes” database and connect to it.

How did this work? Well, the magic happens through the --link webdb:postgres argument we provided to the run command. This tells the new container that we are linking to a container named webdb and that we create an alias for it, postgres, inside that new container. That alias is used by docker to initialize a few environment variables such as:
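With the postgres alias above, docker’s link convention yields variables along the lines of POSTGRES_PORT_5432_TCP_ADDR and POSTGRES_PORT_5432_TCP_PORT (verify the exact names with env inside your own container), which the application can use to build its SQLAlchemy URL, for example:

```python
import os
from sqlalchemy import create_engine

# Injected by `--link webdb:postgres`; double-check with `env` in the container.
host = os.environ["POSTGRES_PORT_5432_TCP_ADDR"]
port = os.environ["POSTGRES_PORT_5432_TCP_PORT"]
user = os.environ["POSTGRES_ENV_POSTGRES_USER"]
password = os.environ["POSTGRES_ENV_POSTGRES_PASSWORD"]

engine = create_engine(
    "postgresql://{0}:{1}@{2}:{3}/notes".format(user, password, host, port)
)
```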

Notice the POSTGRES_ prefix? This is exactly the alias we gave in the command’s argument. This is the mechanism by which you will link your containers so that they can talk to each other.

Note that there are alternatives, such as weave, that may be a little more complex but probably more powerful. Make sure to check them out at some point.

Setup our web application service

We are going to run a very basic web application. It will be a form to take notes. The application will display them and you will be able to delete each note. The notes are posted via javascript through a simple REST API. Nothing fancy. Here is a screenshot for you:

[Screenshot of the notes application]

By the way, the application uses Yahoo’s Pure.css framework for a change from Bootstrap.

Simply clone the mercurial repository to fetch the code.

This will download the whole repository but fear not, it’s rather lightweight. You can review the Dockerfile which is rather similar to what was described in my previous post. Notice how we copy the webapp subdirectory onto the image.

We can now create our image from that directory:

As usual, change the tag to whatever suits you.

Let’s now run two containers from that image:

We link those two containers with the container running our database. We can therefore use that knowledge to connect to the database via SQLAlchemy. We also publish the application’s port to two distinct ports on the host. Finally, we name our containers so that we can reference them in the next container we will be creating.

At this stage, you ought to see that your application is running by going either to http://localhost:8080/ or http://localhost:8081/.

Setup a load balancer service

Our last service – microservice should I say – is a simple load-balancer between our two web applications. To support this feature, we will be using haproxy. Well-known, reliable and lean component for such a task.

Take some time to review the Dockerfile. Notice how we copy the local haproxy.cfg file as the configuration for our load-balancer. Build your image like this:

And now run it to start load balancing between your two web application containers:

In this case, we will be executing the container in the background because haproxy blocks and it won’t log to the console anyway.

Notice how we link to both web application containers. We set short aliases out of pure laziness. We publish two ports to the host. The 8090 port will be necessary to access the stats page of the haproxy server itself. The 8091 port will be used to access our application.

To understand how we reuse the aliases, please refer to the haproxy.cfg configuration, more precisely to these two lines:

We load-balance between our two backend servers and we do not have to know their address at the time when we build the image, but only when the container is started.

That’s about it, really. At this stage, you ought to connect to http://localhost:8091/ to use your application. Each request will be sent to each web application instance in turn. You may check the status of your load-balancing by connecting to http://localhost:8090/.

Obviously, this is just a basic example. For instance, you could extend it by setting up another service to manage your syslog and configure haproxy to send its logs to it.

Next time, we will be exploring the world of CoreOS and clustering before moving on to service and resource management via Kubernetes and Mesos.

Create a docker container for your CherryPy application

In the past year, process isolation through the use of containers has exploded and you can find containers for almost anything these days. So why not create a container to isolate your CherryPy application from the rest of the world?

I will not focus on the rights and wrongs of undertaking such a task. This is not the point of this article. Instead, this article will guide you through the steps to create a base container image that will support creating per-project images that can be run in containers.

We will be using docker for this since it’s the hottest container technology out there. It doesn’t mean it’s the best, just that it’s the most popular, which in turn means there is high demand for it. With that being said, once you have decided containers are a relevant feature for you, I encourage you to have a look at other technologies in that field to draw your own conclusions.

Docker uses various Linux kernel assets to isolate a process from the other running processes. In particular, it uses control groups to constrain the resources used by the process. Docker also makes the most of namespaces, which create an access layer to resources such as the network, mounted devices, etc.

Basically, when you use docker, you run an instance of an image and we call this a container. An image is mostly a mille-feuille of read-only layers that are eventually unified into one. When an image is run as a container, an extra read-write layer is added by docker so that you can make changes at runtime from within your container. Those changes are lost every time you stop the running container unless you commit it into a new image.

So how do you get started with docker?

Getting started

First of all, you must install docker. I will not spend much time here explaining how to go about it since the docker documentation does it very well already. However, I encourage you to:

  • install from the docker repository as it’s usually more up to date than the official distribution repositories
  • ensure you can run docker commands as a non-root user. This will make your daily usage of docker much easier

At the time of this writing, docker 1.4.1 is the latest version and this article was written using 1.3.3. Verify your version as follows:

Docker command interface

Docker is an application often executed as a daemon. To interact with it you use the command line interface via the docker command. Simply run the following command to see them:

Play a little with docker

Before we move on to creating our docker image for a CherryPy application, let’s play with docker a little.

The initial step is to pull an existing image. Indeed, you will likely not create your own OS image from scratch. Instead, you will use a public base image, available on the docker public registry. During the course of these articles, we will be using an Ubuntu base image, but everything would work the same with CentOS or something else.

Easy, right? The various downloads are those of the intermediary images that were generated by the Ubuntu image maintainers. Interestingly, this means you could start your image from any of those images.

Now that you have an image, you may wish to list all of them on your machine:

Notice that the intermediate images are not listed here. To see them:

Note that, in the previous call, we didn’t specify any particular version for our docker image. You may wish to do so as follows:

Let’s pull a centos image as well for the fun:

Let’s now run a container and play around with it:

In the previous command, we start a bash command executed within a container using the Centos image tagged 7. We name the container to make it easy to reference it afterwards. This is not compulsory but is quite handy in certain situations. We also tell docker that it can dispose of that container when we exit it. Otherwise, the container will remain.

This is interesting because it shows that, indeed, the container is executed in the host kernel which, in this instance, is my Ubuntu operating system.

Finally below, let’s see the network configuration:

Note that the eth0 interface is attached to the bridge the docker daemon created on the host. The docker security scheme means that, by default, nothing can reach that interface from the outside. However, the container may contact the outside world. Docker has extensive documentation regarding its networking architecture.

Note that you can see container statuses as follows:

Exit the container:

Run the command again:

As we can see, the container is indeed gone. Let’s now rewind a little and not tell docker to automatically remove the container when we exit it:

Let’s see if the container is there:

Nope. So what’s different? Well, try again to start a container using that same name:

Ooops. The container is actually still there:

There you go. By default, docker ps doesn’t show you containers in the exited state. You have to remove the container manually using its identifier:

I will not go further with docker usage here, as this is all you really need to get started.

A word about tags

Technically speaking, versions do not actually exist in docker images. They are in fact tags. A tag is a simple label for an image at a given point.

Images are identified by a hash value. As with IP addresses, you are not expected to recall the hash of the images you wish to use. Docker provides a mechanism to tag images, much like you would use domain names instead of IP addresses.

For instance, 14.10 above is actually a tag, not a version. Obviously, since tags are meant to be meaningful to human beings, it’s quite sensible for Linux distributions to be tagged following the version of the distributions.

You can easily create tags for any images as we will see later on.

Let’s talk about registries

Docker images are hosted and served by a registry. Often, as is the case in our previous example, the registry used is the public docker registry available at: https://registry.hub.docker.com/

Whenever you pull an image, docker pulls from that registry by default. However, you may query a different registry as follows:

Basically, you provide the address of your registry and a path at which the image can be located. It has a form similar to a URI without the scheme.

Note that, as of docker 1.3.1, if the registry isn’t served over HTTPS, the docker client will refuse to download the image. If you need to pull anyway, you must add the following parameter to the docker daemon when it starts up.

Please refer to the official documentation to learn more about this.

A base Linux Python-ready container

Traditionally, deploying a CherryPy application has been done using a simple approach:

  • Package your application into an archive
  • Copy that archive onto a server
  • Configure a database server
  • Configure a reverse proxy such as nginx
  • Start the Python process(es) to serve your CherryPy application

That last operation is usually done by directly calling nohup python mymodule.py &. Alternatively, CherryPy comes with a handy script to run your application in a slightly more convenient fashion:

This runs the Python module mymodule as a daemon using the given configuration file. If the -P flag isn’t provided, the module must be found in PYTHONPATH.

The idea is to create an image that will serve your application using cherryd. Let’s see how to setup an Ubuntu image to run your application.

First, we create a user which will not have root permissions. This is a good habit to follow:

Next, we install a bunch of libraries that are required to deploy some common Python dependencies:

Then we create a virtual environment and install Python packages into it:

These are common packages I use. Install whichever you require obviously.

As indicated by Tony in the comments, it is probably overkill to create a virtual environment in a container since the whole point of a container is to isolate your process and its dependencies already. I’m so used to using virtual environments that I automatically created one. You may skip these steps.

Those operations were performed as the root user; let’s now make the web user the owner of those packages.

Good. Let’s switch to that user now:

At this stage, we have a base image ready to support a CherryPy application. It might be interesting to tag that particular container as a new image so that we can use it in various contexts.

We take the docker container and commit it as a new image. We then tag the newly created image to make it easy to reuse later on.

Let’s see if it worked. Exit the container and start a new container from the new image.

Well. We are ready to play now.

Run a CherryPy application in a docker container

For the purpose of this article, here is our simple application:
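A minimal stand-in for such a module could look like this; the class name and greeting are my own, and the port matches the 8080 mapped later with -p 9090:8080:

```python
import cherrypy

class Root(object):
    @cherrypy.expose
    def index(self):
        return "Hello from inside the container"

# Listen on all interfaces so the application is reachable from outside the
# container, and only mount the application: cherryd starts the engine itself.
cherrypy.config.update({"server.socket_host": "0.0.0.0",
                        "server.socket_port": 8080})
cherrypy.tree.mount(Root(), "/")
```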

Two important points:

  • You must make sure CherryPy listens on the eth0 interface, so just make it listen on all the container’s interfaces. Otherwise, CherryPy will listen only on 127.0.0.1, which won’t be reachable from outside the container.
  • Do not start the CherryPy engine yourself; this is done by the cherryd command. You must simply ensure the application is mounted so that CherryPy can serve it.

Save this piece of code into your container under the module name: server.py. This could be any name, really. The module will be located in /home/web.

You can manually test the module:

The second line tells us the IPv4 address of this container. Next, point your browser to the following URL: http://localhost:9090/

“What is this magic?” I hear you say!

If you look at the command we used to start the container, we provided this bit: -p 9090:8080. This tells docker to map port 9090 on the host to port 8080 on the container, allowing your application to be reached from the outside.

And voilà!

Make the process a little more developer friendly

In the previous section, we saved the application’s code into the container itself. During development, this may not be practical. One approach is to use a volume to share a directory between your host (where you work) and the container.

You can then work on your application and the container will see those changes immediately.

Automate things a bit

The previous steps have shown in details how to setup an image to run a CherryPy application. Docker provides a simple interface to automate the whole process: Dockerfile.

A Dockerfile is a simple text file containing all the steps to create an image and more. Let’s see it first hand:

Create a directory and save the content above into a file named Dockerfile. Create a subdirectory called webapp and store your server.py module into it.

Now, build the image as follow:

Use whatever tag suits you. Then, you can run a container like this:

That’s it! A docker container running your CherryPy application.

In the next articles, I will explore various options for using docker in a web application context. Follow-ups will also include an introduction to weave and CoreOS to cluster your CherryPy application.

In the meantime, do enjoy.