ws4py – WebSocket client and server library for Python

Recently I released ws4py, a package that provides client and server WebSocket support for Python 2.6 and 2.7.

Let’s first have a quick overview of what ws4py offers for now:

  • Support for draft-10 of the WebSocket specification.
  • A threaded client. This gives a simple client that doesn’t require an external dependency.
  • A Tornado client. This client is based on Tornado 2.0 which is quite a popular way of running asynchronous networking code these days. Tornado provides its own server implementation so I didn’t include mine in ws4py.
  • A CherryPy extension so that you can integrate WebSocket from within your CherryPy 3.2.1 server.
  • A gevent server based on the popular gevent library. This is courtesy of Jeff Lindsay.
  • Based on Jeff’s work, a pure WSGI middleware as well (available in the current master branch only until the next release).
  • ws4py runs on Android devices thanks to the SL4A package.

Hopefully more clients and servers will be added along the way, as well as Python 3.x support. The former should be rather simple to add due to the way I designed ws4py.

The main idea is to make a distinction between the bytes provider and the bytes processing. The former is essentially reading and writing bytes from the connected socket. The latter is the function of making something out of the received bytes based on the WebSocket specification. In most implementations I have seen so far, both are rather heavily intertwined making it difficult to use a different bytes provider.

ws4py tries a different path by relying on a great feature of Python: the possibility to send data back to a generator. For instance, the frame parser yields the number of bytes it needs each time it requires more, and the caller feeds those bytes back to the generator once they are received. The caller of a frame parser is a stream object which acts the same way, and the caller of that stream object is the bytes provider (a client or a server). The stream is in charge of aggregating frames into a WebSocket message. Thanks to that design, both the frame and stream objects are totally unaware of the bytes provider and can be easily adapted to various contexts (gevent, Tornado, CherryPy, etc.).
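To make the generator idea concrete, here is a minimal sketch of the pattern. It is not ws4py's actual code: the toy frame format (a one-byte length followed by the payload) and the names `parse_frame` and `feed` are assumptions made up for illustration, and the real WebSocket framing is more involved.

```python
def parse_frame():
    """Toy parser: one length byte followed by that many payload bytes.

    This is NOT the WebSocket wire format, only an illustration of a
    parser that yields how many bytes it needs next.
    """
    header = yield 1             # ask the caller for exactly one byte
    length = header[0]           # that byte holds the payload length
    payload = b""
    if length:
        payload = yield length   # ask the caller for the payload bytes
    return payload               # surfaces as StopIteration.value


def feed(parser, data):
    """Play the role of the bytes provider using an in-memory buffer.

    A real provider would read from a socket instead. For simplicity this
    sketch assumes the buffer already holds a complete frame.
    """
    needed = next(parser)        # prime the generator, get the first request
    offset = 0
    try:
        while True:
            chunk = data[offset:offset + needed]
            offset += needed
            needed = parser.send(chunk)   # hand the bytes over, get the next request
    except StopIteration as done:
        # The parser finished: return the payload and any unread bytes.
        return done.value, data[offset:]
```

Calling `feed(parse_frame(), b"\x05helloXYZ")` returns the payload `b"hello"` together with the unread remainder `b"XYZ"`. The parser never touches a socket, which is the point: swapping the buffer for a gevent, Tornado or CherryPy connection only changes `feed`.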

On my TODO list for ws4py:

  • Upgrade to a more recent version of the specification
  • Python 3.x implementation
  • Better documentation (read: write documentation)
  • Better performance on very large WebSocket messages

Acceptance testing a CherryPy application with Robot Framework

I recently received the Python Testing Cookbook authored by Greg L. Turnquist and was happy to read about recipes on acceptance testing using Robot Framework. We’ve been using this tool at work for a few weeks now with great results. Greg shows how to test a web application using the Selenium Library extension for Robot Framework and I thought it’d be fun to demonstrate how to test a CherryPy application following his recipe. So here we go.

First some requirements:

Let’s define a simple CherryPy application which displays a text input in which to type a message. When the submit button is pressed, the message is sent to the server and returned as-is. Well, it’s an echo service really.

Save the code above in a module named myapp.py

Next, we create an extension to Robot Framework that will manage CherryPy. Save the following in a module CherryPyLib.py. It’s important to respect that name since Robot Framework expects the module and its class to match in names.

Note that we start and stop the CherryPy server during the test itself, meaning you don’t need to start it separately. Pure awesomeness.

Finally let’s write a straightforward acceptance test to validate the overall workflow of echoing a message using our little application.

Save the test above into a file named testmyapp.txt. You can finally run the test as follows:

This will start CherryPy, Selenium’s proxy server and Firefox within which the test case will be run. Easy, elegant and powerful.

Hosting a Django application on a CherryPy server

Recently at work I had the requirement to host a Django application in a CherryPy server. I first looked at various projects I knew were doing just that. Unfortunately, after trying them I was rather disappointed. Their approach is to provide a command similar to Django’s famous runserver, but I’ve found it to be more complex than necessary. So I wrote my own module that performs those operations by staying much closer to how CherryPy works, most specifically by using the process bus that comes with CherryPy.

I’m sharing a stripped down version of the module I wrote which shows how one could host a Django application in a CherryPy server. Hopefully this might help some of you.

You can find the code alongside a minimal Django application showing how this works here (BSD licence). I used Django 1.3 to generate a default project but the code above works well with older versions of Django.

Edit 16/03/2012: Thanks to Damien Tougas, I’ve wrapped up a better recipe for hosting a Django application into a CherryPy application server.

Should we assess OpenData innovation and impacts?

Yesterday, I was at a talk titled “OpenData, basis for new political technologies?” at la Cantine Numérique Rennaise [fr], a place about the digital age located in Rennes. During the debate, I asked how we could assess the impacts of OpenData without some sort of measuring instruments. This is a question the EU asked itself in a recent report.

Xavier Crouan, who has been the director of digital innovation and information in Rennes for the past few years and has communicated extensively on OpenData, made a comment that I felt was a misunderstanding of my question. He roughly stated that it felt typically French to ask for tools and indicators whenever risks were taken and innovation attempted. He found it saddening to hear French engineers being so grounded and felt innovation should not have to justify itself.

Honestly that wasn’t what I was getting at. The discussion at that point of the debate was about how OpenData would eventually make a difference in people’s lives, politically as much as economically. In that context, it seemed sensible to ask how we could measure the impacts of OpenData so that we could tweak, tune and improve its usage.

Now in regards to innovation itself, I believe you usually need simple indicators to gauge whether or not you’re walking on a fruitful path.

For instance, Rennes has held a contest for building applications on data it has recently opened. Xavier Crouan indicated that 2000 people had voted. One might consider that an indicator of whether or not the contest was a public success and, if not, of how to tune it should there be another contest next year.

Shooting in different directions in the hope that one path will lead to strong innovation is shortsighted in my book. You need to define a few criteria that will assess how each direction fares. This is what OpenData promotes too: improving efficiency in the reuse of public sector data.

Innovation is not incompatible with retrospection.

Thoughts on technical strategies for an OpenData world

I’ve been part of a very interesting discussion on a French forum around some technical aspects of the opening of public data. To provide a bit of context, the French city of Rennes, Brittany, has been leading the way in France in 2010 on the subject of opening its data [fr]. The city has released local authorities and transport datasets. The former are cold datasets served as structured files while the latter is a mix of hot and cold datasets provided through a dedicated API [fr].

The whole discussion started from that API, which is basically an RPC web service that returns data such as:

  • Current number of available bikes at a public bike station
  • Current number of free slots to park a bike at a station
  • Bus and subway transit status
  • Location of travel card/ticket vendors

Someone on the forum [fr] wondered why the API wasn’t following a more RESTful approach, and I chimed in to back up that idea. Then the discussion moved on to whether or not REST would bring any added value in comparison to the RPC web service in place. That got me thinking about one side of OpenData that is not too often put forward, namely the “how”. Indeed, discussions around OpenData mostly deal with the kind of data to release, the reason why and under which conditions.

However it should not be forgotten that releasing public data isn’t a trivial operation, even when local authorities and other public bodies do have datasets at hand.

Data source constraints

By talking to folks working in public agencies you quickly realise that they meet many issues even when they are determined to open up their datasets.

Old information system or none at all

It’s not uncommon that local authorities don’t have a proper information system. They may deal with documents such as Excel files, or their system is just not designed to handle some modern operations.

Information System Lock-in

In some cases, namely with GIS, local authorities don’t even really have a way to perform export operations. When that happens it makes the release process more complicated and delays it.

Unfit data model

We often cry for more data to be opened but we fail to understand that the model to publicly expose data isn’t necessarily the one used internally by local authorities.

Data provider strategies

One critical question local authorities must ask themselves is whether or not they want to engage into the data/web-service game. In other words, do they want to give an access, as raw as possible to their data and be left alone, or do they want to have control and perhaps create something richer. This is not a light question as the implications can seriously impact the level of commitment a local authority will have to show.

Flat data provider

In that configuration the local authority decides to provide the lowest level of features by putting files (flat or structured) on the web for anyone to download. The advantages are the simplicity of deployment and maintenance as well as a very low cost. However this might also create a barrier for non-technical folks who would want to provide innovative services, as they would be required to process those files first.

This is what Rennes has done for its GIS datasets and, indeed, everyone I met had to perform operations on those files before they could start on their services.

Raw data web service

By raw I mean a web service that stays rather close to the dataset it serves. The advantages are the relative simplicity of setting up such a service and its low cost. In some cases such a web service might be just enough, provided the data model it exposes doesn’t force the developer to create his own intermediary data model since, in that case, the web service isn’t better than flat files.

The city of Rennes uses that kind of service for some of its transit data.

Rich web service

Here the provider decides to go the extra mile by offering a richer web service. In that case, the local authority defines a much richer data model on top of its own internal data structures. The big advantage is that some common functionalities are provided by the service and allow the developer to build much richer applications too. For instance, rather than solely providing a GTFS file for its public transportation, which a developer must download, parse and turn into his own model, a provider might offer an API to query that dataset for useful information such as “what is the fastest way to go from A to B whilst avoiding traffic and with the lowest impact on the environment?”.

The city of Rennes doesn’t provide a rich web service, which can be explained by the fact that defining a rich model requires you to have a good idea of how the data may be exploited. If you take some of Google’s APIs, it took some time before they reached their current maturity. Moreover Rennes follows the common pattern of “release often, release early” and the current API was released so that developers could rapidly start playing with it.
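To give an idea of the work a flat GTFS file leaves to the developer, here is a rough sketch of parsing a `stops.txt` file and answering one simple question from it. The sample rows are made up for illustration, the function names are hypothetical, and the distance measure is deliberately crude (squared difference in degrees), so take it as an outline rather than a routing solution.

```python
import csv
import io

# Made-up sample data using the standard GTFS stops.txt column names.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Station A,48.10,-1.68
S2,Station B,48.12,-1.70
"""


def load_stops(text):
    """Turn the raw CSV text into the developer's own intermediate model."""
    return [
        {
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def nearest_stop(stops, lat, lon):
    """Return the stop closest to a point, using a crude flat-earth distance."""
    return min(stops, key=lambda s: (s["lat"] - lat) ** 2 + (s["lon"] - lon) ** 2)
```

With the sample above, `nearest_stop(load_stops(SAMPLE_STOPS_TXT), 48.119, -1.699)` picks Station B. Every consumer of the flat file has to write something like this before building anything useful, which is exactly the kind of code a rich web service could take off their hands.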

Data support

This is an aspect that is almost never addressed. Should a local authority be accountable for data integrity, updates and support? The question is rather complex since it raises the question of whether or not it’s within the local authority’s role to support those functions.

Implicitly this also leads to question whether or not a local authority should become a full fledged data provider. Should we start creating dedicated public, or even private, bodies to perform those tasks?

Which strategy to follow?

Eventually the decision is made based on various factors such as the kind of data a local authority owns, how it’s stored and its accessibility, but also on questions of cost, time and politics. I strongly believe that not thinking enough about how data should be distributed will impair the way the OpenData movement is perceived in the long run. Indeed, let’s not forget that, at least in France, the biggest consumers of public data are public services themselves and, for OpenData to take hold, the “how” must be carefully designed and implemented.

If public data are too complex to consume and process, I believe OpenData will partly fail.

WebSocket for CherryPy 3.2

Just a quick note about the first draft of support for WebSocket in CherryPy. You can find the code here.

Note that this is still work in progress but does work against Chrome and the pywebsocket echo client. It supports draft-76 of the specification only and I’m waiting for the working-group to settle a bit more before making any further modification.

The updated code has started integrating draft-06 as well but this is a work in progress.

Running CherryPy on Android with SL4A

CherryPy runs on Android thanks to the SL4A project. So if you feel like running Python and your own web server on your Android device, well you can just do so. You’ve probably not heard anything that awesome since the pizza delivery guy rang the doorbell.

How do you go about it? Well, that’s the surprise: CherryPy itself doesn’t need to be patched. Granted, I haven’t tried all the various tools provided by CherryPy, but the server and the dispatching work just fine.

First, you need to get the CherryPy source code, build it and copy the resulting cherrypy package into the SL4A scripts directory.

Once you’ve plugged your phone to your machine through USB, run the next commands:

Just change the path to match your environment. That’s it.

Now you can copy your own script, let’s assume you use something like below:

As you can see we must disable the multiprocessing logging since the multiprocessing package isn’t included with SL4A.

Save that script on your computer as cpdroid.py for example. Copy that file into the scripts directory of SL4A.

Unplug your phone and go to the SL4A application. Click on the cpdroid.py script, it should start fine. Then from your browser, go to http://phone_IP:8080/ and tada! You can also go to the /location path to get the geoloc of your phone.

Integrating SQLAlchemy into a CherryPy application

Quite often, people come on the CherryPy IRC channel asking about the way to use SQLAlchemy with CherryPy. There are a couple of good recipes on the tools wiki but I find them a little complex to begin with. This is no fault of the recipes: many people don’t necessarily know about CherryPy tools and plugins at that stage.

The following recipe tries to be complete whilst staying as simple as possible, to allow folks to get started with SQLAlchemy and CherryPy.

The general idea is to use the plugin mechanism to register functions on a per-engine basis and to enable a tool that provides access to the SQLAlchemy session at request time.

Using Jython as a CLI frontend to HBase

HBase, the well-known non-relational distributed database, comes with a console program to perform various operations on an HBase cluster. I’ve personally found this tool to be a bit limited and I’ve toyed with the idea of writing my own. Since HBase only comes with a Java driver for direct access and the various RPC interfaces such as Thrift don’t offer the full set of functions over HBase, I decided to go for Jython and to use the Java API directly. This article will show a mock-up of such a tool.

The idea is to provide a simple Python API over the HBase one and couple it with a Python interpreter. This means, it offers the possibility to perform any Python (well Jython) operations whilst operating on HBase itself with an easier API than the Java one.
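The coupling of a small Python API with an interpreter can be sketched with the standard `code` module. Note that this is only an outline of the idea, not the tool itself: `InMemoryTable` is a hypothetical stand-in that stores rows in a dictionary, where the real tool would wrap the HBase Java API from Jython, and `make_console` is a name invented for this example.

```python
import code


class InMemoryTable(object):
    """Hypothetical stand-in for a wrapper around an HBase table.

    It keeps rows in a plain dictionary so the sketch runs anywhere;
    a real wrapper would delegate to the HBase Java client instead.
    """

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def put(self, row, column, value):
        # Store a single cell under the given row key.
        self._rows.setdefault(row, {})[column] = value

    def get(self, row):
        # Return a copy of all cells stored for that row key.
        return dict(self._rows.get(row, {}))


def make_console(tables):
    """Build an interactive console whose namespace exposes the given tables."""
    return code.InteractiveConsole(locals=dict(tables))
```

Calling `make_console({"users": InMemoryTable("users")}).interact()` drops you into a prompt where `users.put("row1", "info:name", "alice")` and `users.get("row1")` sit next to any other Python statement you care to type.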

Note also that the tool uses the WSPBus already described in an earlier article to control the process itself. You will therefore need CherryPy’s latest revision.

To test the tool, you can simply grab the latest copy of HBase and run:

Then you need to configure your classpath so that it includes all the HBase dependencies. To determine them:

Copy the full list of jars and export CLASSPATH with it. (This is from the HBase wiki on Jython and HBase).

Next you have to add an extra jar to the classpath so that Jython supports readline:

Make sure you install libreadline-java as well.

Now that your environment is set up, save the code above in a script named stave.py and run it as follows:

You can import any Python module available to your Jython environment as well of course.

I will probably extend this tool over time but in the meantime I hope you’ll find it a useful canvas to operate HBase.

A quick chat WebSockets/AMQP client

In my previous article I described how to plug WebSockets into AMQP using Tornado and pika. As a follow-up, I’ll show you how this can be used to write the simplest chat client.

First we create a web handler for Tornado that will return a web page containing the Javascript code that will connect and converse with our WebSockets endpoint following the WebSockets API.

Every time the user enters a message, it is submitted to our WebSockets endpoint which, in return, forwards any messages back to the client. These are appended to the textarea.

Internally, each client gets notified of any message through AMQP and the bus. Indeed, the WebSockets handlers are subscribed to a channel that is notified every time the AMQP server pushes data to the consumer. A side effect of this is that the Javascript code above doesn’t update the textarea when it sends the message the user has entered, but when the server sends it back.
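The notification flow can be sketched without Tornado or AMQP at all. In the sketch below the `Bus` and `ChatClient` classes are hypothetical stand-ins written for this illustration: the bus publishes directly to its subscribers, whereas in the real setup the message first travels through the AMQP server before the consumer publishes it on the bus.

```python
from collections import defaultdict


class Bus(object):
    """Tiny in-process publish/subscribe bus (stand-in for the real one)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, channel, callback):
        self._listeners[channel].append(callback)

    def publish(self, channel, message):
        # Notify every subscriber of the channel, in subscription order.
        for callback in list(self._listeners[channel]):
            callback(message)


class ChatClient(object):
    """Stand-in for a WebSockets handler together with its browser textarea."""

    def __init__(self, bus):
        self.bus = bus
        self.textarea = []
        bus.subscribe("chat", self.on_message)

    def send(self, text):
        # The sender does not update its own textarea here; it only
        # hands the message over and waits for it to come back.
        self.bus.publish("chat", text)

    def on_message(self, text):
        # Every client, sender included, displays what the bus delivers.
        self.textarea.append(text)
```

With two clients sharing one bus, a single `send("hello")` from either of them ends up in both textareas, and the sender's textarea is only updated by the message coming back, which mirrors the side effect described above.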

Let’s see how we had to change the Tornado application to support that handler as well as the serving of jQuery as a static resource (you need the jQuery toolkit in the same directory as the Python module).

The code is here.

Once the server is running, open two browser windows and access http://localhost:8888/. You should be able to type messages in one and see them appear in both windows.

Note:

This has been tested against the latest Chrome release. You will need to either set the “localdomain.dom” or provide the IP address of your network interface in the Javascript above since Chrome doesn’t allow for localhost nor 127.0.0.1.