Thoughts on technical strategies for an OpenData world
I’ve been part of a very interesting discussion on a French forum around some technical aspects of the opening of public data. To provide a bit of context, the French city of Rennes, Brittany, has been leading the way in France in 2010 on the subject of opening its data [fr]. The city has released local authorities and transport datasets. The former are cold datasets served as structured files while the latter is a mix of hot and cold datasets provided through a dedicated API [fr].
The whole discussion started from that API which is basically a RPC web service that returns data such as:
- Current number of available bikes at a public bike station
- Current number of free slots to park a bike at a station
- Bus and subway transit status
- Location of travel card/ticket vendors
Someone on the forum [fr] wondered why the API wasn’t following a more RESTful approach to which I chimed in backing up that idea. Then the discussion went off whether or not REST would bring any added value in comparison to the RPC web service in place. That got me thinking about one side of OpenData that is not too often put ahead, naming the “how”. Indeed, mostly, discussions around OpenData deal with the kind of data to release, the reason why and under which conditions.
However it should not be forgotten that releasing public data isn’t a trivial operation, even when local authorities and other public bodies do have datasets at hand.
Data source constraints
By talking to folks working in public agencies you quickly realise that they meet many issues even when they are determined to open up their datasets.
Old information system or none at all
It’s not uncommon that local authorities don’t have a proper information system. They may deal with documents such as Excel files or their system is just not designed to handled some modern operations.
Information System Lock-in
In some cases, namely with GIS, local authorities don’t even really have a way to perform export operations. When that happens it makes the release process more complicated and delay it.
Unfit data model
We often cry for more data to be opened but we fail to understand that the model to publicly expose data isn’t necessarily the one used internally by local authorities.
Data provider strategies
One critical question local authorities must ask themselves is whether or not they want to engage into the data/web-service game. In other words, do they want to give an access, as raw as possible to their data and be left alone, or do they want to have control and perhaps create something richer. This is not a light question as the implications can seriously impact the level of commitment a local authority will have to show.
Flat data provider
In that configuration the local authority decides to provide the lowest level of features by putting files (flat or structured) on the web for anyone to download. The advantages is the simplicity of deployment and maintenance as well as a very low cost. However this might also create a barrier for non technical folks who would want to provide innovative services as they would be required to process those files first.
This is how Rennes has done for its GIS datasets and, indeed, everyone I met had to perform operations on those files before they could start on their services.
Raw data web service
By raw I mean a web service that stays rather close of the dataset it serves. The advantage is the rather simplicity to setup such a service and the low cost it requires. In some cases such a web service might be just enough if the data model exposed by the web service doesn’t force the developer to create his own intermediary datamodel since, in that case, the web service isn’t better than flat files.
The city of Rennes uses that kind of service for some of its transit data.
Rich web service
Here the provider decides to go the extra mile by offering richer web service. In that case, the local authority defines a much richer data model on top of its own internal data structures. The big advantage is that some common functionalities are provided by the service and allows the developer to build much richer applications too. For instance rather than solely providing a GTFS file for its public transportation a developer must download, parse and create his own model, a provider might offer an API to query that dataset for useful information such as “what is the fastest way to go from A to B whilst avoiding traffic and with the lowest impact on the environment?”.
The city of Rennes doesn’t provide a rich web service which can be explained by the fact that defining a rich model requires you have a good idea of how the data may be exploited. If you take some of Google’s APIs, it has taken some time before they reached their current matureness. Moreover Rennes follows the common pattern of “release often, release early” and the current API was released so that developers could rapidly start playing with it.
This is an aspect that is almost never approached. Should a local authority be accountable for data integrity, updates, support? The question is rather complex since it begs the question whether or not it’s within the local authority’s role to support those functions.
Implicitly this also leads to question whether or not a local authority should become a full fledged data provider. Should we start creating dedicated public, or even private, bodies to perform those tasks?
Which strategy to follow?
Eventually the decision is made based on various factors such as the kind of data a local authority owns, how it’s stored, its accessibility but also a question of cost, time and politics. I strongly believe that not thinking enough about how it should be distributed will impair the way the OpenData movement is perceived on the long run. Indeed, let’s not forget that, at least in France, the biggest consumer of public data are public services themselves and to win the OpenData uptake, the “how” must be carefully designed and implemented.
If public data are too complex to consume and process, OpenData will partly fail I believe.