Thoughts on technical strategies for an OpenData world

I’ve been part of a very interesting discussion on a French forum around some technical aspects of opening up public data. To provide a bit of context, the French city of Rennes, Brittany, led the way in France in 2010 on the subject of opening its data [fr]. The city has released local authority and transport datasets. The former are cold datasets served as structured files while the latter are a mix of hot and cold datasets provided through a dedicated API [fr].

The whole discussion started from that API, which is basically an RPC web service that returns data such as:

  • Current number of available bikes at a public bike station
  • Current number of free slots to park a bike at a station
  • Bus and subway transit status
  • Location of travel card/ticket vendors

Someone on the forum [fr] wondered why the API wasn’t following a more RESTful approach, and I chimed in to back up that idea. The discussion then turned to whether or not REST would bring any added value compared to the RPC web service in place. That got me thinking about one side of OpenData that is not too often put forward, namely the “how”. Indeed, discussions around OpenData mostly deal with the kind of data to release, the reasons for releasing it and the conditions under which it is released.
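To make the RPC versus REST contrast a bit more concrete, here is a minimal sketch of how the same request (the state of one bike station) might be addressed in each style. The host name, the command name, the parameter names and the paths are all hypothetical; none of them is taken from the actual Rennes API.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used for illustration only.
BASE = "https://api.example.org"


def rpc_url(method: str, **params: str) -> str:
    """RPC style: a single endpoint, the operation is named in the query string."""
    query = urlencode({"cmd": method, **params})
    return f"{BASE}/?{query}"


def rest_url(*segments: str) -> str:
    """REST style: each resource has its own URL, the HTTP verb is the operation."""
    return BASE + "/" + "/".join(segments)


print(rpc_url("getbikestations", station="42"))
print(rest_url("bike-stations", "42"))
```

In the second form the station is something you can link to, cache and bookmark, which is a large part of what the RESTful side of the discussion was arguing for.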

However it should not be forgotten that releasing public data isn’t a trivial operation, even when local authorities and other public bodies do have datasets at hand.

Data source constraints

By talking to folks working in public agencies you quickly realise that they run into many issues even when they are determined to open up their datasets.

Old information system or none at all

It’s not uncommon for local authorities not to have a proper information system. They may deal with documents such as Excel files, or their system is just not designed to handle some modern operations.

Information System Lock-in

In some cases, notably with GIS, local authorities don’t even really have a way to perform export operations. When that happens it makes the release process more complicated and delays it.

Unfit data model

We often call for more data to be opened but we fail to understand that the model suited to exposing data publicly isn’t necessarily the one used internally by local authorities.

Data provider strategies

One critical question local authorities must ask themselves is whether or not they want to engage in the data/web-service game. In other words, do they want to give access to their data in as raw a form as possible and be left alone, or do they want to keep control and perhaps create something richer? This is not a light question, as the answer can seriously affect the level of commitment a local authority will have to show.

Flat data provider

In that configuration the local authority decides to provide the lowest level of features by putting files (flat or structured) on the web for anyone to download. The advantages are the simplicity of deployment and maintenance as well as a very low cost. However this might also create a barrier for non-technical folks who would want to provide innovative services, as they would be required to process those files first.

This is what Rennes has done for its GIS datasets and, indeed, everyone I met had to perform operations on those files before they could start on their services.
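As a rough illustration of that processing step, here is a minimal sketch that turns a downloaded flat file into records an application can use. The sample content, the semicolon separator and the column names are all made up for the example; they do not describe any real Rennes dataset.

```python
import csv
import io

# Made-up sample standing in for a downloaded flat file; the column
# names are assumptions, not the layout of any real Rennes dataset.
SAMPLE = """name;lat;lon
Station A;48.1100;-1.6800
Station B;48.1200;-1.6700
"""


def load_points(text: str) -> list[dict]:
    """Parse a semicolon-separated file and convert coordinates to floats."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        {"name": row["name"], "lat": float(row["lat"]), "lon": float(row["lon"])}
        for row in reader
    ]


points = load_points(SAMPLE)
print(points[0])
```

Even in this tiny case the consumer has to know the separator, the column names and the types before doing anything useful, which is the barrier mentioned above.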

Raw data web service

By raw I mean a web service that stays rather close to the dataset it serves. The advantage is that such a service is fairly simple to set up and cheap to run. In some cases it might be just enough, provided the data model it exposes doesn’t force the developer to build an intermediary data model of their own; if it does, the web service is no better than flat files.

The city of Rennes uses that kind of service for some of its transit data.
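A minimal sketch of what “raw” means here, assuming a provider that simply serialises its internal rows as JSON. The table, the terse field names and the command name are invented for the example and are not the Rennes API.

```python
import json

# Hypothetical internal table, as the provider might store it.
# Field names and values are invented for this example.
STATIONS = [
    {"id": 1, "nm": "Station A", "bk_av": 7, "sl_av": 13},
    {"id": 2, "nm": "Station B", "bk_av": 0, "sl_av": 20},
]


def raw_service(command: str) -> str:
    """Return the stored rows as JSON, with no reshaping of the data model."""
    if command == "getstations":
        return json.dumps({"stations": STATIONS})
    return json.dumps({"error": f"unknown command: {command}"})


response = json.loads(raw_service("getstations"))
# The client still has to map the provider's terse fields to its own model.
available = {s["nm"]: s["bk_av"] for s in response["stations"]}
print(available)
```

The service saves the download-and-parse step, but the last few lines show the client still translating the provider’s internal naming into something meaningful for its own application.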

Rich web service

Here the provider decides to go the extra mile by offering a richer web service. In that case, the local authority defines a much richer data model on top of its own internal data structures. The big advantage is that the service provides some common functionality, which allows developers to build much richer applications too. For instance, rather than solely providing a GTFS file for its public transportation that a developer must download and parse before creating his own model, a provider might offer an API to query that dataset for useful information such as “what is the fastest way to go from A to B whilst avoiding traffic and with the lowest impact on the environment?”.
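To give an idea of the kind of logic a rich service would take off the developer’s hands, here is a minimal sketch of the “fastest way from A to B” part of that question, using Dijkstra’s algorithm over a tiny graph of stops. The stops and travel times are hypothetical, and the sketch ignores traffic and environmental impact entirely.

```python
import heapq

# Hypothetical travel times in minutes between stops; in a real service
# these would be derived from the provider's timetable data.
LINKS = {
    "A": {"C": 4, "D": 10},
    "C": {"D": 3, "B": 12},
    "D": {"B": 5},
    "B": {},
}


def fastest_route(start: str, goal: str) -> tuple[int, list[str]]:
    """Dijkstra's algorithm: return (total minutes, stops) for the quickest path."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        minutes, stop, path = heapq.heappop(queue)
        if stop == goal:
            return minutes, path
        if stop in seen:
            continue
        seen.add(stop)
        for nxt, cost in LINKS[stop].items():
            if nxt not in seen:
                heapq.heappush(queue, (minutes + cost, nxt, path + [nxt]))
    raise ValueError(f"no route from {start} to {goal}")


print(fastest_route("A", "B"))
```

With only flat files or a raw service, every developer would have to build the graph and write this kind of query themselves; a rich service would answer it directly.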

The city of Rennes doesn’t provide a rich web service, which can be explained by the fact that defining a rich model requires a good idea of how the data may be exploited. Some of Google’s APIs, for example, took some time to reach their current maturity. Moreover Rennes follows the common pattern of “release often, release early” and the current API was released so that developers could rapidly start playing with it.

Data support

This is an aspect that is almost never addressed. Should a local authority be accountable for data integrity, updates and support? The question is rather complex since it raises the further question of whether or not it’s within the local authority’s role to provide those functions.

Implicitly this also raises the question of whether or not a local authority should become a fully fledged data provider. Should we start creating dedicated public, or even private, bodies to perform those tasks?

Which strategy to follow?

In the end the decision is made based on various factors such as the kind of data a local authority owns, how it’s stored and how accessible it is, but also cost, time and politics. I strongly believe that not thinking enough about how data should be distributed will harm the way the OpenData movement is perceived in the long run. Indeed, let’s not forget that, at least in France, the biggest consumers of public data are public services themselves; for OpenData to win uptake, the “how” must be carefully designed and implemented.

If public data are too complex to consume and process, I believe OpenData will partly fail.

2 thoughts on “Thoughts on technical strategies for an OpenData world”

  1. Dave King

    Hey Lawouach great post,

    We’re definitely seeing all kinds of different methods of releasing data locally to me (Sheffield, UK). I almost worry that some data isn’t opened up sooner because they’d rather get it into a nicely structured API. http://data.gov.uk/ is a good example but the smaller the public authority the less structured the data.

    http://www.whatdotheyknow.com is a site which helps people make freedom of information (FOI) requests and the responses to these are documented on the site. If you take a look through some of these responses you’ll quickly see a variety of datasets formatted in all manner of different ways.

  2. Sylvain Hellegouarch Post author

    I think the “release often, release early” mantra used in Rennes is sensible indeed, but I hope the city will not stop there; they could do much better and richer.

    The national portals are definitely necessary to give a direction and provide more visibility but I feel real innovation will come at the local authority levels which are obviously closer to what citizens could use.

    I like the whatdotheyknow.com website, it’s a valuable resource to prioritise which data to release.
