Should we assess OpenData innovation and impacts?

Yesterday, I was at a talk titled “OpenData, the basis for new political technologies?” at la Cantine Numérique Rennaise [fr], a venue dedicated to digital culture in Rennes. During the debate, I asked how we could assess the impacts of OpenData without some sort of measuring instruments. This is a question the EU asked itself in a recent report.

Xavier Crouan, who has been the director of digital innovation and information in Rennes for the past few years and has communicated extensively on OpenData, made a comment that I felt was a misunderstanding of my question. He stated, roughly, that it felt typically French to ask for tools and indicators whenever risks were taken and innovation attempted. He found it saddening to hear French engineers being so grounded, and felt innovation should not have to justify itself.

Honestly, that wasn’t what I was getting at. At that point, the debate was about how OpenData would eventually make a difference in people’s lives, politically as much as economically. In that context, it seemed sensible to ask how we could measure the impacts of OpenData so that we could tweak, tune and improve its usage.

Now, with regard to innovation itself, I believe you usually need simple indicators to gauge whether or not you’re walking down a fruitful path.

For instance, Rennes recently held a contest for building applications on the data it has opened. Xavier Crouan indicated that 2000 people had voted. One might consider that figure an indicator of whether or not the contest was a public success and, if not, of how to tune it should there be another contest next year.

Shooting in different directions in the hope that one path will lead to strong innovation is shortsighted in my book. You need to define a few criteria that will assess how each direction fares. This is what OpenData promotes too: improving the efficiency of public sector data reuse.

Innovation is not incompatible with retrospection.

Thoughts on technical strategies for an OpenData world

I’ve been part of a very interesting discussion on a French forum about some technical aspects of opening public data. To provide a bit of context, the French city of Rennes, Brittany, has been leading the way in France in 2010 on the subject of opening its data [fr]. The city has released local-authority and transport datasets. The former are cold datasets served as structured files, while the latter are a mix of hot and cold datasets provided through a dedicated API [fr].

The whole discussion started from that API, which is basically an RPC web service that returns data such as:

  • Current number of available bikes at a public bike station
  • Current number of free slots to park a bike at a station
  • Bus and subway transit status
  • Location of travel card/ticket vendors

Someone on the forum [fr] wondered why the API wasn’t following a more RESTful approach, and I chimed in to back that idea. The discussion then moved on to whether or not REST would bring any added value compared to the RPC web service in place. That got me thinking about one side of OpenData that is not often put forward, namely the “how”. Indeed, discussions around OpenData mostly deal with the kind of data to release, the reasons why and under which conditions.
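
To make the distinction concrete, here is a minimal sketch of the two URL styles. The base URL, endpoint and operation names are invented for illustration and do not reflect Rennes’ actual API:

```python
def rpc_call_url(base, method, **params):
    """RPC style: one endpoint, and the operation is a parameter."""
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"{base}?cmd={method}&{query}"

def rest_resource_url(base, *path):
    """REST style: each resource has its own URL; a plain GET retrieves it."""
    return base + "/" + "/".join(str(p) for p in path)

# RPC: "call the getbikestationstatus function with station=12"
print(rpc_call_url("https://api.example.org/xml.asp",
                   "getbikestationstatus", station=12))
# REST: "fetch the representation of bike station 12"
print(rest_resource_url("https://api.example.org", "bike-stations", 12))
```

The RESTful form makes each station an addressable, cacheable resource, which is part of the added value the forum discussion was weighing.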

However, it should not be forgotten that releasing public data isn’t a trivial operation, even when local authorities and other public bodies do have datasets at hand.

Data source constraints

By talking to folks working in public agencies, you quickly realise that they run into many issues even when they are determined to open up their datasets.

Old information system or none at all

It’s not uncommon for local authorities to lack a proper information system. They may deal with documents such as Excel files, or their system is simply not designed to handle some modern operations.

Information System Lock-in

In some cases, namely with GIS, local authorities don’t even really have a way to perform export operations. When that happens, it makes the release process more complicated and delays it.

Unfit data model

We often cry for more data to be opened, but we fail to understand that the model used to expose data publicly isn’t necessarily the one used internally by local authorities.

Data provider strategies

One critical question local authorities must ask themselves is whether or not they want to engage in the data/web-service game. In other words, do they want to give access, as raw as possible, to their data and be left alone, or do they want to keep control and perhaps create something richer? This is not a light question, as the implications can seriously affect the level of commitment a local authority will have to show.

Flat data provider

In this configuration, the local authority decides to provide the lowest level of features by putting files (flat or structured) on the web for anyone to download. The advantages are simplicity of deployment and maintenance, as well as a very low cost. However, this may also create a barrier for non-technical folks who would want to provide innovative services, as they would be required to process those files first.
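
To illustrate that processing burden, here is a minimal Python sketch of what a reuser has to do with a flat file before building anything; the file layout and field names are made up for the example:

```python
import csv
import io

# A made-up sample resembling a flat dataset a city might publish.
raw = """station_id;name;bikes_available;slots_free
1;Place de la Mairie;4;16
2;Gare;0;20
"""

# The reuser must parse the file and coerce the types himself
# before the data becomes usable in an application.
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
stations = [
    {"id": int(r["station_id"]), "name": r["name"],
     "bikes": int(r["bikes_available"]), "free": int(r["slots_free"])}
    for r in reader
]

# Only after that step can a service answer even a trivial question.
empty = [s["name"] for s in stations if s["bikes"] == 0]
print(empty)  # → ['Gare']
```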

This is what Rennes has done for its GIS datasets and, indeed, everyone I met had to perform operations on those files before they could start on their services.

Raw data web service

By raw, I mean a web service that stays rather close to the dataset it serves. The advantages are the relative simplicity of setting up such a service and its low cost. In some cases such a web service may be just enough, provided the data model it exposes doesn’t force the developer to create his own intermediary data model; otherwise, the web service is no better than flat files.
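
As a sketch of that caveat, here the payload mirrors the provider’s internal model (the field names and nesting are invented), and the consumer still has to flatten it into a model of his own:

```python
import json

# A stand-in for a raw web-service response; the structure mirrors
# the provider's internal model rather than the consumer's needs.
payload = """
{"opendata":
   {"answer":
      {"status": {"code": "0"},
       "data": {"station": [
         {"id": "12", "bikesavailable": "4", "slotsavailable": "16"},
         {"id": "13", "bikesavailable": "0", "slotsavailable": "20"}]}}}}
"""
response = json.loads(payload)

# The developer still builds an intermediary model: unwrapping the
# provider-shaped envelope and converting stringly-typed numbers.
stations = [
    {"id": int(s["id"]), "bikes": int(s["bikesavailable"])}
    for s in response["opendata"]["answer"]["data"]["station"]
]
print(stations)
```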

The city of Rennes uses that kind of service for some of its transit data.

Rich web service

Here the provider decides to go the extra mile by offering a richer web service. In this case, the local authority defines a much richer data model on top of its own internal data structures. The big advantage is that some common functionalities are provided by the service itself, allowing the developer to build much richer applications too. For instance, rather than solely providing a GTFS file for its public transportation, which a developer must download, parse and build his own model from, a provider might offer an API to query that dataset for useful information such as “what is the fastest way to go from A to B whilst avoiding traffic and with the lowest impact on the environment?”.
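
To show the gap a rich service fills, the sketch below does by hand what such an API would answer in one call. The GTFS stops.txt excerpt is made up; without a rich service, even a simple “nearest stop” query is the developer’s job:

```python
import csv
import io
import math

# A tiny, invented excerpt of a GTFS stops.txt file.
stops_txt = """stop_id,stop_name,stop_lat,stop_lon
1001,Republique,48.1095,-1.6790
1002,Sainte-Anne,48.1146,-1.6806
1003,Charles de Gaulle,48.1040,-1.6733
"""

stops = list(csv.DictReader(io.StringIO(stops_txt)))

def nearest_stop(lat, lon):
    """Find the closest stop; a rich API would expose this as one query."""
    def dist(s):
        # Equirectangular approximation, accurate enough at city scale.
        dlat = math.radians(float(s["stop_lat"]) - lat)
        dlon = (math.radians(float(s["stop_lon"]) - lon)
                * math.cos(math.radians(lat)))
        return math.hypot(dlat, dlon)
    return min(stops, key=dist)

print(nearest_stop(48.110, -1.680)["stop_name"])  # → Republique
```

Multiply this by routing, schedules and fares, and the value of a provider-side rich model becomes obvious.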

The city of Rennes doesn’t provide a rich web service, which can be explained by the fact that defining a rich model requires a good idea of how the data may be exploited. If you take some of Google’s APIs, it took some time before they reached their current maturity. Moreover, Rennes follows the common pattern of “release early, release often”, and the current API was released so that developers could rapidly start playing with it.

Data support

This is an aspect that is almost never addressed. Should a local authority be accountable for data integrity, updates and support? The question is rather complex, since it raises the issue of whether or not supporting those functions falls within the local authority’s role.

Implicitly, this also leads to the question of whether or not a local authority should become a full-fledged data provider. Should we start creating dedicated public, or even private, bodies to perform those tasks?

Which strategy to follow?

Eventually, the decision is made based on various factors, such as the kind of data a local authority owns, how it is stored and its accessibility, but also on questions of cost, time and politics. I strongly believe that not thinking enough about how data should be distributed will impair the way the OpenData movement is perceived in the long run. Indeed, let’s not forget that, at least in France, the biggest consumers of public data are public services themselves, and for OpenData to win uptake, the “how” must be carefully designed and implemented.

If public data are too complex to consume and process, I believe OpenData will partly fail.