ODM Response to the Public Data Corporation consultation

Charging for Public Data Corporation information

1. How do you think Government should best balance its objectives around increasing access to data and providing more freely available data for re-use year on year within the constraints of affordability? Please provide evidence to support your answer where possible.

This question is framed incorrectly. For open data to be truly sustainable there has to be a shift away from the notion of affordability and access. Open data is part of a transformation of how services are delivered within and by government and how government relates to people and business. What we should be moving towards is the notion of government as a platform, where the data that the government uses for its own purposes is also seamlessly available for reuse.

2. Are there particular datasets or information that you believe would create particular economic or social benefits if they were available free for use and re-use? Who would these benefit and how? Please provide evidence to support your answer where possible.

We see a number of core ‘infrastructure’ datasets that have allowed systems to be developed within the UK, the majority of which are run by trading funds. Consolidating their charging position within the PDC will have a chilling effect not only on the direct creation of applications and services but also on the underlying data ecosystem that will create social and economic value. It also has an impact on future technological developments where applications need to be aware of their relation to core data infrastructure. This is particularly important with the emerging development of the Internet of Things and pervasive technologies.
Whilst developing the Open Data Cities project in 2009 and DataGM – The Greater Manchester Datastore with Trafford Council, it became apparent that restricted local authority and community access to certain data, such as Land Registry data, was creating problems. Anecdotally it had been suggested that easy and open access to Land Registry data would help combat cross-boundary housing benefit fraud and would have eliminated the MPs' second home scandal.

3. What do you think the impacts of the three options would be for you and/or other groups outlined above? Please provide evidence to support your answer where possible.
The charging options outlined will all have an impact on the development of open data services and applications, and on future technologies where open data is an enabler.
All three models are flawed in that they try to predict and extract value from an emergent field. They fail to take into account what is needed to create a sustainable, innovative and disruptive data ecosystem. Disruptive innovation in emerging fields needs a low barrier to entry and an ecosystem where ideas can be tested, and can fail or succeed, at marginal cost.

4. A further variation of any of the options could be to encourage PDC and its constituent parts to make better use of the flexibility to develop commercial data products and services outside of their public task. What do you think the impacts of this might be?
Encouraging public organisations to develop services outside the public task has the potential to distort an emerging market and should be treated with caution. The knowledge that many public organisations hold in regard to their task is unique, and its use could be encouraged as long as the underlying raw data resources are available to all.

5. Are there any alternative options that might balance Government’s objectives which are not covered here? Please provide details and evidence to support your response where possible.

There needs to be an appraisal of the wider value and impact of releasing public data. This impact should not just be seen as a simple transactional value but as a broader impact on the engagement and wellbeing of society.

Licensing
 
1. To what extent do you agree that there should be greater consistency, clarity and simplicity in the licensing regime adopted by a PDC?
It is understood that having multiple licensing regimes can create confusion and hence hinder the development of interpretations, applications and services. The danger of ‘double licensing’ is real, especially as products become more complex. The adoption of OGL should be seen as the default position for raw open public data. At the moment, public datastores such as DataGM carry numerous licensing options, most with the potential to cause confusion and contaminate downstream data usage. This confusion has also been used as an excuse for not releasing data.

2. To what extent do you think each of the options set out would address those issues (or any others)? Please provide evidence to support your comments where possible.
Allowing different organisations within the PDC to define their own licences to suit different uses of data presupposes that the data provider has an appreciation of the potential uses of the data. This may work in an environment where products are developed in one specific domain, but when innovation is cross-cutting the need for standardisation and clarity becomes clear. Whilst the third option of a single PDC licence with adapted schedules of use would seem easiest, the question fails to recognise that raw open public data should be free by default, with exemptions being rigorously justified.

3. What do you think the advantages and disadvantages of each of the options would be? Please provide evidence to support your comments
Please see above.

4. Will the benefits of changing the models from those in use across Government outweigh the impacts of taking out new or replacement licences?
Yes. The current licensing regime is opaque and hinders innovation, and innovation drives the economy.

Oversight

1. To what extent is the current regulatory environment appropriate to deliver the vision for a PDC?
You can't have a system of oversight which fails to engage users. It is necessary to have one robust and representative regulatory environment that has real powers to make PDC-based organisations compliant. The representation should be a balance of suppliers and users of data.

2. Are there any additional oversight activities needed to deliver the vision for a PDC and if so what are they?
No, apart from making sure that raw public data is made open and freely available.

3. What would be an appropriate timescale for reviewing a PDC or its constituent parts public task(s)?
Six-monthly initially, then less often once the initiative becomes embedded.

Open Data in Manchester: Challenges and Opportunities

This blog post was originally written for the Open Knowledge Foundation blog.

Open Data Cities was initiated in May 2009, premised on the simple question of how cities would evolve if all data were made open. Would the same inequalities and asymmetries persist, for example? Moreover, what would need to happen within the city to bring about the adoption of more open and transparent practices?

Greater Manchester is a region in the North West of England with a population of 2.8 million people. It comprises 10 boroughs containing two cities and many large towns. Open Data Cities approached the city as a functioning organism comprising these 10 boroughs. For the project to have a genuine impact on its inhabitants, we proposed that it would need to align with how people used the city rather than the ways in which the city was administered. The reality within the city is that although people access services across authorities, and although there are a number of pan-Greater Manchester public bodies, local authorities still deliver services to tight geographical boundaries.

Addressing the whole Greater Manchester region in this way created an environment that allowed the project to evolve in a particular way. As the region was adopting City Region status, this would require a certain alignment in terms of data and information. The granting of City Region status also opened up the possibility of bringing about an elected mayor, enabling, theoretically, a coherent region-wide strategy to be implemented.

Working across the ten boroughs, all with their own democratically elected councils, is not without its challenges. Each public body has its own administrative and data structure and its own specific set of difficulties. It was therefore necessary to adopt a pragmatic, non-threatening approach as part of our project. Conversations centred on the idea of allowing citizens to look ‘under the hood’ of public service, so to speak, and of creating a better understanding of what councils do. Most importantly, we were interested in rebalancing the relationship between public service and citizen, and in the possibility of services being delivered with citizens rather than simply to citizens.

Communicating The Benefits

We were often challenged as to how the release of data would benefit the person on the street, and who would create the applications and interpretations to allow this to happen. At the start of the Open Data Cities project the Open Data Manchester community was formed to provide evidence that there was indeed a ‘demand’ for the release of open data within the region. We argued that by giving people the tools to understand and act within communities, open data would have broader benefits too. Moreover, there was a growing acceptance that enabling people to access the data and information relevant to their locality was important. This has in part been borne out by the emergence of hyperlocal blogging as a means of disseminating news and information at a community level.

Open Data Cities also strongly emphasised the innovation and economic benefits such open data could bring to the region. Opening up the ‘undiscovered country’ of open data could kick-start an economy based on the creation of data services. We had seen examples where companies such as Elbatrop software in London had created best-selling applications for San Francisco based on released tree data. If Greater Manchester released data, this could present an opportunity for developers to create applications that could have relevance beyond the Greater Manchester region. Research had identified that open data could add £6 billion of value to the UK economy. How much of that value could be injected into the regional economy?

High-value, ‘quality of life’ datasets were identified. Greater Manchester Passenger Transport Executive, now TfGM, made the decision to release large and regularly updated datasets. This sparked a number of good applications, but most of them were ‘proof of concept’, with little that could really be considered ready for market. This wasn’t the ‘release the data and people will build cool stuff’ future that we had been promoting, and even though the transport authority had now committed to making data open as a default position, they were very aware that not much was being built.

Acknowledging the Barriers

By talking to people who were involved in Open Data Manchester and the wider Greater Manchester digital community, it became apparent that although open data offered opportunities, there were a number of significant barriers that were inhibiting the development of services. These could be seen as return on investment, risk and liability.

The return on investment argument was quite apparent from early on. People have to make a living and generally want to see their efforts rewarded. Because Open Data Cities embraced the 2.8 million people of Greater Manchester, it was hoped that there would be enough people to sustain a market in open data application development. In order to kick-start this market it was proposed that a number of innovation challenges with sizeable incentives should take place.

It was obvious that there were no large digital businesses in the open data space, and we had long held the view that their presence would be an indicator of the health of the open data innovation ecosystem. One suggested reason for the scarcity was that open data licensing transferred all the risk on to the developer. Previously, data would generally be released with some sort of service level agreement, whereas none of these guarantees exist with open licensing. The idea of spending large amounts of development time on applications built on data that could then be turned off was deemed too risky.

Liability was also an issue. Who would be liable if someone had bought an application and the data were suddenly turned off or proved inaccurate? There were also concerns as to the robustness of supplied data and the sometimes archaic formats in which data were supplied. The liability argument has also been put forward as a ‘supply side’ reason for non-disclosure, from both a robustness of data and a command and control perspective.

Collaboration

When FutureEverything and Trafford Council began working together on DataGM – The Greater Manchester Datastore, many of the local authorities were in a state of panic through having to negotiate drastic shortfalls in budget. It was becoming apparent that innovation and citizen empowerment, although appealing, were the least of their concerns. Public bodies are still in a time of fiscal stress and it has been stated that few, if any, public bodies innovate out of a crisis.

All Greater Manchester local authorities and most pan-Greater Manchester public bodies are represented on the datastore steering group. The benefit of having a local authority lead the project is its ability to get people around the table. Whilst some members of the group understood the logic of having a datastore and shared intelligence, there was a lot of resistance. Some members expressed despair at being involved with a project when they didn’t know if they would still be in post in three months’ time, with others not seeing the point of spending time and resource on something that didn’t have a concrete output. There was also a very tangible silo mentality, where the idea of shared intelligence across authorities was seen as attractive but not essential.

Evidence and Evolution

As the DataGM project gathered momentum, more evidence started to emerge as to the inefficiencies of maintaining a siloed and closed data culture. The servicing of Freedom of Information requests costs Greater Manchester public bodies over £4 million a year. Over 600 public officials a day are unable to find or use data that they require in order to carry out their jobs, costing authorities over £8.5 million a year. The tendency of citizens to use services outside their borough boundaries, annoying as it is for public bodies, also creates difficulties. With no pan-Greater Manchester data initiative it is difficult for public bodies to create and deliver on coherent regional strategies. Open data offers a solution.

Now that DataGM is becoming established, the economic logic of using a centralised data catalogue, where the data that local authorities use themselves is openly available, is starting to make sense. Open data needs to be transformational. For public bodies, enhanced engagement and the creation of innovative services are not enough. We are at a stage where we are saying: if you spend A you will get savings of B, and with open data you will also gain benefits of C, D, E…

DataGM is starting to develop data release schedules so that local authorities can release similar data in a coordinated way. With developers such as Swirrl, one of the recent winners of the EU Open Data Challenge, some of that data is being expressed as Linked Data. The Open Data Manchester community continues to grow. Although there is still a long way to go with open data in Manchester, it feels like more people within public service are starting to see the benefits, and the possibility of Greater Manchester becoming an Open Data City gets closer.

Licensing – Why it is so important

This blog post was originally written for FutureEverything as part of their Open Data Cities programme.

I’m no expert but I really need to be – Licensing

Licensing is a subject that comes up a lot with Open Data. The licence is a key component of the dataset. It defines the use and liability and it shapes how or what innovation will come from data release.

As mentioned in the title, I am no expert in this area and I would appreciate any corrections or amendments to my understanding.

Traditionally public data has been closed, so that the only way you could get access to data to build products was by buying a licence to use it. In many cases these licences were expensive and restrictive. To mitigate this cost, the licence would often also have some level of service agreement built in. You paid for the licence for the data and the data provider would provide you with a level of continuity and support. This helps to limit risk and encourage investment in a product.

The closed ‘paid licence’ system generally has a high barrier to entry (the price of the licence), limiting the number of innovative products developed. Innovation ecosystems depend on many ideas being tried, with most failing. If the price of failure is too high, it could have a chilling effect on the whole system.

One of the first licences used for the release of Open Data was Creative Commons CC-BY-SA. This licence allowed people to create services and products off the back of the data as long as they attributed where the data came from and shared back any data that was created off the back of the originally released dataset (value-added data). The original Creative Commons licences were devised as an answer to restrictive copyright laws relating to ‘works’ (articles, text, images, music etc.), as these were deemed increasingly anachronistic in the digital age. It is up for discussion whether data can be deemed a ‘work’ in the context of this licence.

The Open Database Licence (ODbL), developed by Open Data Commons, was created to address the doubt as to whether data could be seen as a ‘work’. It carries the same attribution and share-alike clauses and is used by many datastores, including the newly opened Paris Datastore.

Anyone can develop products and services that use datasets with these licences, but intellectual property doesn’t extend to the value-added datasets created in the process of developing these products. Releasing value-added datasets back to the community allows further innovative products to be released off the back of these datasets, so potentially the pace of innovation could be increased. It is analogous to the ‘standing on the shoulders of giants’ idea.

However, requiring that value-added data be made available for further use by other organisations might chill the development of products that create value-added data.

With the above licences there is generally no liability or guarantee of service from data providers. This creates a higher-risk scenario. If you were investing in product development this is potentially a source of concern and may be an inhibiting factor.

In the UK we have the recently released Open Government Licence (OGL), which was developed specifically for government data. It borrows some aspects of the CC-BY-SA licence and ODbL. Unlike those licences, there is no need to share back value-added data.

Would this have any impact on products and services that are developed from Open Data? Again, in the licence there is no liability or guarantee of service from the data provider, but the developing organisation gets to keep all the rights to the products and services they develop, including value-added datasets.
The advantage of this could be that allowing people to keep hold of the rights to the products that they develop might mitigate the risk posed by the lack of liability and guarantee. The main disadvantage could be that the pace of innovation is curtailed because people have to replicate processes and value-added datasets.