Making Open Data Real – Response from ODM and Open Data Cities programme.

27th October 2011

The benefits of adopting open data for the purposes of transparency and accountability have been well documented, but open data is not just about transparency and accountability. We live in a modern, technologised society, and we need to give people the tools to navigate our data-driven environment, whether that is access to transit data, gritting routes or ‘infrastructural’ data such as mapping, hydrology or weather.

We strongly argue for an open-by-default position, with exemption justified only on grounds of security or privacy. This is key, as it is virtually impossible to predict what the utility of every dataset will be. Certain ‘high value’ datasets (those perceived to support ‘quality of life’ decisions) will obviously be adopted and used relatively quickly, but some will be used seldom and many not at all. This does not discount their value: data has to be seen in the broader context of knowledge, and future conditions may make certain datasets more relevant.

It is also important that any body that delivers services on behalf of the public is also required to be open. For example, Manchester is straitjacketed by a fragmented public transport system with 40+ bus operators all supposedly in competition. Crossing the city may require multiple tickets from multiple operators. There is no motivation for operators to release information about their fare structures, although it has long been identified that a transparent fare structure enables people to budget, plan and use public transport with confidence. At the moment you can only find out a fare by stepping on to the bus or ringing the operator directly. Although some bus operators do see the value of opening up this information, in meetings certain operators have raised concerns that wholesale release of data would allow other operators to undercut prices (which is the whole idea of a deregulated system) and would let local councillors see how much they charge (which goes against the idea of delivering a public service and being accountable).

There is a case for Land Registry property data to be made available. Speaking to local authority colleagues, such access would help in tackling housing benefit fraud, where claimants might own property in another borough, and in combating certain money laundering activities. It might also have effectively tackled the abuse of second home allowances by MPs before it became a major issue.

We need to encourage a transition to a more intelligent and aware data policy. This cannot be done in one fell swoop but needs to inform procurement, so that when IT systems are upgraded the ability to express data openly from a system is specified. The adoption of common data release schedules is to be encouraged, especially in metropolitan counties such as Greater Manchester. Our colleagues at Trafford MBC, with whom we partnered in developing DataGM, identified this as an important way to achieve cross-authority collaboration on dataset release.
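To make this concrete, the following is a minimal sketch of what ‘the ability to express data openly from a system’ might look like in practice, assuming a hypothetical gritting-routes table inside a council’s line-of-business system; the table, field names, values and metadata are invented purely for illustration.

```python
# Minimal sketch: exporting data a system already holds as an openly licensed CSV.
# The gritting_routes table and its contents are invented for illustration.
import csv
import sqlite3

# Stand-in for a line-of-business system's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gritting_routes (route_id TEXT, ward TEXT, length_km REAL)")
conn.executemany(
    "INSERT INTO gritting_routes VALUES (?, ?, ?)",
    [("GR-001", "Ancoats", 4.2), ("GR-002", "Didsbury", 6.8)],
)

# Express the data in a plain, non-proprietary format.
with open("gritting_routes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["route_id", "ward", "length_km"])  # header row
    writer.writerows(conn.execute("SELECT route_id, ward, length_km FROM gritting_routes"))

# Publish simple machine-readable metadata alongside it, stating the licence.
with open("gritting_routes.meta.txt", "w") as f:
    f.write("dataset: gritting_routes\nlicence: Open Government Licence\nupdated: daily\n")
```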

There is a very important benefit from having common data release schedules. At present it is very difficult for developers and digital businesses to take open data applications beyond proof of concept, because the market for open data applications and services is nascent. Common schedules allow the development of products that can quickly find a critical market mass, which in turn validates the demand-side argument for data.

The public sector is logically the biggest user of its own data, but data that is closed and siloed is often dumb data. We hear countless examples of dumb data policy: local authority officials who cannot find the data they require, creating an environment of ad hoc duplication and standards (in Greater Manchester this is estimated to cost many millions of pounds in lost personnel hours), and local authorities operating multiple GIS systems, up to 30 in some cases, each with its own licensing agreements and interoperability issues.

There has to be an adoption of common standards, and these have to be non-proprietary, open and extensible. Although there is some resistance to the adoption of Linked Data, mostly because people do not fully understand the concept and the need, the explosion of data-enabled devices makes the ability of computers to interpret complex data environments ever more important. Government has to be a major player in this space, and it also has to be intelligent in how it ensures compliance. Open and extensible formats offer a certain amount of future-proofing over proprietary formats.
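As a brief illustration of what Linked Data in a non-proprietary, extensible format can look like, here is a minimal sketch using the Python rdflib library; the vocabulary and the bus-stop record are invented for this example, and a real publisher would reuse established vocabularies and identifiers.

```python
# Minimal sketch: expressing one record from an open dataset as Linked Data.
# Requires the third-party rdflib library; the example vocabulary is invented.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/transport/")  # illustrative vocabulary only

g = Graph()
stop = URIRef("http://example.org/transport/stop/1800SB05821")  # invented identifier
g.add((stop, RDF.type, EX.BusStop))
g.add((stop, RDFS.label, Literal("Piccadilly Gardens, Stop B")))
g.add((stop, EX.locality, Literal("Manchester")))

# Turtle is an open, extensible serialisation that both people and machines can read.
print(g.serialize(format="turtle"))
```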

A concern that we hold, especially in light of participating in the EU smart city programme, is that within the UK there doesn’t seem to be much appreciation that open data is an enabler of Smart City and other technologies. Common technological frameworks that allow the development of city-based services across territories are being developed, building larger potential markets for products. What might be unviable in one territory might be viable at scale.
Future technological developments such as the Internet of Things might be hampered if there is pressure to license and charge for certain ‘infrastructure’ datasets. Certain IoT devices have to be aware of where they are and how they are functioning in relation to public infrastructure and data.

We strongly feel that we are coming to a point where we see a transition to Government as a platform. This will enable development of services from both within the public sector and outside. Open Data could be seen as evidence of a healthy functioning platform based structure, where the boundaries and interactions between citizen, government and business are porous, diffuse and bidirectional.

Access to information is key to re-enfranchisement. Open Data has the potential to create a more equitable environment for participation. Although it would be naive to believe that opening up data will automatically create a data-aware citizenry, it only takes a few people with the skills to mediate information in their communities to raise awareness and participation.

We believe that for Open Data to become sustainable we need to encourage not only the supply side but also the demand side for data. Where market failure occurs, or where a sector is still nascent, there is a need to stimulate activity to drive awareness, create services and applications, and develop a base layer from which further development can be derived. Innovation challenges and focused development days are two things that can help drive this. There needs to be support for initiatives such as Open Data Manchester, Open Data Sheffield, Open Data Brighton and now Open Data Hull. Often, as in the case of Open Data Manchester and the Open Data Cities project from which it was derived, there is no resource support from the public sector, and this is unsustainable.

Julian Tait
Open Data Manchester/Open Data Cities

Online Response

1. Do the definitions of the terms go far enough or too far

Engaged citizens need to have access to the structure of our cities. This isn’t just about league tables but about access that allows people to move seamlessly through their modern data-driven environment.

There needs to be an additional category of open data, focused on data that enables people to navigate through the modern data-driven environment, whether that is access to transit data, gritting routes or ‘infrastructural’ data such as mapping, hydrology or weather.

2. Where a decision is being taken about whether to make a dataset open, what tests should be applied

The test should be whether the dataset or ‘datastream’ is being produced to enable the delivery of public services, or, as in the case of transportation data, whether it is produced to disseminate information to the public so that they can access services more efficiently – e.g. a Transport Executive producing real-time bus data that enables people to use mobile devices to access the service, saving the capital outlay of investing in real-time bus signage.
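As an illustration, a mobile or web client consuming such a feed could be as simple as the following sketch; the endpoint URL and JSON field names are hypothetical and do not describe any actual Transport Executive API.

```python
# Minimal sketch: a client consuming a hypothetical open real-time bus data feed
# instead of relying on physical signage. URL and field names are invented.
import requests

FEED_URL = "https://data.example.org/realtime/departures"  # hypothetical endpoint

def next_departures(stop_id: str, limit: int = 3):
    """Return the next few departures for a stop from the hypothetical open feed."""
    resp = requests.get(FEED_URL, params={"stop": stop_id, "limit": limit}, timeout=10)
    resp.raise_for_status()
    return [
        (d["route"], d["destination"], d["expected"])  # assumed field names
        for d in resp.json().get("departures", [])
    ]

if __name__ == "__main__":
    for route, destination, expected in next_departures("1800SB05821"):
        print(f"{route} to {destination}, expected {expected}")
```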

3. If the costs to publish or release data are not judged to represent value for money, to what extent should the requester be required to pay for public services data and under what circumstances

The term ‘value for money’ can be vague and open to abuse. A test should be whether the data holder is creating the data for the delivery of its own service rather than explicitly for the request.

4. How do we get the right balance in relation to the range of organisations (providers of public services) our policy proposals apply to? What threshold would be appropriate to determine the range of public services in scope and what key criteria should inform this

All services that are delivered on behalf of the public should be covered. If a public service uses the data for the delivery of its own task then it should be made available

5. What would be appropriate mechanisms to encourage or ensure publication of data by public service providers?

We need to encourage a transition to a more intelligent and aware data policy. This cannot be done in one fell swoop but needs to inform procurement: when IT systems are upgraded, the ability to express data openly from a system would have to be implemented.

1. How should we establish a stronger presumption in favour of publication than that which currently exists?

The emphasis needs to change so that exemption from publication is the exception and requires sufficiently rigorous justification.

2. Is providing an independent body, such as the Information Commissioner, with enhanced powers and scope the most effective option for safeguarding a right to access and a right to data?

Enhancing the powers of the Information Commissioner is crucial in this process. It also means the ICO becomes a key motivator in creating an open-by-default policy. The ICO would then be able to put pressure on public bodies to standardise the way they create data, ideally bringing about a more intelligent public data environment.

3. Are existing safeguards to protect personal data and privacy measures adequate to regulate the Open Data agenda?

Protection of personal data and privacy is vitally important, and there have to be real teeth regarding organisations, both public and private, that transgress these rules. There also has to be an understanding that networked technologies will circumvent many safeguards.

4. What might the resource implications of an enhanced right to data be for those bodies within its scope?

The enhanced right to data could, if implemented wrongly, be very resource heavy. The starting position should be that public bodies are the biggest users of their own data, and that the present systems for shared intelligence and services are fundamentally flawed and need to change. For example, one local authority uses 30 separate GIS systems, with each departmental head believing that theirs is the best. If you get it right for the public sector’s own use, the rest is easy.

5. How will we ensure that Open Data standards are embedded in new ICT contracts

Open data and open platforms need to be embedded into the procurement process. We need to break the straitjacket of public services being sold into proprietary IT contracts where the public body isn’t able to use its own data beyond the purposes originally specified. There also has to be a more intelligent procurement process, recognising where a seemingly value-for-money initial cost is later outweighed by a costly process of upgrading.

1. What is the best way to achieve compliance on high and common standards to allow usability and interoperability?

There are a number of standards that are open, extensible and interoperable.

2. Is there a role for government to establish consistent standards for collecting user experience across public service

Government is the only authority that can establish compliance amongst public bodies.

3. Should we consider a scheme for accreditation of information intermediaries, and if so how best that might work

No. As long as there is equal access to data for all the market should be able to create the right mechanism.

1. How would we ensure that public service providers in their day to day decision-making honour a commitment to Open Data, while respecting privacy and security considerations.

There needs to be a robust data release framework in which sensitive data is identified at an early stage. There also needs to be an honest position in this regard, so that data collectors do not combine data in order that it can then be covered by the DPA.

2. What could personal responsibility at Board-level do to ensure the right to data is being met include? Should the same person be responsible for ensuring that personal data is properly protected and that privacy issues are met?

Corporate responsibility at board level.

3. Would we need to have a sanctions framework to enforce a right to data?

Yes, change can’t happen without sanction.

4. What other sectors would benefit from having a dedicated Sector Transparency Board

We think that duplication of tasks is unnecessary when you have a common and clear set of standards.

1. How should public service make use of data inventories? What is the optimal way to develop and operate this?

Data inventories should serve both internal and external purposes.

2. How should data be prioritised for inclusion in an inventory? How is value to be established

Ideally there should be identification of a common set of ‘high value’ datasets that will help to embed the validity of open data; these will also help to create a first wave of interpretations and applications. Implementation of a common data release plan would then be undertaken.

3. In what areas would you expect government to collect and publish data routinely?

All areas

4. What data is collected ‘unnecessarily’? How should these datasets be identified? Should collection be stopped?

There is a great deal of duplication of data within the public sector, and this needs to be minimised. Careful consideration should be given to what ‘unnecessarily’ actually means, particularly if it simply means that the data isn’t currently being used.

5. Should the data that government releases always be of high quality? How do we define quality? To what extent should public service providers ‘polish’ the data they publish, if at all?

You would expect data that is collected on the public’s behalf for the delivery of public services to be of high quality; if it isn’t, something is wrong with the system. Although it might be necessary to anonymise or redact certain data, this should only be undertaken in tightly defined cases.

1. How should government approach the release of existing data for policy and research purposes: should this be held in a central portal or held on departmental portals?

Ideally all the data should be held on the same portal so that there is no need to search for it

2. What factors should inform prioritisation of datasets for publication, at national, local or sector level?

High value, ‘quality of life’ datasets – the quick wins – should always be identified.

3. Which is more important: for government to prioritise publishing a broader set of data, or existing data at a more detailed level.

Data should be released at the source resolution. Additional work to create different resolutions of data should be discouraged

1. Is there a role for government to stimulate innovation in the use of Open Data? If so, what is the best way to achieve this?

Definitely. For Open Data to become sustainable we need to encourage not only the supply side but also the demand side. Where market failure occurs, or where a sector is still nascent, there is a need to stimulate activity to drive awareness, create services and applications, and develop a base layer from which further development can be derived.

Open Data in Manchester: Challenges and Opportunities

This blog post was originally written for the Open Knowledge Foundation blog.

Open Data Cities was initiated in May 2009, premised on the simple question of how cities would evolve if all data were made open. Would the same inequalities and asymmetries persist for example? Moreover, what would need to happen within the city to bring about the adoption of more open and transparent practices?

Greater Manchester is a region in the North West of England with a population of 2.8 million people. It comprises 10 boroughs containing two cities and many large towns. Open Data Cities approached the city as a functioning organism made up of these 10 boroughs. For the project to have a genuine impact on its inhabitants, we proposed that it would need to align with how people used the city rather than the ways in which the city was administered. The reality is that although people access services across authorities, and there are a number of pan-Greater Manchester public bodies, local authorities still deliver services within tight geographical boundaries.

Addressing the whole Greater Manchester region in this way created an environment that allowed the project to evolve in a particular way. As the region was adopting City Region status, this would require a certain alignment in terms of data and information. The granting of City Region status also opened up the possibility of an elected mayor, enabling, theoretically, a coherent region-wide strategy to be implemented.

Working across the ten boroughs – all with their own democratically elected councils – is not without its challenges. Each public body has its own administrative and data structure and its own specific set of difficulties. It was therefore necessary to adopt a pragmatic, non-threatening approach as part of our project. Conversations therefore centred around the idea of allowing citizens to look ‘under the hood’ of public services, so to speak, and of creating a better understanding of what councils do. Most importantly, we were interested in rebalancing the relationship between public services and citizens, and in the possibility of services being delivered with citizens rather than simply to citizens.

Communicating The Benefits

We were often challenged as to how the release of data would benefit the person on the street, and who would create the applications and interpretations to allow this to happen. At the start of the Open Data Cities project the Open Data Manchester community was formed to provide evidence that there was indeed a ‘demand’ for the release of open data within the region. We argued that by giving people the tools to understand and act within communities, open data would have broader benefits too. Moreover, there was a growing acceptance that enabling people to access the data and information relevant to their locality was important. This has in part been borne out by the emergence of hyperlocal blogging as a means of disseminating news and information at a community level.

Open Data Cities also strongly emphasised the innovation and economic benefits such open data could bring to the region. Opening up the ‘undiscovered country’ of open data could kick-start an economy based on the creation of data services. We had seen examples where companies such as Elbatrop software in London had created best-selling applications for San Francisco based on released tree data. If Greater Manchester released data, this could present an opportunity for developers to create applications with relevance beyond the Greater Manchester region. Research had identified that open data could add £6 billion of value to the UK economy; how much of that value could be injected into the regional economy?

High value, ‘quality of life’ datasets were identified. Greater Manchester Passenger Transport Executive, now TfGM, made the decision to release large and regularly updated datasets. This sparked a number of good applications, but most of them were ‘proof of concept’, with little that could really be considered ready for market. This wasn’t the ‘release the data and people will build cool stuff’ future that we had been promoting, and even though the transport authority had now committed to making data open as a default position, they were very aware that not much was being built.

Acknowledging the Barriers

By talking to people who were involved in Open Data Manchester and the wider Greater Manchester digital community, it became apparent that although open data offered opportunities, there were a number of significant barriers that were inhibiting the development of services. These could be seen as return on investment, risk and liability.

The return on investment argument was apparent from early on. People have to make a living and generally want to see their efforts rewarded. By embracing the 2.8 million people of Greater Manchester, Open Data Cities hoped there would be enough people to sustain a market in open data application development. To kick-start this market, it was proposed that a number of innovation challenges with sizeable incentives should take place.

It was obvious that there were no large digital businesses in the open data space, and we had long held the view that their presence would be an indicator of the health of the open data innovation ecosystem. A suggested reason for the scarcity was that open data licensing transfers all the risk on to the developer: whereas previously data would generally be released with some sort of service level agreement, no such guarantees exist with open licensing. The idea of spending large amounts of development time on applications built on data that could then be turned off was deemed too risky.

Liability was also an issue. Who would be liable if someone had bought an application and the underlying data was suddenly turned off or proved inaccurate? There were also concerns about the robustness of supplied data and the sometimes archaic formats it was supplied in. The liability argument has also been put forward as a ‘supply side’ reason for non-disclosure, both from a robustness-of-data and a command-and-control perspective.

Collaboration

When FutureEverything and Trafford Council began working together on DataGM – the Greater Manchester Datastore – many of the local authorities were in a state of panic, having to negotiate drastic shortfalls in budget. It was becoming apparent that innovation and citizen empowerment, although appealing, were the least of their concerns. Public bodies are still in a time of fiscal stress, and it has been said that few, if any, public bodies innovate their way out of a crisis.

All Greater Manchester local authorities and most pan-Greater Manchester public bodies are represented on the datastore steering group; the benefit of having a local authority lead the project is its ability to get people around the table. Whilst some members of the group understood the logic of having a datastore and shared intelligence, there was a lot of resistance. Members expressed despair at being involved with a project when they didn’t know if they would still be in post in three months’ time, while others did not see the point of spending time and resource on something without a concrete output. There was also a very tangible silo mentality, where the idea of shared intelligence across authorities was seen as attractive but not essential.

Evidence and Evolution

As the DataGM project gathered momentum, more evidence started to emerge of the inefficiencies of maintaining a siloed and closed data culture. Servicing Freedom of Information requests costs Greater Manchester public bodies over £4 million a year, and over 600 public officials a day are unable to find or use the data they require to carry out their jobs, costing authorities over £8.5 million a year. The annoying tendency – for public bodies – of citizens using services outside their borough boundaries also creates difficulties. With no pan-Greater Manchester data initiative it is difficult for public bodies to create and deliver coherent regional strategies. Open data offers a solution.

Now that DataGM is becoming established, the economic logic of using a centralised data catalogue, where the data that local authorities use themselves is openly available, is starting to make sense. Open data needs to be transformational. For public bodies, enhanced engagement and the creation of innovative services are not enough. We are at a stage where we are saying: if you spend A you will get savings of B, and with open data you will also gain benefits C, D, E…

DataGM is starting to develop data release schedules so that local authorities can release similar data in a coordinated way. With developers such as Swrrl – one of the recent winners of the EU Open Data Challenge – some of that data is being expressed as Linked Data. The Open Data Manchester community continues to grow. Although there is still a long way to go with open data in Manchester, it feels like more people within public services are starting to see the benefits, and the possibility of Greater Manchester becoming an Open Data City gets closer.

Licensing – Why it is so important

This blog post was originally written for FutureEverything as part of their Open Data Cities programme.

I’m no expert but I really need to be – Licensing

Licensing is a subject that comes up a lot with Open Data. The licence is a key component of the dataset. It defines the use and liability and it shapes how or what innovation will come from data release.

As mentioned in the title I am no expert in this area and I would appreciate any correction or amendments to my understanding.

Traditionally, public data has been closed, so the only way you could get access to data to build products was by buying a licence to use it. In many cases these licences were expensive and restrictive. To mitigate this cost, the licence would often also have some level of service agreement built in: you paid for the licence for the data, and the data provider would provide you with a level of continuity and support. This helps to limit risk and encourage investment in a product.

The closed ‘paid licence’ system generally has a high barrier to entry – the price of the licence – limiting the number of innovative products developed. If innovation ecosystems are environments where many ideas live and most fail, a price of failure that is too high could have a chilling effect on the whole system.

One of the first licences used for the release of Open Data was Creative Commons CC-BY-SA. This licence allows people to create services and products off the back of the data as long as they attribute where the data came from and share back any data created from the originally released dataset (value-added data). The original Creative Commons licences were devised as an answer to restrictive copyright laws relating to ‘works’ – articles, text, images, music etc. – which were deemed increasingly anachronistic in the digital age. It is open to discussion whether data can be deemed a ‘work’ in the context of this licence.

The Open Database Licence (ODbL), developed by Open Data Commons, was created to address the doubt over whether data could be seen as a ‘work’. It carries the same attribution and share-alike clauses and is used by many datastores, including the newly opened Paris Datastore.

Anyone can develop products and services that use datasets with these licences, but intellectual property doesn’t extend to the value-added datasets created in the process of developing those products. Releasing value-added datasets back to the community allows further innovative products to be built on top of them, so the pace of innovation could potentially be increased – it is analogous to the ‘standing on the shoulders of giants’ idea.

On the other hand, imposing conditions on the further use of value-added data by other organisations might chill the development of products that create value-added data.

With the above licences there is generally no liability or guarantee of service from data providers. This creates a higher-risk scenario. If you were investing in product development, this is potentially a source of concern and may be an inhibiting factor.

In the UK we have the recently released Open Government Licence, which was developed specifically for government data. It borrows some aspects of the CC-BY-SA licence and the ODbL. Unlike those licences, there is no need to share back value-added data.

Would this have any impact on the products and services developed from Open Data? Again, under this licence there is no liability or guarantee of service from the data provider, but the developing organisation gets to keep all the rights to the products and services it develops – including value-added datasets.
The advantage could be that allowing people to keep the rights to the products they develop mitigates the exposure created by the lack of liability and guarantees. The main disadvantage could be that the pace of innovation is curtailed, because people have to replicate processes and value-added datasets.

Why Open Data?

Back in May 2009, after the final presentations at Futuresonic 09, I sat down with Adam Greenfield and we talked about how cities evolved and grew, and how they developed inequalities between those who have access to information and those who don’t. This, coupled with an individual’s ability to act on that information in a meaningful way, raised the question: if all information and data were open and available, how would a city evolve? Would it grow with the same asymmetries or, as Adam suggested in his Futuresonic presentation, is this inequality a preconfigured state?

At the time there were few cities that had embarked down the route of fully opening up their datasets, although some cities in North America had started a process that would eventually, as in the case of Vancouver, lead to the adoption of open source, open standards and open data principles.

It was through seeing this emergence of open systems that the Open Data Cities project began to evolve. Data is the lifeblood of our modern technologised society. It tracks, evidences and creates mechanisms for decisions. Much of this data doesn’t exist outside the confines of City Hall, but we see evidence of its impact every day. Speed humps suddenly appear on your road, or your bus doesn’t turn up when you thought it would. Bins only get emptied every two weeks, or your local school closes down. This is the physical manifestation of publicly held data that few have access to.

The inability to connect action taken by a public body with the evidence on which its decisions are made can have an insidious and corrosive effect on the relationship between the citizenry and government. Just as Louis Brandeis said that ‘sunlight is the best disinfectant’ with regard to transparency and corruption, the opposite is also true. In a closed system, even though decisions might be taken with the most honourable of intentions, the lack of evidence for a decision creates doubt, rumour and misrepresentation. In a closed system the power of the media increases as distrust of the political sphere grows. The media becomes the interlocutor, which can interfere with the relationship between citizen and government. This all presumes that those who govern have nothing to hide. The lack of transparency in government creates the opportunity for the media to expose the bad apples through a system of clandestine briefings and investigative reporting. This process of exposé undermines the trust the public has in the system of government, because there is no evidence to the contrary, or because the evidence people can see appears to derive from a seemingly arbitrary decision-making process.

The opportunity has arisen for public bodies to create a new relationship with the people they serve. A more transparent and open system can lead to a more equitable environment, where the citizen is not a customer or passive consumer of services and information but an engaged citizen who is able to make decisions based on facts, not rumour, and can hold to account public servants with less than honest intentions.

The Sunlight Foundation (www.sunlightfoundation.com), named after the Louis Brandeis quote, is an American lobby group advocating transparency in government. It has produced a graphic it calls the Cycle of Transparency, which aptly illustrates the benefits of transparency in government. Each element of the cycle moves forward concurrently, bringing about the changes needed to create a more transparent government whilst identifying new needs.

The Cycle of Transparency highlights the use of technology to make information open and accessible. It can be argued that transparency and openness have been enabled by digital technology. People are now able to access, interpret and distribute information easily. Until quite recently, the channels for making information open and accessible were limited and, to a certain extent, controlled.

The landscape is changing. The opening up of data will have a seismic effect on the way we access and share information. New services will be created, as citizens and institutions demand the ability to interpret and navigate through data in the way they want. It will create a more efficient data environment where information is shared rather than duplicated, and it will highlight errors in the system with anomalies being addressed rather than hidden.