Open data

Figure: Open data map showing the Linked Open Data cloud in August 2014.
Figure: Clear labeling of the licensing terms is a key component of open data, and license icons are being used for that purpose.

Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.[1] The goals of the open data movement are similar to those of other "open" movements such as open source, open hardware, open content, open government and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov and Data.gov.uk.

Overview

The concept of open data is not new, but a formalized definition is relatively recent. One definition is the Open Definition, which can be summarized in the statement that "A piece of data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike."[2] Other definitions, including the Open Data Institute's "Open data is data that anyone can access, use or share", provide an accessible short version but refer back to the formal definition.

Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the common good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by a license.

A typical depiction of the need for open data:

Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery ... we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.
John Wilbanks, VP Science, Creative Commons[3]

Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use, presuming instead that not asserting copyright puts the data into the public domain. For example, many scientists do not regard the published data arising from their work as theirs to control, and they consider the act of publication in a journal to be an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "open" spirit. Because of this uncertainty, it is also possible for public or private organizations to aggregate such data, protect it with copyright and then resell it.

The issue of indigenous knowledge (IK) poses a great challenge in terms of capture, storage and distribution. Many societies in third-world countries lack the technical processes for managing IK.

At his presentation at the XML 2005 conference, Connolly[4] displayed these two quotations regarding open data:

Major sources

Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.

In science

The concept of open access to scientific data was institutionally established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957–1958.[6] The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form.[7]

While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of open science data, since publishing or obtaining data has become much less expensive and time-consuming.

The Human Genome Project was a major initiative that exemplified the power of open data. It was built upon the so-called Bermuda Principles, which stipulate that "All human genomic sequence information (…) should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society".[8] More recent initiatives such as the Structural Genomics Consortium have illustrated that the open data approach can also be used productively within the context of industrial R&D.[9]

In 2004, the Science Ministers of all nations of the Organisation for Economic Co-operation and Development (OECD), which includes most developed countries of the world, signed a declaration which essentially states that all publicly funded archive data should be made publicly available.[10] Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.[11]

Examples of open data in science:

In government

There is a range of arguments for government open data.[13][14] For example, some advocates contend that making government information available to the public as machine-readable open data can facilitate government transparency, accountability and public participation. Others argue that opening up official information can support technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services.

Several national governments have created websites to distribute a portion of the data they collect. Open data is also the basis for collaborative projects in municipal government that aim to create and organize a culture of open data, or open government data.

Other levels of government have also established open data websites; for example, many government entities in Canada are pursuing open data. Data.gov lists the open data websites of 40 US states and 46 US cities and counties, including those of the states of Maryland and California.[15]

At the international level, the United Nations has an open data website that publishes statistical data from Member States and UN agencies,[16] and the World Bank publishes a range of statistical data relating to developing countries.[17] The European Commission has created two portals for the European Union: the EU Open Data Portal, which gives access to open data from the EU institutions, agencies and other bodies,[18] and the PublicData portal, which provides datasets from local, regional and national public bodies across Europe.[19]

In October 2015, the Open Government Partnership launched the International Open Data Charter, a set of principles and best practices for the release of governmental open data, which was formally adopted by seventeen national, state and city governments during the OGP Global Summit in Mexico.[20]

Arguments for and against

The debate on open data is still evolving. The best open government applications seek to empower citizens, help small businesses, or create value in some other positive, constructive way. Opening government data is only a waypoint on the road to improving education, improving government, and building tools to solve other real-world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend heavily on the type of data and its potential uses.

Arguments made on behalf of Open Data include the following:

It is generally held that factual data cannot be copyrighted.[26] However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.

While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.

Unlike Open Access, where groups of publishers have stated their concerns, Open Data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.

Arguments against making all data available as Open Data include the following:

Relation to other open activities

The goals of the Open Data movement are similar to those of other "Open" movements.

Funders' mandates

Several funding bodies which mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR):[30]

Other bodies active in promoting the deposition of data as well as full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the EU's science funding mechanism, due to launch in 2014) should mandate that funded projects hand in their databases as "deliverables" at the end of the project, so that they can be checked for third-party usability and then shared.[31]

Non-Open data

Several mechanisms restrict access to or reuse of data (and several reasons for doing this are given above). They include:

Organisations promoting open data

See also

References

  1. Auer, S. R.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. (2007). "DBpedia: A Nucleus for a Web of Open Data". The Semantic Web. Lecture Notes in Computer Science. 4825. p. 722. ISBN 978-3-540-76297-3. doi:10.1007/978-3-540-76298-0_52.
  2. See Open Definition home page and the full Open Definition
  3. Science Commons
  4. Connolly, Dan (16 November 2005). "Semantic Web Data Integration with hCalendar and GRDDL". W3C Talks and Presentations. XML Conference & Exposition 2005, Atlanta, Georgia, USA: W3C. p. 2. Retrieved 2 May 2015.
  5. Veen, Jeffrey (2 November 2005). "Polar Heart Rate Monitors: Gimme my data!". A website by Jeffrey Veen.
  6. Committee on Scientific Accomplishments of Earth Observations from Space, National Research Council (2008). Earth Observations from Space: The First 50 Years of Scientific Achievements. The National Academies Press. p. 6. ISBN 0-309-11095-5. Retrieved 2010-11-24.
  7. World Data Center System (18 September 2009). "About the World Data Center System". NOAA, National Geophysical Data Center. Retrieved 2010-11-24.
  8. Human Genome Project, 1996. Summary of Principles Agreed Upon at the First International Strategy Meeting on Human Genome Sequencing (Bermuda, 25–28 February 1996)
  9. Perkmann, Markus and Schildt, Henri, Open Data Partnerships between Firms and Universities: The Role of Boundary Organizations, Research Policy 44 (2015) 1133–1143
  10. OECD Declaration on Open Access to publicly funded data Archived 20 April 2010 at the Wayback Machine.
  11. OECD Principles and Guidelines for Access to Research Data from Public Funding
  12. Dataverse Network Project
  13. Gray, Jonathan. "Towards a Genealogy of Open Data". SSRN Electronic Journal. Social Science Research Network (SSRN). doi:10.2139/ssrn.2605828.
  14. Brito, Jerry. "Hack, Mash, & Peer: Crowdsourcing Government Transparency". Colum. Sci. & Tech. L. Rev. 119 (2008).
  15. data.ca.gov
  16. EU Open Data Portal
  17. "The Open Data Charter: A Roadmap for Using a Global Resource". The Huffington Post. Retrieved 2015-10-29.
  18. On the road to open data, by Ian Manocha
  19. "Big Data for Development: From Information- to Knowledge Societies", Martin Hilbert (2013), SSRN Scholarly Paper No. ID 2205145. Rochester, NY: Social Science Research Network; http://papers.ssrn.com/abstract=2205145
  20. How to Make the Dream Come True argues in one research area (Astronomy) that access to open data increases the rate of scientific discovery.
  21. Khodiyar, Varsha. "Stopping the rot: ensuring continued access to scientific data, irrespective of age". F1000 Research. F1000. Retrieved 2015-03-11.
  22. Magee, Andrew F.; May, Michael R.; Moore, Brian R.; Murphy, William J. (24 October 2014). "The Dawn of Open Access to Phylogenetic Data". PLoS ONE. 9 (10): e110268. PMC 4208793. PMID 25343725. doi:10.1371/journal.pone.0110268.
  23. Towards a Science Commons includes an overview of the basis of Openness in science data.
  24. Protocol for Implementing Open Access Data
  25. Creation of the term: http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html
  26. Kauppinen, T.; Espindola, G. M. D. (2011). "Linked Open Science-Communicating, Sharing and Evaluating Data, Methods and Results for Executable Papers". Procedia Computer Science. 4: 726. doi:10.1016/j.procs.2011.04.076.
  27. SPARC-OpenData@arl.org Mailing List Archive
  28. Galsworthy, M.J. & McKee, M. (2013). Europe's "Horizon 2020" science funding programme: How is it shaping up? Journal of Health Services Research and Policy. doi: 10.1177/1355819613476017
  29. Review of history and positions by the University of California
  30. "Free our data" (The Guardian technology section)
  31. GODAN background
  32. http://linkedscience.org/about
  33. Open Cultuur Data, Rijksmuseum, 11 September 2012
  34. Linking Open Data on the Semantic Web
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.