Talk:Data sharing

From Wikipedia, the free encyclopedia

An entry from Data sharing appeared on Wikipedia's Main Page in the Did you know? column on 25 April 2007.

Goal of the article

The goal is to provide information to students and others interested in scientific research about the policies required by funding agencies and peer-reviewed journals. This article seeks to answer questions such as:

  • What is data sharing?
  • How is data sharing related to data archiving?
  • What data has to be shared?
  • What agencies and organizations require data to be shared?
  • What happens when authors refuse to share data?

Feel free to improve the article. RonCram 02:32, 21 April 2007 (UTC)

Relation of the requirement of data sharing to the "Open Access" movement

The article could be improved by discussing the relationship between data sharing and the Open Access movement. It is my understanding that the Open Access movement shares the same values that underlie the requirement to share data, but extends the idea to publishing research articles so they are free of charge. While data sharing is not controversial by itself, Open Access could be considered so. Those wishing to improve the article might look into this aspect. [1] RonCram 20:13, 25 April 2007 (UTC)

International policies on data sharing

I will add more information on international policies. Please be patient. RonCram 14:42, 7 September 2007 (UTC)

Article is too universal in claims and needs to be qualified

In the absence of contractual, legal or institutional requirements governing the availability of raw data, investigators have the sole right to make decisions regarding the sharing of raw data and other proprietary research products (computer programs, designs, photographs, etc.).

It is an exaggeration to conclude that researchers who limit distribution of raw data are violating fundamental principles of science. Valid reasons to restrict access to raw data and documentation include:

  1. Protecting the privacy of sources.
  2. Maintaining non-disclosure agreements.
  3. Maintaining individual patient/victim confidentiality.
  4. Maintaining competitive advantage on grant proposals, future discoveries, and new product development.
  5. Protecting investment acquiring the raw data.
  6. Avoiding costs incurred in sharing the data.
  7. Avoiding misuse/misinterpretation leading to legal liabilities.

In the absence of contractual, legal or institutional requirements, access to the raw data, computer programs, documentation, or other proprietary products of scientific research is at the discretion of the investigators. Scientists are often willing to share with other scientists with whom they have developed a trust relationship and who offer the possibility of mutual benefit from the sharing relationship. Benefits that commonly induce researchers to share proprietary data or other research products include:

  1. Hope of discoveries leading to jointly authored publications unlikely to occur otherwise.
  2. Possibility of securing additional funding.
  3. Acknowledgements in peer-reviewed publications.
  4. Greater awareness of the scientist’s contributions.
  5. Maximal use made of live animal studies.
  6. Overt quid pro quo exchanges of one research product for another (or for money).

Trust, goodwill, and mutual admiration are the bedrock of this kind of scientific collaboration.

The scientific method requires a description of experimental methods sufficient to reproduce the results. It does not require complete sharing of data to any third party that asks for it. Michael Courtney (talk) 13:07, 27 March 2008 (UTC)

Michael, you bring up some good points that need to be discussed. Perhaps we need to establish definitions of key terms. The article was written with the academic community mainly in view. You are certainly correct that businesses in technology, medicine and other industries conduct a great deal of research that is scientific. In many cases, they are seeking a competitive advantage and do not want to share their discoveries. The purpose of the patent system is to offer commercial protection for a limited number of years in return for sharing the discoveries with the world. If you are trying to make discoveries leading to new products, then non-disclosure agreements, trade secrets and other mechanisms come into play to protect the knowledge base until the patent is issued. Once a patent is issued, the secrets covered by the patent are exposed for all to see. However, if someone wishes to publish in a scientific journal, they cannot publish claims without also publishing all of their data, methods and source code so that others can verify their claims. If their claims are not reproducible, then they are not valid. If you are doing science for the purpose of publishing, then it is a standard of science that one must share data. If you do not want to share data, then you cannot publish. RonCram (talk) 22:19, 22 April 2008 (UTC)

Making the distinction between academic and industrial science is painting with an overly broad brush. Data sharing is not ubiquitous even within the academic communities. A few publication venues have begun to require data sharing. Many still do not. Some funding agencies require data sharing. Many do not.

I have published numerous papers in various scholarly journals involving research funded by the NSF, ONR, and various other sources. Not once has my scholarly work been subject to an institutional, journal, or funding agency requirement for publication of raw data. I have voluntarily shared some data and source code with some third parties who have requested it, but I have denied access to raw data to others (particularly photographs related to live animal research that could end up at an animal rights website).

To my knowledge, no one requires the publication of sensitive records related to live animal research. Would you expect to be able to go to any drug company and acquire all of the data (including photographs and identities of all lab personnel) related to live animal testing of a given drug? Likewise, raw data related to many human studies is protected by confidentiality agreements, and the data is only published in aggregate form.

For example, the mere fact of being a researcher in the field of blast wave injury does not mean that I can go to anyone who has published work in the field and demand access to their raw data on human or live animal studies. As a second example, I doubt you'd have success gaining access to the unedited autopsy reports on which many papers in the forensic science literature are based.

In the absence of contractual, legal or institutional requirements governing the availability of raw data, investigators have the sole right to make decisions regarding the sharing of raw data and other proprietary research products (computer programs, designs, photographs, etc.).

Scientific repeatability means that published experiments are described in sufficient detail to be repeated by independent parties. It does not demand sharing of data and other proprietary research products. Data sharing is simply not a ubiquitous feature of the scientific method. One can make a case for a trend toward data sharing, but not for the ubiquity of data sharing in scholarly work.

In industrial science, patent applications disclose key scientific and technological details, but there is not a general sharing of raw data or source code. Some details almost always remain proprietary to make patent infringement more difficult. Michael Courtney (talk) 12:37, 28 April 2008 (UTC)

Michael, I appreciate your experience and I do not think we disagree that much. Of course, I would not expect to get photographs and identities of personnel or test subjects. Medical research has privacy issues. However, a great deal of data from which the aggregate is gathered should be made available. Openness is essential to the scientific method. You may want to read Pseudoscience. Published experiments are rarely described in sufficient detail to be repeatable or falsifiable. Access to raw data is necessary for statistical analysis to determine whether the claims/conclusions are valid. In fact, meta-data is also necessary. What is the provenance of the data? What adjustments were made to the data? etc. I understand industrial science well and the strategies used to protect a product. This is where your comments provide the most value. I will attempt to make a few changes. RonCram (talk) 01:39, 19 May 2008 (UTC)
You don't seem to have answered any of the questions. In fact, you seem to have deliberately evaded them. William M. Connolley (talk) 18:21, 19 May 2008 (UTC)

Wikipedia is not the place for editorializing about what should be in science. The article is only appropriate for describing the role of data sharing as it is currently required, to the degree that such requirements can be supported by Wikipedia standards for reliable sources. The statement that "data sharing is a requirement in the science community" is overly broad. It would be more accurate to say that there is a recent trend toward data sharing by a growing number of research institutions, funding agencies, and journals. However, the implication that sources that refuse to share data (in the absence of legal, institutional or funding requirements) are somehow doing something unethical or unscientific is unwarranted. Michael Courtney (talk) 00:12, 26 May 2008 (UTC)

Try Google Scholar searches on "private communication" and "unpublished data". Such references seem much too common to make any kind of case that data sharing is ubiquitous or a general requirement in the scientific community. Michael Courtney (talk) 00:27, 26 May 2008 (UTC)