Commentary | Volume 70, P254-255, February 2016

Data and statistical commands should be routinely disclosed in order to promote greater transparency and accountability in clinical and behavioral research

      This commentary argues that clinical, public health, and health science journals should routinely invite authors to make the data and statistical analysis command files underlying the findings reported in their articles available as supplementary files, and should prominently signal articles for which this has been done using a “transparency” quality marker.
      The background to this is a need to make the conduct of science more efficient and effective. The Lancet recently published a series of articles on waste in science that drew attention to the many ways in which public money is wasted because of the way we undertake science [1]. There is waste throughout the process, from the awarding of research funds, through the conduct of research and the writing up, publishing, and dissemination of findings, to the use to which those findings are, or are not, put.
      One part of the process that requires special attention is transparency with regard to the data and its statistical analysis. Calls for greater transparency have a long history [2], and there are now guidelines concerning data sharing and availability [3]. Apart from providing some protection against fraud and misrepresentation of findings, there are at least two important ways in which transparency could improve our science. One is in reducing the error rate, both in the data itself and in its analysis. The other is in facilitating additional analyses that can help in the interpretation of reported findings or establishing new findings that were not included in the original published report.
      Considering first the issue of data errors, mistakes in recording and handling of data could account for part of the high rate of failure to replicate findings in clinical and health research. There are numerous potential entry points for errors in the data, including misrecording, mistranscription, and incorrect commands for recoding variables and computing new variables. It is all too easy for mistakes to creep in, and in most cases these will not be checked by an independent source. If the data and the commands are available, it becomes possible for someone to check them after publication. It also provides a greater incentive to ensure that they are checked before publication.
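      As a concrete, purely hypothetical illustration of what a disclosed command file might look like, the sketch below uses Python and pandas; the data set and variable names (cigs_per_day, quit_6m, and so on) are invented for the example, not drawn from any real study. The point is that when every recoding step and derived variable is written out explicitly, a reader who has the disclosed data can rerun the script and spot problems such as a misplaced category boundary.

```python
# Minimal sketch of a disclosed "command file" (Python/pandas purely for
# illustration; all variable names are hypothetical). In practice the script
# would read the disclosed data set, e.g. pd.read_csv("trial_data.csv").
import pandas as pd

# Small stand-in for the disclosed data set.
df = pd.DataFrame({
    "cigs_per_day": [0, 5, 12, 25, 40],
    "age": [34, 51, 47, 62, 29],
    "quit_6m": [1, 0, 1, 0, 0],
})

# Recoding step: group cigarettes per day into dependence categories.
# Writing the cut points out explicitly lets a reader verify them.
df["dependence"] = pd.cut(
    df["cigs_per_day"],
    bins=[-1, 0, 10, 20, float("inf")],
    labels=["none", "light", "moderate", "heavy"],
)

# Derived variable: age centred at the sample mean, as used in the analysis.
df["age_centred"] = df["age"] - df["age"].mean()

print(df)
```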
      When it comes to the opportunities provided for additional analyses, it is common when reading an article to want to know more about the analyses and what results would have been obtained had somewhat different analyses or coding been undertaken. It is often not realistic to present in an article all the plausible ways of analyzing data to answer a research question, but if the data and commands were available, it would be open to others to undertake those analyses and either reassure themselves that the findings were robust or identify weaknesses. For example, how one groups a quantitative variable can make a substantial difference to one's findings, as can the way in which one combines variables into a composite score. It can also make a difference whether one uses one underlying statistical model or another. Besides establishing the robustness or otherwise of findings, making the data available offers readers the opportunity to answer questions that might otherwise never be addressed. If additional analyses led to new insights that substantially changed the interpretation of findings, this could be communicated to the original authors in the first instance and then, if appropriate, form the basis for public correspondence. It is not expected that anyone other than the owner(s) of the data set would have the right to publish findings beyond this limited quality control process. If a substantive new finding were to emerge from further analysis, publication would normally have to be agreed with the owners of the data. An exception would be data sets explicitly designated as being in the public domain.
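      By way of illustration only, the following minimal sketch (Python/pandas; the variables drinks_per_week and outcome are hypothetical) shows the kind of sensitivity check a reader could run if the data and commands were disclosed: regrouping the same quantitative exposure with two different sets of cut points and comparing the resulting outcome rates indicates how far a reported pattern depends on the grouping chosen. A similar check could compare alternative composite scores or an alternative model specification.

```python
# Minimal sketch of a sensitivity re-analysis a reader could run against a
# disclosed data set (Python/pandas purely for illustration; variable names
# such as drinks_per_week and outcome are hypothetical).
import pandas as pd

df = pd.DataFrame({
    "drinks_per_week": [0, 2, 5, 9, 14, 21, 30],
    "outcome":         [0, 0, 1, 0, 1, 1, 1],
})

# Two plausible ways of grouping the same quantitative exposure.
groupings = {
    "original cut points": [-1, 0, 7, 14, float("inf")],
    "alternative cut points": [-1, 0, 10, 20, float("inf")],
}

for name, bins in groupings.items():
    groups = pd.cut(df["drinks_per_week"], bins=bins)
    # Outcome rate per exposure group under this grouping choice.
    rates = df.groupby(groups, observed=False)["outcome"].mean()
    print(f"\n{name}:\n{rates}")
```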
      Possible negative consequences need to be considered. One is that issues of intellectual property will need to be clarified. It is important to note that what is being proposed is data “disclosure,” not data sharing. Thus, ownership and intellectual property would clearly remain with the primary authors. However, if another researcher identifies a flaw in the analysis, it will be necessary to ensure that he or she has the right to publish this, having first alerted the primary authors. It will also be necessary to inculcate routine annotation of data sets so that they can be used by others. Again, this should not be a major issue as it is already established as good practice. A further issue is the opportunity for vested interests such as the tobacco industry to make use of the data sets. This is a serious problem and could be grounds for more limited availability so that researchers wanting to look at the data have to state their credentials. There may occasionally be issues of privacy for patients or participants. If the data set could not be anonymized, this would probably override the disclosure principle.
      Bearing in mind the possible drawbacks, as a first step it may be best to encourage rather than require authors to make data sets and command files available as supplementary files. Those who do could have their articles “kite-marked” as meeting that particular quality standard. Depending on how this goes, one could then move to making data disclosure compulsory.

      References

      1. Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, et al. Reducing waste from incomplete or unusable reports of biomedical research. Lancet. 2014;383:267-276.
      2. Fienberg SE, Martin ME, Straf ML. Sharing research data. Washington, DC: National Academy Press; 1985.
      3. Inter-university Consortium for Political and Social Research (ICPSR). Research transparency, data access, and data citation: a call to action for scholarly publication. Available at: http://datacommunity.icpsr.umich.edu/research-transparency-data-access-and-data-citation-call-action-scholarly-publications. Accessed July 23, 2015.

      Linked Articles

      • Disclosure of data and statistical commands should accompany completely reported studies. Journal of Clinical Epidemiology, Vol. 70.
      • Research data as a global public good. Journal of Clinical Epidemiology, Vol. 70.
      • Open data are not enough to realize full transparency. Journal of Clinical Epidemiology, Vol. 70.
      • Navigating an open road. Journal of Clinical Epidemiology, Vol. 70.
      • The end of scientific articles as we know them? Journal of Clinical Epidemiology, Vol. 70.
      • How do we make it easy and rewarding for researchers to share their data? A publisher's perspective. Journal of Clinical Epidemiology, Vol. 70.
      • Fixing flaws in science must be professionalized. Journal of Clinical Epidemiology, Vol. 70.
      • Anticipating consequences of sharing raw data and code and of awarding badges for sharing. Journal of Clinical Epidemiology, Vol. 70.