1. Introduction
Research in cancer biomarkers has expanded in recent years, leading to a large and growing literature. However, despite major investments and advances in technology, the current biomarker pipeline has been found to be too prone to failure [1,2]. Similarly, much research has been dedicated to the discovery of ovarian cancer biomarkers, yet despite many biomarkers being evaluated, very few have been successfully introduced into clinical care [3]. Likely reasons for failure have been documented at each stage of biomarker evaluation [1-3].
It has been argued that biomarker discovery studies sometimes suffer from weak study designs, limited sample sizes, and incomplete or biased reporting, which can render them vulnerable to exaggerated interpretation of biomarker performance [1,4]. Authors may claim favorable performance and clinical effectiveness of biomarkers based on selective reporting of significant findings, or present study results with an overly positive conclusion in the abstract compared with the main text [5]. Specific study features, such as not prespecifying a biomarker threshold or lacking a specific study objective, could facilitate distorted study results.
Spin, or misrepresentation and misinterpretation of study findings, not necessarily intentional, is any reporting practice that makes the study findings appear more favorable than the results justify [6,7]. Several studies have shown that authors of clinical studies may commonly present and interpret their research findings with a form of spin [5,7-10]. A consequence of biased representation of results in scientific reports is that the published literature may suggest stronger evidence than is justified [11]. Misrepresentation of study findings may also lead to serious implications for patients, health care providers, and policy makers [12].
The primary aim of our study was to evaluate the presence of spin, further categorized as misrepresentation and overinterpretation of study findings, in recent clinical studies evaluating the performance of biomarkers in ovarian cancer. In addition, we evaluated facilitators of spin (i.e., practices that would facilitate overinterpretation of results) and a number of potential determinants of spin.
2. Methods
We performed a systematic review to document the prevalence of spin in recent evaluations of the clinical performance of biomarkers in ovarian cancer.
2.1 Literature search
MEDLINE was searched through PubMed on December 22, 2016, for all studies evaluating the performance of biomarkers in ovarian cancer published in 2015. The search terms and strategy were developed in collaboration with a medical information specialist (R.S.), using a combination of terms that express the clinical performance of biomarkers in ovarian cancer (Appendix A). We included all markers of ovarian cancer risk, screening, prognosis, or treatment response in body fluid, tissue, or imaging measurements. Reviews, animal studies, and cell line studies were excluded.
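The full search strategy is reported in Appendix A. As a purely illustrative sketch, and not the strategy used in this review, a date-restricted PubMed query of this kind could be run programmatically with Biopython's Entrez utilities; the query terms below are hypothetical placeholders.

# Illustrative only: a hypothetical PubMed query combining ovarian cancer,
# biomarker, and performance terms, restricted to items published in 2015.
# The actual search strategy of this review is the one given in Appendix A.
from Bio import Entrez

Entrez.email = "researcher@example.org"  # required by NCBI; placeholder address

query = (
    '("ovarian neoplasms"[MeSH] OR "ovarian cancer"[tiab]) '
    'AND (biomarker*[tiab] OR "Biomarkers, Tumor"[MeSH]) '
    'AND (sensitivity[tiab] OR specificity[tiab] OR prognos*[tiab])'
)

handle = Entrez.esearch(db="pubmed", term=query,
                        mindate="2015/01/01", maxdate="2015/12/31",
                        datetype="pdat", retmax=1000)
record = Entrez.read(handle)
handle.close()
print(f"{record['Count']} records found; first IDs: {record['IdList'][:5]}")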
Two authors (M.G. and M.O.) independently reviewed the titles and abstracts to identify potentially eligible articles. Thereafter, the full texts of reports identified as potentially eligible were independently reviewed by the same two authors for inclusion. All disagreements were resolved through discussion or by third-party arbitration (P.M.B.). We analyzed the first 200 consecutive studies, ranked according to publication date, to obtain a sample size comparable with that of previous systematic reviews of spin [6,13].
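A minimal sketch of this sampling step, assuming the two reviewers' screening decisions are available as simple Python records; the field names and helper functions are hypothetical, not part of the review protocol.

from dataclasses import dataclass
from datetime import date

@dataclass
class ScreenedStudy:
    pmid: str
    pub_date: date
    include_reviewer_1: bool  # decision by the first reviewer
    include_reviewer_2: bool  # decision by the second reviewer

def needs_arbitration(studies: list[ScreenedStudy]) -> list[ScreenedStudy]:
    """Studies on which the two reviewers disagree, to be resolved by discussion
    or third-party arbitration."""
    return [s for s in studies if s.include_reviewer_1 != s.include_reviewer_2]

def select_sample(studies: list[ScreenedStudy], n: int = 200) -> list[ScreenedStudy]:
    """Keep studies both reviewers agreed to include (disagreements assumed to be
    resolved upstream), rank them by publication date, and keep the first n."""
    included = [s for s in studies if s.include_reviewer_1 and s.include_reviewer_2]
    included.sort(key=lambda s: s.pub_date)
    return included[:n]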
2.2 Establishing criteria and data extraction
Biomarker studies in ovarian cancer vary by study design, the clinical application of the biomarker, and the type and number of tests evaluated [14,15]. Within the evaluation process, several components can be assessed, such as analytical performance, clinical performance, clinical effectiveness, cost-effectiveness, and other consequences beyond clinical effectiveness and cost-effectiveness. We developed a definition of spin that encompassed common features applicable to all the various biomarker types and study designs: reporting practices that make the clinical performance of markers look more favorable than the results justify. This definition was based on criteria extracted from key articles on misrepresentation and misinterpretation of study findings [5-7,9,10,13,16,17].
To evaluate the frequency of spin, we established a preliminary list of criteria incorporating previously established items that represent spin [5-7,16,17]. We then optimized these criteria through a gradual data extraction process. A set of 20 articles was fully verified by a second reviewer (M.O.), and points of disagreement were discussed with a third investigator (I.B. and P.M.B.) to fine-tune the scoring criteria and clarify the coding scheme. Through this process and the discussions that ensued, a final list of items was established with content experts (P.M.B. and I.B.), categorizing items as representing "spin" or "facilitators of spin." Each of these categories encompassed several forms of spin.
We further classified spin into two categories, "misrepresentation" and "misinterpretation", to distinguish between distorted presentation and incorrect interpretation of findings, with special focus on the abstract and main-text conclusions. As the presence of a positive conclusion is interdependent with the items that represent spin, we assessed the overall positivity of the main-text conclusion using a previously established classification scheme [13]. The overall positivity was classified according to the summary statement in the main-text conclusion about the biomarker's analytical performance or clinical utility. We used the criteria defined by McGrath et al. [13] to classify the main-text conclusion as "positive", "positive with qualifier", "neutral", or "negative". A qualifier attenuates the summary statement or its implication for practice [13]. Examples include, but are not limited to, the use of words such as "may" in the summary statement or statements such as "limited evidence is available" in the same paragraph as the summary statement.
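As a rough illustration of this four-level scheme, and not the authors' operational coding rules, the categories and a naive check for qualifier wording could be encoded as follows; the list of qualifier cues is a hypothetical example, since qualifiers were judged by human reviewers.

from enum import Enum

class ConclusionPositivity(Enum):
    POSITIVE = "positive"
    POSITIVE_WITH_QUALIFIER = "positive with qualifier"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

# Hypothetical qualifier cues; the review used reviewer judgment, e.g. "may"
# in the summary statement or "limited evidence is available" in the same paragraph.
QUALIFIER_CUES = ("may", "might", "limited evidence", "further validation")

def has_qualifier(summary_statement: str) -> bool:
    """Naive keyword check for attenuating language in a summary statement."""
    text = summary_statement.lower()
    return any(cue in text for cue in QUALIFIER_CUES)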
We defined misrepresentation as misreporting and/or distorted presentation of the study results in the title, abstract, or the main text, in a way that could mislead the reader. This category of spin encompassed (1) incorrect presentation of results in the abstract or main-text conclusion, (2) mismatch between results reported in abstract and main text, and (3) mismatch between results reported and the title.
We defined misinterpretation as an interpretation of the study results in the abstract or main-text conclusion that is not consistent with, or is an extrapolation of, the actual study results. This category of spin encompassed (4) other purposes of the biomarker claimed that were not prespecified and/or investigated, (5) mismatch between the intended aim and the abstract or main-text conclusion, (6) other benefits of the biomarker claimed that were not prespecified and/or investigated, and (7) extrapolation from the study participants to a larger or a different population.
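The seven spin items above map onto the two categories. A minimal sketch of how this coding scheme could be represented during data extraction is given below; the structure and the scoring helper are illustrative, not the authors' extraction form.

# Coding scheme for the seven spin items, grouped by category, as described above.
SPIN_ITEMS = {
    "misrepresentation": {
        1: "Incorrect presentation of results in the abstract or main-text conclusion",
        2: "Mismatch between results reported in abstract and main text",
        3: "Mismatch between results reported and the title",
    },
    "misinterpretation": {
        4: "Other purposes of the biomarker claimed, not prespecified and/or investigated",
        5: "Mismatch between intended aim and abstract or main-text conclusion",
        6: "Other benefits of the biomarker claimed, not prespecified and/or investigated",
        7: "Extrapolation from study participants to a larger or different population",
    },
}

def score_study(flags: dict[int, bool]) -> dict[str, int]:
    """Count, per category, how many spin items were scored as present for one study."""
    return {
        category: sum(flags.get(item, False) for item in items)
        for category, items in SPIN_ITEMS.items()
    }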
We defined "facilitators of spin" as practices that facilitate spin but that, due to various elements, do not allow for a formal assessment and classification as actual spin. For example, in our study, we considered not prespecifying a positivity threshold for a continuous biomarker as a facilitator of spin. Stating a threshold value after data collection and analysis may leave room in the representation and interpretation of the data to maximize performance characteristics [6].
In addition to spin and facilitators of spin, we extracted the following information on study characteristics: country, intended use of the biomarker, author affiliations, declared conflicts of interest, and source of funding. To evaluate which of these factors may be associated with the manifestation of spin, we counted the occurrences of spin corresponding to each of these potential determinants.
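A minimal sketch of this tabulation, assuming one record per study with a boolean spin indicator and the extracted characteristics; the field names and example values are hypothetical.

from collections import Counter

# Each record: extracted characteristics plus whether any form of spin was identified.
studies = [
    {"country": "US", "funding": "public", "statistical_affiliation": True,  "has_spin": False},
    {"country": "CN", "funding": "none",   "statistical_affiliation": False, "has_spin": True},
    # ... one entry per included study
]

def spin_counts_by(determinant: str) -> Counter:
    """Count studies with spin within each level of a potential determinant."""
    return Counter(s[determinant] for s in studies if s["has_spin"])

print(spin_counts_by("funding"))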
Actual forms of spin, facilitators of spin, and potential determinants of spin were recorded in all studies reporting the performance of the discovered biomarker. Items were scored independently by the first reviewer (M.G.), and all uncertainties were resolved in discussions with a second reviewer (P.M.B. and M.O.).
2.3 Analysis
For each of the items on spin, facilitators of spin, and potential determinants of spin, we report the frequency in our sample of biomarker evaluations, with 95% confidence intervals.
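As an illustration of this analysis, the frequency of an item is a simple proportion; a 95% confidence interval could, for example, be obtained with the Wilson score method. The paper does not specify which interval method was used, so this choice, and the example counts, are assumptions.

import math

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% confidence interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Example with hypothetical numbers: an item scored as present in 70 of 200 reports.
lo, hi = wilson_ci(70, 200)
print(f"frequency 35.0%, 95% CI {lo:.1%} to {hi:.1%}")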
4. Discussion
Our review systematically documented spin in recent clinical studies evaluating the performance of biomarkers in ovarian cancer. We identified spin in the title, abstract, results, and conclusion of the main text. Of the 200 studies we evaluated, all but seven reported a positive conclusion about the performance of the biomarker. We found that only one-third of these 200 reports were free of spin, one-third contained one form of spin, and another third contained two or more forms of spin.
The most frequent form of spin was claiming other purposes for the biomarker, outside the study aim and not investigated: suggesting that the biomarker could also be used for other clinical purposes that had not been evaluated. The second most frequent form was a mismatch between the intended aim and the study conclusions, for example, concluding on the biomarker's clinical usefulness even though the study had only evaluated classification in a nonclinical setting. These two forms of misinterpretation were more prevalent in the abstract conclusion than in the main-text conclusion. The third most frequent form of spin was incorrect presentation of results in the conclusion, with some authors reporting an unjustified positive conclusion about the biomarker's performance, using terms such as "significantly associated" or "highly specific" without reporting a test of significance or without support from the study results. This form of misrepresentation was more prevalent in the main-text conclusion than in the abstract conclusion.
In terms of facilitators of spin, we observed that none of the studies reported a justification for the sample size or discussed any potential harms, and most of the articles did not prespecify a positivity threshold for continuous biomarkers.
Our study had several strengths. A particular feature of our work was that we comprehensively included all markers of ovarian cancer risk, screening, prognosis, or treatment response in body fluid, tissue, or imaging measurements. To evaluate spin in a wide variety of biomarkers and study designs, we optimized our definition of spin in terms of common features that apply to most biomarker studies. We also used a broad definition of spin that encompasses all forms, ranging from misreporting and misrepresentation to linguistic spin, while developing a classification scheme that aims to limit subjectivity.
We acknowledge potential limitations of this study. In our analysis, we focused on mismatches between results presented in the main text and conclusions made in the study abstract or the main text. This definition does not include other forms of generous presentation or interpretation. We did not include specific deficiencies in study design and conduct, data collection, statistical analysis and phrasing of statistical results, or the total body of knowledge about the biomarker to check validity of conclusions made. There may have been other limitations in the study design or conduct that would warrant caution in the conclusions but were not identified by us. Several of the studies had multiple elements, also encompassing a preclinical phase of evaluations. We did not evaluate statements related to the preclinical elements. Similarly, the actual clinical application was not included in our evaluation. For example, a study may claim predictive use of an evaluated biomarker, but the strength of the association may be so limited that the biomarker will not be of value in clinical practice.
Although some of the forms of spin in our analysis could be objectively demonstrated, such as a mismatch between results in the main body of the article and results in the abstract, others relied more on interpretation. As in other evaluations of spin, we tried to minimize the subjectivity of these classifications through a stepwise process for developing the criteria, the involvement of multiple reviewers, and explicit discussion of scoring results.
Previous studies have documented a high prevalence of spin in published reports of randomized controlled trials, nonrandomized studies, diagnostic test accuracy studies, and systematic reviews [5-8,10,13,16,17,24]. The reasons behind biased and incomplete reporting are probably multifaceted and complex. Yavchitz et al. suggested that (1) lack of awareness of scientific standards, (2) naïveté and impressionability of junior researchers, (3) unconscious bias, or (4) in some instances willful intent to positively influence readers may all give rise to spin in the published literature [8]. The reward system currently used in biomedical science can also be held responsible, as it focuses largely on the quantity of publications rather than their quality [7].
It has previously been shown that spin in articles may indeed hinder the ability of readers to confidently appraise results. Boutron et al. [10] evaluated the impact of spin in the abstracts of articles reporting results in the field of cancer. The studies selected were randomized controlled trials in cancer with statistically nonsignificant primary outcomes. They observed that clinicians rated the experimental treatment as more beneficial when the abstract conclusion contained spin. Scientific articles with spin were also more frequently misrepresented in press releases and news stories [25].
To detect and limit spin, and thus minimize biased and exaggerated reporting of clinical studies, we need to better understand the drivers and strategies of spin. Efforts to prevent or reduce biased and incomplete reporting in biomedical research should be undertaken with vigor and in unison, given the intricate complexities that involve multiple players. Researchers and authors, peer reviewers, and journal editors all share responsibility. Institutions and senior researchers play an integral role in promoting research integrity and best research practices. Existing educational programs for early career researchers can be enriched with mentoring and training initiatives that make authors aware of the forms and facilitators of spin and of its impact. Another strategy to consider may be assembling diverse, multidisciplinary teams, including statisticians, to help ensure rigorous and robust research methodology. In our review, studies in which at least one author reported an affiliation with a statistical department less often contained spin.
Despite emerging evidence that the use of reporting guidelines is associated with more complete reporting [26], journal editors do not explicitly recommend the use of reporting guidelines in the review process [27]. In synergy with improving completeness of reporting, guidelines may also help reduce spin, although they are unlikely to fully eliminate it. Examples of items in existing reporting guidelines that may help reduce spin include item 19 of the REMARK guideline for prognostic studies, which recommends that authors "interpret the results in the context of the prespecified hypothesis and other relevant studies" in their discussion [28], and item 4 of the STARD guideline for diagnostic accuracy studies, which recommends that authors "specify the objective and hypothesis" in their introduction [29]. Expanding existing reporting guidelines with items that prompt reviewers to check for manifestations of spin, and evaluating whether such guidelines can limit spin, may provide incentives for editors to prompt evidence-based changes in the review process.
The development of biomarkers holds great promise for early detection, diagnosis, and treatment of patients with cancer. Yet that promise can only be fulfilled with strong evaluations of the performance of putative markers, complete reporting of the study design and conduct, and a fair and balanced interpretation of study findings. This review of spin in recent evaluations of biomarker performance shows that there is room for improvement.
CRediT authorship contribution statement
Mona Ghannad: Conceptualization, Methodology, Investigation, Formal analysis, Writing - original draft, Writing - review & editing. Maria Olsen: Methodology, Investigation, Writing - review & editing. Isabelle Boutron: Conceptualization, Methodology, Investigation, Writing - review & editing, Supervision. Patrick M. Bossuyt: Conceptualization, Methodology, Investigation, Formal analysis, Writing - original draft, Writing - review & editing, Supervision.
Article info
Publication history
Published online: July 19, 2019. Accepted: July 15, 2019.
Footnotes
Conflict of interest: None.
This work was supported by the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement no. 676207.
The funder/sponsor had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Previous presentation of the manuscript: This study was presented at the International Congress on Peer Review and Scientific Publication 2017, Evidence Live 2018, and the Cochrane Colloquium 2018 meeting.
Copyright
© 2019 The Authors. Published by Elsevier Inc.