Review | Volume 116, P9-17, December 2019

A systematic review finds that spin or interpretation bias is abundant in evaluations of ovarian cancer biomarkers

  • Mona Ghannad (corresponding author)
    Amsterdam UMC, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam Public Health Research Institute, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands; Université de Paris, CRESS, INSERM, INRA, F-75004 Paris, France
  • Maria Olsen
    Amsterdam UMC, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam Public Health Research Institute, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands; Université de Paris, CRESS, INSERM, INRA, F-75004 Paris, France
  • Isabelle Boutron
    Université de Paris, CRESS, INSERM, INRA, F-75004 Paris, France
  • Patrick M. Bossuyt
    Amsterdam UMC, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam Public Health Research Institute, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
Open Access | Published: July 19, 2019 | DOI: https://doi.org/10.1016/j.jclinepi.2019.07.011

      Abstract

      Background

In the scientific literature, "spin" refers to reporting practices that make study findings appear more favorable than the results justify. Such misrepresentation and overinterpretation may lead to imbalanced and unjustified optimism about the performance of putative biomarkers. We aimed to classify spin (i.e., misrepresentation and overinterpretation of study findings) in recent clinical studies evaluating the performance of biomarkers in ovarian cancer.

      Methods

      We searched PubMed systematically for all evaluations of ovarian cancer biomarkers published in 2015. Studies eligible for inclusion reported the clinical performance of prognostic, predictive, or diagnostic biomarkers.

      Results

Our search identified 1,026 studies; 326 met all eligibility criteria, of which we evaluated the first 200. Of these, 140 (70%) contained one or more forms of spin in the title, abstract, or main-text conclusion, exaggerating the performance of the biomarker. The most frequent forms of spin were (1) other purposes of the biomarker claimed but not investigated (65; 32.5%); (2) mismatch between intended aim and conclusion (57; 28.5%); and (3) incorrect presentation of results (40; 20%).

      Conclusion

Our study provides evidence of misrepresentation and overinterpretation of findings in recent clinical evaluations of ovarian cancer biomarkers.


      What is new?

        Key findings

• Much research has been dedicated to the discovery of ovarian cancer biomarkers, but few are successfully introduced into clinical care. The lack of success in identifying clinically relevant biomarkers has been attributed to a number of factors, such as poor study design and bias.
• In this study, we investigated biased reporting and interpretation in published articles as a potential contributing factor, which had not previously been characterized for ovarian cancer biomarkers. Frequent misrepresentation or overinterpretation of study findings may lead to imbalanced and unjustified optimism about the performance of putative biomarkers.

        What this adds to what was known?

      • We performed a systematic review of 200 recent evaluations of ovarian cancer biomarkers and observed that 70% had at least one form of spin (i.e., misrepresentation or overinterpretation of study findings) in the title, abstract, or main-text conclusion, exaggerating the performance of the biomarker.

        What is the implication and what should change now?

• This review indicates that biased reporting and interpretation are prevalent in recent clinical evaluations of biomarkers in ovarian cancer. These results point to a need for strategies to minimize biased reporting and interpretation.

      1. Introduction

Research on cancer biomarkers has expanded in recent years, leading to a large and growing literature. However, despite major investments and advances in technology, the biomarker pipeline has proven prone to failure [Ioannidis and Bossuyt; Ioannidis]. Much research has likewise been dedicated to the discovery of ovarian cancer biomarkers, yet despite many biomarkers being evaluated, very few have been successfully introduced into clinical care [Diamandis]. Likely reasons for failure have been documented at each stage of biomarker evaluation [Ioannidis and Bossuyt; Ioannidis; Diamandis].
It has been argued that biomarker discovery studies sometimes suffer from weak study designs, limited sample sizes, and incomplete or biased reporting, which can render them vulnerable to exaggerated interpretation of biomarker performance [Ioannidis and Bossuyt; Pepe and Feng]. Authors may claim favorable performance and clinical effectiveness of biomarkers based on selective reporting of significant findings, or present study results with a more positive conclusion in the abstract than in the main text [Boutron et al.]. Specific study features, such as not prespecifying a biomarker positivity threshold or lacking a specific study objective, can also facilitate distorted presentation of study results.
Spin, or misrepresentation and misinterpretation of study findings, not necessarily intentional, is any reporting practice that makes the study findings appear more favorable than the results justify [Ochodo et al.; Boutron and Ravaud]. Several studies have shown that authors of clinical studies commonly present and interpret their research findings with some form of spin [Boutron et al.; Boutron and Ravaud; Yavchitz et al.; Lazarus et al.; Boutron et al.]. A consequence of biased representation of results in scientific reports is that the published literature may suggest stronger evidence than is justified [Munafò et al.]. Misrepresentation of study findings may also have serious implications for patients, health care providers, and policy makers [Macleod et al.].
The primary aim of our study was to evaluate the presence of spin, further categorized as misrepresentation and overinterpretation of study findings, in recent clinical studies evaluating the performance of biomarkers in ovarian cancer. We also evaluated facilitators of spin (i.e., practices that would facilitate overinterpretation of results) and a number of potential determinants of spin.

      2. Methods

      We performed a systematic review to document the prevalence of spin in recent evaluations of the clinical performance of biomarkers in ovarian cancer.

      2.1 Literature search

      MEDLINE was searched through PubMed on December 22, 2016, for all studies evaluating the performance of biomarkers in ovarian cancer published in 2015. The search terms and strategy were developed in collaboration with a medical information specialist (R.S.), using a combination of terms that express the clinical performance of biomarkers in ovarian cancer (Appendix A). We included all markers of ovarian cancer risk, screening, prognosis, or treatment response in body fluid, tissue, or imaging measurements. Reviews, animal studies, and cell line studies were excluded.
Two authors (M.G. and M.O.) independently reviewed titles and abstracts to identify potentially eligible articles. The same two authors then independently reviewed the full texts of potentially eligible reports for inclusion. All disagreements were resolved through discussion or by third-party arbitration (P.M.B.). We analyzed the first 200 consecutive studies, ranked by publication date, to obtain a sample size comparable with previous systematic reviews of spin [Ochodo et al.; McGrath et al.].

      2.2 Establishing criteria and data extraction

Biomarker studies in ovarian cancer vary in study design, intended clinical application of the biomarker, and the type and number of tests evaluated [Horvath et al.; Pavlou et al.]. Within the evaluation process, several components can be assessed, such as analytical performance, clinical performance, clinical effectiveness, cost-effectiveness, and consequences beyond clinical effectiveness and cost-effectiveness. We developed a definition of spin encompassing common features applicable to all these biomarker types and study designs: reporting practices that make the clinical performance of markers look more favorable than the results justify. This definition was based on criteria extracted from key articles on misrepresentation and misinterpretation of study findings [Boutron et al.; Ochodo et al.; Boutron and Ravaud; Lazarus et al.; McGrath et al.; Chiu et al.].
To evaluate the frequency of spin, we established a preliminary list of criteria, incorporating previously established items that represent spin [Boutron et al.; Ochodo et al.; Boutron and Ravaud; Chiu et al.; Lazarus et al.], and optimized these criteria through a gradual data extraction process. A set of 20 articles was fully verified by a second reviewer (M.O.), and points of disagreement were discussed with a third investigator (I.B. or P.M.B.) to fine-tune the scoring criteria and clarify the coding scheme. Through this process and the discussions that ensued, a final list of items was established with content experts (P.M.B. and I.B.), categorizing each item as representing either "spin" or a "facilitator of spin." Each of these categories encompassed several forms of spin.
We further classified spin into two categories, "misrepresentation" and "misinterpretation", to distinguish distorted presentation from incorrect interpretation of findings, with special focus on the abstract and main-text conclusions. Because the presence of a positive conclusion is interdependent with the items that represent spin, we assessed the overall positivity of the main-text conclusion using a previously established classification scheme [McGrath et al.]. Overall positivity was classified according to the summary statement in the main-text conclusion about the biomarker's analytical performance or clinical utility. Using the criteria defined by McGrath et al., we rated the main-text conclusion as "positive", "positive with qualifier", "neutral", or "negative". A qualifier attenuates the summary statement or its implication for practice [McGrath et al.]. Examples include, but are not limited to, the use of modal verbs such as "may" in the summary statement, or statements such as "limited evidence is available" in the same paragraph as the summary statement.
      We defined misrepresentation as misreporting and/or distorted presentation of the study results in the title, abstract, or the main text, in a way that could mislead the reader. This category of spin encompassed (1) incorrect presentation of results in the abstract or main-text conclusion, (2) mismatch between results reported in abstract and main text, and (3) mismatch between results reported and the title.
We defined misinterpretation as an interpretation of the study results in the abstract or main-text conclusion that is not consistent with the actual study results and/or extrapolates beyond them. This category of spin encompassed (4) other purposes of the biomarker claimed but not prespecified and/or investigated, (5) mismatch between intended aim and abstract or main-text conclusion, (6) other benefits of the biomarker claimed but not prespecified and/or investigated, and (7) extrapolation from study participants to a larger or a different population.
We defined "facilitators of spin" as practices that leave room for spin but do not allow a formal assessment and classification as actual spin. For example, we considered not prespecifying a positivity threshold for a continuous biomarker a facilitator of spin: setting a threshold value after data collection and analysis may leave room in the representation and interpretation of the data to maximize performance characteristics [Ochodo et al.].
In addition to spin and facilitators of spin, we extracted the following study characteristics: country, intended use of the biomarker, author affiliations, declared conflicts of interest, and source of funding. To evaluate which of these factors may be associated with the manifestation of spin, we counted the occurrences of spin corresponding to each determinant.
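This tallying step can be sketched as a simple cross-count. The record layout and field names below are hypothetical, purely to illustrate counting spin occurrences per determinant; they are not the authors' actual extraction sheet.

```python
from collections import Counter

# Hypothetical extraction records: one dict per included article, holding a
# determinant value ("origin") and the number of spin items scored in it.
articles = [
    {"origin": "Asia", "application": "Prognosis", "spin_items": 2},
    {"origin": "Europe", "application": "Diagnosis", "spin_items": 0},
    {"origin": "Asia", "application": "Diagnosis", "spin_items": 1},
]

# For each determinant value, count articles overall and articles showing
# at least one form of spin.
total = Counter(a["origin"] for a in articles)
with_spin = Counter(a["origin"] for a in articles if a["spin_items"] >= 1)

for origin in total:
    print(origin, f"{with_spin[origin]}/{total[origin]}")
```

The same pattern, keyed on the other extracted characteristics (clinical application, funding source, and so on), yields cross-counts of the kind reported in Table 4.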
Actual forms of spin, facilitators of spin, and potential determinants of spin were recorded in all studies reporting the performance of the discovered biomarker. Items were scored independently by the first reviewer (M.G.), and all uncertainties were resolved in discussion with a second reviewer (P.M.B. or M.O.).

      2.3 Analysis

      For each of the items on spin, facilitators of spin, and potential determinants of spin, we report the frequency in our sample of biomarker evaluations, with 95% confidence intervals.
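The paper does not state which confidence interval method was used, but the reported intervals are consistent with the Wilson score interval for a binomial proportion. The following sketch (function name and formatting are mine) illustrates the calculation for the "40 (20% [15-26%])" entry of Table 2.

```python
from math import sqrt

def wilson_ci(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# 40 of 200 studies with incorrect presentation of results (Table 2, item 1)
lo, hi = wilson_ci(40, 200)
print(f"20% (95% CI {lo:.0%} to {hi:.0%})")
```

For 40/200 this gives approximately 15% to 26%, matching the interval printed in Table 2; the same function reproduces the other intervals in Tables 2 and 3 to the stated rounding.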

      3. Results

      3.1 Search results

      Our search identified 1,026 citations in PubMed. After title and abstract screening, 516 citations were selected for full-text evaluation. Of these, 326 studies met all eligibility criteria, and the first 200 studies, ranked according to publication date, were included in our analysis (Fig. 1).

      3.2 Characteristics of included studies

A description of the included studies is presented in Table 1. The studies originated from 32 countries, the majority coming from China (n = 69, 34.5%) and the USA (n = 41, 20.5%); the remaining 30 countries contributed 1 to 14 articles each. The studies were published in 94 journals in total (Appendix B).
Table 1. Study characteristics; No. (%) (all studies, n = 200)

Number of journals: 94
Origin
 Asia: 101 (51%)
 North America: 51 (26%)
 Europe: 39 (20%)
 Other (Australia, Brazil, Chile): 9 (5%)
Biomarker clinical application
 Prognosis: 89 (45%)
 Diagnosis: 40 (20%)
 Prediction of therapeutic response: 26 (13%)
 Risk susceptibility, monitoring, screening: 17 (9%)
 Multiple: 28 (14%)
Author affiliations (not mutually exclusive)
 Clinical department: 194 (97%)
 Clinical plus statistical or bioinformatics/computational biology department: 34 (17%)
Positivity of conclusions
 Positive: 113 (57%)
 Positive with qualifier: 80 (40%)
 Negative: 5 (3%)
 Neutral: 2 (1%)
Conflict disclosure
 No: 151 (76%)
 Not reported: 38 (19%)
 Yes: 11 (6%)
Funding source
 Nonprofit: 135 (68%)
 Not reported: 53 (27%)
 No funding: 6 (3%)
 For profit: 4 (2%)
 Mix (for profit and nonprofit): 2 (1%)
Of all the studies evaluated in the included articles, prognostic (n = 89, 44.5%) and diagnostic (n = 40, 20%) markers comprised the largest groups. Authors of almost all included studies had an affiliation with a clinical department (n = 194, 97%), but only 34 of these (17.5%) had one or more authors affiliated with a statistical or bioinformatics department.
Nearly all the included studies (n = 193, 96.5%) reported a positive conclusion in the main text, with only seven studies (3.5%) reporting a negative or neutral conclusion. Of the 193 studies with a positive conclusion, 80 attenuated the summary statement with a qualifier, for example a modal verb such as "may". Eleven studies (5.5%) declared a conflict of interest, and 38 (19%) did not report whether they had one. The funding source was mainly nonprofit (n = 135, 67.5%), although 53 of the included studies (27%) did not report a source of funding.

      3.3 Actual forms of spin

Of our 200 analyzed studies, 140 (70%) contained one or more forms of spin; 75 had two or more. Sixty studies (30%) had no form of spin, based on our criteria. Table 2 lists the prevalence of each form of spin (i.e., misrepresentation or misinterpretation) in our set, with examples presented in Appendix D.
Table 2. Actual forms of spin in clinical studies evaluating performance of biomarkers in ovarian cancer (n = 200); frequencies given as n (% [95% CI])

Category: Misrepresentation

1. Incorrect presentation of results in the abstract or main-text conclusion
 Criteria: the abstract or main-text conclusion about the BM's clinical performance is not in accordance with, or is stronger than, the results justify. Scored as actual spin if:
 a. exaggerating the performance of the BM in the conclusion despite low performance measures reported in the results;
 b. claiming an effect of the BM despite statistically nonsignificant results;
 c. claiming an effect despite not providing measures of imprecision or statistical tests (confidence intervals or P-values) between the biomarker models or patient subgroups compared.
 Frequency: total 40 (20% [15-26%]); abstract conclusion 14 (7% [4-12%]); main-text conclusion 37 (18.5% [14-25%])

2. Mismatch between results reported in abstract and main text
 Criteria: results reported in the abstract are not in accordance with results reported in the main text. Scored as actual spin if:
 a. the abstract claims statistical significance although no measure of imprecision or test of significance (CI or P-value) is reported in the main text;
 b. statistically significant outcomes are selectively reported in the abstract compared with the results in the main text;
 c. results reported in the abstract do not match results provided in the main text.
 Frequency: 33 (16.5% [12-23%])

3. Mismatch between results reported and the title
 Criteria: the title contains wording misrepresenting the BM's clinical performance compared with the results in the main text.
 Frequency: 11 (5.5% [3-10%])

Category: Misinterpretation

4. Other purposes of the biomarker claimed, not prespecified and/or investigated
 Criteria: the abstract or main-text conclusion contains a statement suggesting BM purposes not prespecified and/or investigated.
 Frequency: total 65 (32.5% [26-40%]); abstract conclusion 36 (18% [13-24%]); main-text conclusion 60 (30% [24-37%])

5. Mismatch between intended aim and abstract or main-text conclusion
 Criteria: the abstract or main-text conclusion about the BM's clinical performance is stronger than the study design supports. Scored as actual spin if:
 a. the conclusion claims BM utility (e.g., "useful") although clinical effectiveness was not evaluated;
 b. the conclusion claims an improvement in BM performance (e.g., "improve") although incremental measures were not evaluated;
 c. the conclusion uses causal language for the BM(s) assessed despite a nonrandomized design.
 Frequency: total 57 (28.5% [23-35%]); abstract conclusion 41 (20.5% [15-27%]); main-text conclusion 31 (15.5% [11-21%])

6. Other benefits of the BM claimed, not prespecified and/or investigated
 Criteria: the main-text conclusion contains a statement claiming BM benefits not prespecified and/or investigated.
 Frequency: 10 (5% [3-9%])

7. Extrapolation from study participants to a larger or a different population
 Criteria: the main-text conclusion contains a statement extrapolating the BM's clinical performance to a larger or a different population, not supported by the recruited subjects.
 Frequency: 10 (5% [3-9%])

Abbreviations: BM, biomarker; CI, confidence interval; HR, hazard ratio; OS, overall survival; PFS, progression-free survival.
Note: We evaluated all results reported in the abstract and main text of each article, excluding supplementary material.
We identified incorrect presentation of results in the abstract or main-text conclusion in 40 study reports (20%), more frequently in the main-text conclusion (n = 37, 18.5%) than in the abstract conclusion (n = 14, 7%). These were reports in which a positive conclusion about the biomarker was not supported by the study results, or was not accompanied by a test of statistical significance or an appropriate expression of precision, such as a 95% confidence interval. Examples were a study that claimed a multivariable algorithm had been validated despite poor results (the study presented positive results on biomarkers, but these were not included in the algorithm), and a study that claimed a "high specificity" while the corresponding estimate was only 58% [Dorn et al.; Masoumi-Moghaddam et al.].
Several studies claimed superiority in performance in the absence of tests for statistical significance [Wilailak et al.; Fujiwara et al.]. In 33 study reports (16.5%), there was a mismatch between the results reported in the abstract and in the main text. The most frequent examples were studies that selectively reported findings in the abstract, including only the most positive or statistically significant results. In a few studies, the results reported in the abstract did not match those reported in the main text. In 11 articles (5.5%), we observed a mismatch between the title and the results.
Apart from these forms of misrepresentation, we also looked at forms of misinterpretation. In 65 study reports (32.5%), biomarker purposes were suggested that had not been investigated in the actual study, again more frequently in the main-text conclusion (n = 60, 30%) than in the abstract conclusion (n = 36, 18%). An example was a study whose abstract conclusion claimed that a biomarker "showed strong promise as a diagnostic tool for large-scale screening", although the marker had only been evaluated in a diagnostic setting, with symptomatic patients [Shadfan et al.].
In addition, we identified a mismatch between the intended aim of the biomarker and one of the conclusions of the study report in 57 cases (28.5%). In contrast to the previous forms, this form of misinterpretation was observed more frequently in the abstract (n = 41, 20.5%) than in the main text (n = 31, 15.5%). A typical example was a claim of clinical usefulness in a study that only reported an expression of performance in a nonclinical setting, discriminating between cases and noncases on the basis of the biomarker [Shadfan et al.]. In 10 studies (5%), biomarker benefits were claimed that had not been evaluated, such as a reduction in health care costs. In another 10 articles (5%), there was an unsupported extrapolation from the study group to a different population. An example was a study that concluded that a spectroscopy technique was useful for early detection, although the study had only evaluated patients undergoing surgery [Lima et al.].

      3.4 Facilitators of spin

      Details of our analysis of potential facilitators of spin are presented in Table 3. Of the 200 analyzed studies, none reported a sample size justification or any potential harms. Only half of the studies prespecified a positivity threshold for the continuous biomarker evaluated.
Table 3. Facilitators of spin in clinical studies evaluating performance of biomarkers in ovarian cancer

Potential facilitator of spin | Spin frequency, n = 200; n (% [95% CI])
Not stating sample size calculations | 200 (100% [98–100%])
Not mentioning potential harms | 200 (100% [98–100%])
Not prespecifying a positivity threshold for continuous biomarker (a) | 84/164 (51.2% [43–59%])
Incomplete or no reporting of imprecision or statistical tests for data shown | 26 (13% [9–19%])
Study objective not reported or unclear | 24 (12% [8–18%])

(a) 164 articles included evaluation of continuous biomarkers.
Table 4. Potential determinants of spin

Determinant | No. of articles with determinant | 1 occurrence of spin | 2 occurrences of spin | >2 occurrences of spin | Overall occurrence of spin; n (% [95% CI])
Origin
 Asia (including Turkey and Israel) | 101 | 39 | 20 | 19 | 78 (77% [68–85%])
 North America | 51 | 13 | 13 | 6 | 32 (63% [48–76%])
 Europe | 39 | 9 | 9 | 3 | 21 (54% [37–70%])
 Other (Australia, Brazil, Chile) | 9 | 4 | 3 | 2 | 9 (100% [63–100%])
Biomarker clinical application
 Prognosis | 89 | 36 | 14 | 9 | 59 (66% [56–76%])
 Diagnosis | 40 | 4 | 14 | 14 | 32 (80% [64–90%])
 Prediction of therapeutic response | 26 | 11 | 4 | 3 | 18 (69% [48–85%])
 Risk susceptibility, monitoring, screening | 17 | 4 | 5 | 0 | 9 (53% [29–76%])
 Multiple | 28 | 10 | 8 | 4 | 22 (79% [59–91%])
Affiliation between clinical department and statistical or bioinformatics department
 No | 160 | 53 | 37 | 27 | 117 (73% [65–80%])
 Yes | 34 | 12 | 6 | 2 | 20 (59% [41–75%])
Conflict of interest
 No | 151 | 51 | 29 | 21 | 101 (67% [59–74%])
 Not reported | 38 | 12 | 12 | 7 | 31 (82% [65–92%])
 Yes | 11 | 2 | 4 | 2 | 8 (73% [39–93%])
Funding source
 Nonprofit | 135 | 46 | 25 | 21 | 92 (68% [60–76%])
 Not reported | 53 | 18 | 13 | 7 | 38 (72% [57–83%])
 No funding | 6 | 0 | 3 | 1 | 4 (67% [24–94%])
 For profit | 4 | 0 | 3 | 1 | 4 (100% [40–100%])
 Mix (nonprofit and for profit) | 2 | 1 | 1 | 0 | 2 (100% [20–100%])
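The intervals in Tables 3 and 4 are binomial confidence intervals around observed proportions. The paper does not state which interval method was used; as an illustration (the choice of the Wilson score interval is our assumption), the sketch below reproduces the reported bounds closely, e.g. 100% [98–100%] for 200/200 and 77% [68–85%] for 78/101:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion.

    Unlike the naive Wald interval, the Wilson interval behaves sensibly
    at the boundaries: 200/200 yields [98%, 100%] rather than [100%, 100%].
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = p + z**2 / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half) / denom, (center + half) / denom

# Table 3: 200/200 studies omitted sample size calculations -> 100% [98-100%]
lo, hi = wilson_ci(200, 200)
print(f"{lo:.0%}-{hi:.0%}")  # 98%-100%

# Table 4: 78/101 articles from Asia contained spin -> 77% [68-85%]
lo, hi = wilson_ci(78, 101)
print(f"{lo:.0%}-{hi:.0%}")  # 68%-84%
```

The small discrepancy at the second upper bound (84% vs. the reported 85%) suggests the authors may have used a slightly different interval method or rounding convention.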

      3.5 Potential determinants of spin

We investigated potential determinants of spin in the 200 articles in our data set (Table 4). Articles from China (75%) and Japan (86%) more frequently contained spin (Appendix C). Diagnostic accuracy studies (80%) and articles reporting multiple clinical applications of the biomarker (79%) were also more often associated with spin. Studies reporting an author affiliation with a statistical or bioinformatics department (59%) were less likely to contain spin than studies without such an affiliation (73%). Studies that failed to report whether there was a conflict of interest (82%) more often contained spin than studies that declared no conflict of interest (67%).

      4. Discussion

Our review systematically documented spin in recent clinical studies evaluating the performance of biomarkers in ovarian cancer. We identified spin in titles, abstracts, results sections, and main-text conclusions. Of the 200 studies we evaluated, all but seven reported a positive conclusion about the performance of the biomarker. We found that only one-third of these 200 reports were free of spin; one-third contained one form of spin, and another third contained two or more forms.
The most frequent form of spin was claiming other purposes for the biomarker: suggesting clinical uses that fell outside the study aim and had not been investigated. The second most frequent form was a mismatch between the intended aim and the study conclusions, for example, concluding on the biomarker's clinical usefulness although the study had only evaluated classification in a nonclinical setting. The latter form of misinterpretation was more prevalent in the abstract conclusion than in the main-text conclusion. The third most frequent form was an incorrect presentation of results in the conclusion: some authors reported an unjustified positive conclusion about the biomarker's performance, using terms such as "significantly associated" or "highly specific" without providing the corresponding test of significance or support in the study results. This form of misrepresentation was more prevalent in the main-text conclusion than in the abstract conclusion.
      In terms of facilitators of spin, we observed that none of the studies reported a justification for the sample size or discussed any potential harms, and most of the articles did not prespecify a positivity threshold for continuous biomarkers.
Our study had several strengths. A particular feature of our work was that we comprehensively included all markers of ovarian cancer risk, screening, prognosis, or treatment response, whether measured in body fluid, tissue, or imaging. To evaluate spin across a wide variety of biomarkers and study designs, we defined spin in terms of common features that apply to most biomarker studies. We also used a broad definition that encompasses all forms of spin, ranging from misreporting and misrepresentation to linguistic spin, while developing a classification scheme that aims to limit subjectivity.
We acknowledge potential limitations of this study. In our analysis, we focused on mismatches between the results presented in the main text and the conclusions made in the abstract or the main text. This definition does not cover other forms of generous presentation or interpretation. We did not consider specific deficiencies in study design and conduct, data collection, statistical analysis, or phrasing of statistical results, nor the total body of knowledge about the biomarker, when checking the validity of the conclusions made. There may have been other limitations in study design or conduct that would warrant caution in the conclusions but that we did not identify. Several of the studies had multiple elements, also encompassing a preclinical phase of evaluation; we did not evaluate statements related to these preclinical elements. Similarly, the actual clinical applicability was not included in our evaluation: a study may, for example, claim predictive use of an evaluated biomarker, yet the strength of the association may be so limited that the biomarker will not be of value in clinical practice.
      Although some of the forms of spin in our analysis could be objectively demonstrated, like a mismatch between results in the main body of the article and results in the abstract, others relied more on interpretation. As in other evaluations of spin, we have tried to minimize the subjectivity of these classifications by having a stepwise development process of the criteria, including multiple reviewers and explicit discussions of scoring results.
Previous studies have documented a high prevalence of spin in published reports of randomized controlled trials, nonrandomized studies, diagnostic test accuracy studies, and systematic reviews [Boutron et al. 2010; Ochodo et al. 2013; Boutron and Ravaud 2018; Yavchitz et al. 2016; Boutron et al. 2014; McGrath et al. 2017; Chiu et al. 2017; Lazarus et al. 2015; Lockyer et al. 2013]. The reasons behind biased and incomplete reporting are probably multifaceted and complex. Yavchitz et al. suggested that (1) lack of awareness of scientific standards, (2) the naïveté and impressionability of junior researchers, (3) unconscious bias, or (4) in some instances a willful intent to positively influence readers may all give rise to spin in the published literature [Yavchitz et al. 2016]. The reward system currently used in biomedical science can also be held responsible, as it focuses largely on the quantity of publications rather than their quality [Boutron and Ravaud 2018].
It has previously been shown that spin in articles may indeed hinder readers' ability to confidently appraise results. Boutron et al. evaluated the impact of spin in the abstracts of articles reporting randomized controlled trials in cancer with statistically nonsignificant primary outcomes [Boutron et al. 2014]. They observed that clinicians rated the experimental treatment as more beneficial when the abstract conclusion contained spin. Scientific articles with spin were also more frequently misrepresented in press releases and news items [Haneef et al. 2017].
To detect and limit spin, and thus minimize biased and exaggerated reporting of clinical studies, we need to better understand its drivers and strategies. Efforts to prevent or reduce biased and incomplete reporting in biomedical research should be undertaken with vigor and in unison, given the intricate complexities involving multiple players. Researchers and authors, peer reviewers, and journal editors all share responsibility. Institutions and senior researchers play an integral role in promoting research integrity and best research practices. Existing educational programs for early-career researchers can be enriched with mentoring and training initiatives that make authors aware of the forms, facilitators, and impact of spin. Another strategy to consider is assembling diverse, multidisciplinary teams, including statisticians, to help ensure rigorous and robust research methodology. In our review, studies in which at least one author reported an affiliation with a statistical department less often contained spin.
Despite emerging evidence that the use of reporting guidelines is associated with more complete reporting [Cobo et al. 2011], journal editors do not explicitly recommend their use in the review process [Hirst and Altman 2012]. In synergy with improving completeness of reporting, guidelines may also help reduce spin, although they are unlikely to fully eliminate it. Examples of items in existing reporting guidelines that may help reduce spin include item 19 of the REMARK guideline for prognostic studies, which recommends that authors "interpret the results in the context of the prespecified hypothesis and other relevant studies" in their discussion [McShane et al. 2005], and item 4 of the STARD guideline for diagnostic accuracy studies, which recommends that authors "specify the objective and hypothesis" in their introduction [Bossuyt et al. 2015]. Expanding existing reporting guidelines with items that prompt reviewers to check for manifestations of spin, and evaluating how well such guidelines limit spin, may give editors an incentive to make evidence-based changes to the review process.
      The development of biomarkers holds great promise for early detection, diagnosis, and treatment of patients with cancer. Yet that promise can only be fulfilled with strong evaluations of the performance of putative markers, complete reporting of the study design and conduct, and a fair and balanced interpretation of study findings. This review of spin in recent evaluations of biomarker performance shows that there is room for improvement.

      CRediT authorship contribution statement

      Mona Ghannad: Conceptualization, Methodology, Investigation, Formal analysis, Writing - original draft, Writing - review & editing. Maria Olsen: Methodology, Investigation, Writing - review & editing. Isabelle Boutron: Conceptualization, Methodology, Investigation, Writing - review & editing, Supervision. Patrick M. Bossuyt: Conceptualization, Methodology, Investigation, Formal analysis, Writing - original draft, Writing - review & editing, Supervision.

      Acknowledgments

      The authors thank Rene Spijker for developing the search strategy and Simon Boerstra for providing help with data analysis.

      Supplementary data

      References

  Ioannidis JPA, Bossuyt PMM. Waste, leaks, and failures in the biomarker pipeline. Clin Chem. 2017;63:963-972.
  Ioannidis JP. Biomarker failures. Clin Chem. 2013;59:202-204.
  Diamandis EP. The failure of protein cancer biomarkers to reach the clinic: why, and what can be done to address the problem? BMC Med. 2012;10:87.
  Pepe MS, Feng ZD. Improving biomarker identification with better designs and reporting. Clin Chem. 2011;57:1093-1095.
  Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA. 2010;303:2058-2064.
  Ochodo EA, de Haan MC, Reitsma JB, Hooft L, Bossuyt PM, Leeflang MM. Overinterpretation and misreporting of diagnostic accuracy studies: evidence of "spin". Radiology. 2013;267:581-588.
  Boutron I, Ravaud P. Misrepresentation and distortion of research in biomedical literature. Proc Natl Acad Sci U S A. 2018;115:2613-2619.
  Yavchitz A, Ravaud P, Altman DG, Moher D, Hrobjartsson A, Lasserson T, et al. A new classification of spin in systematic reviews and meta-analyses was developed and ranked according to the severity. J Clin Epidemiol. 2016;75:56-65.
  Lazarus C, Haneef R, Ravaud P, Hopewell S, Altman DG, Boutron I. Peer reviewers identified spin in manuscripts of nonrandomized studies assessing therapeutic interventions, but their impact on spin in abstract conclusions was limited. J Clin Epidemiol. 2016;77:44-51.
  Boutron I, Altman DG, Hopewell S, Vera-Badillo F, Tannock I, Ravaud P. Impact of spin in the abstracts of articles reporting results of randomized controlled trials in the field of cancer: the SPIIN randomized controlled trial. J Clin Oncol. 2014;32:4120-4126.
  Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, et al. A manifesto for reproducible science. Nat Hum Behav. 2017;1:0021.
  Macleod MR, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis JPA, et al. Biomedical research: increasing value, reducing waste. Lancet. 2014;383:101-104.
  McGrath TA, McInnes MDF, van Es N, Leeflang MMG, Korevaar DA, Bossuyt PMM. Overinterpretation of research findings: evidence of "spin" in systematic reviews of diagnostic accuracy studies. Clin Chem. 2017;63:1353-1362.
  Horvath AR, Lord SJ, StJohn A, Sandberg S, Cobbaert CM, Lorenz S, et al. From biomarkers to medical tests: the changing landscape of test evaluation. Clin Chim Acta. 2014;427:49-57.
  Pavlou MP, Diamandis EP, Blasutig IM. The long journey of cancer biomarkers from the bench to the clinic. Clin Chem. 2013;59:147-157.
  Chiu K, Grundy Q, Bero L. 'Spin' in published biomedical literature: a methodological systematic review. PLoS Biol. 2017;15:e2002173.
  Lazarus C, Haneef R, Ravaud P, Boutron I. Classification and prevalence of spin in abstracts of non-randomized studies evaluating an intervention. BMC Med Res Methodol. 2015;15:85.
  Dorn J, Bronger H, Kates R, Slotta-Huspenina J, Schmalfeldt B, Kiechle M, et al. OVSCORE - a validated score to identify ovarian cancer patients not suitable for primary surgery. Oncol Lett. 2015;9:418-424.
  Masoumi-Moghaddam S, Amini A, Wei AQ, Robertson G, Morris DL. Sprouty 2 protein, but not Sprouty 4, is an independent prognostic biomarker for human epithelial ovarian cancer. Int J Cancer. 2015;137:560-570.
  Wilailak S, Chan KK, Chen CA, Nam JH, Ochiai K, Aw TC, et al. Distinguishing benign from malignant pelvic mass utilizing an algorithm with HE4, menopausal status, and ultrasound findings. J Gynecol Oncol. 2015;26:46-53.
  Fujiwara H, Suzuki M, Takeshima N, Takizawa K, Kimura E, Nakanishi T, et al. Evaluation of human epididymis protein 4 (HE4) and risk of ovarian malignancy algorithm (ROMA) as diagnostic tools of type I and type II epithelial ovarian cancer in Japanese women. Tumour Biol. 2015;36:1045-1053.
  Shadfan BH, Simmons AR, Simmons GW, Ho A, Wong J, Lu KH, et al. A multiplexable, microfluidic platform for the rapid quantitation of a biomarker panel for early ovarian cancer detection at the point-of-care. Cancer Prev Res (Phila). 2015;8:37-48.
  Lima KM, Gajjar KB, Martin-Hirsch PL, Martin FL. Segregation of ovarian cancer stage exploiting spectral biomarkers derived from blood plasma or serum analysis: ATR-FTIR spectroscopy coupled with variable selection methods. Biotechnol Prog. 2015;31:832-839.
  Lockyer S, Hodgson R, Dumville JC, Cullum N. "Spin" in wound care research: the reporting and interpretation of randomized controlled trials with statistically non-significant primary outcome results or unspecified primary outcomes. Trials. 2013;14:371.
  Haneef R, Yavchitz A, Ravaud P, Baron G, Oransky I, Schwitzer G, et al. Interpretation of health news items reported with or without spin: protocol for a prospective meta-analysis of 16 randomised controlled trials. BMJ Open. 2017;7:e017425.
  Cobo E, Cortes J, Ribera JM, Cardellach F, Selva-O'Callaghan A, Kostov B, et al. Effect of using reporting guidelines during peer review on quality of final manuscripts submitted to a biomedical journal: masked randomised trial. BMJ. 2011;343:d6783.
  Hirst A, Altman DG. Are peer reviewers encouraged to use reporting guidelines? A survey of 116 health research journals. PLoS One. 2012;7:e35621.
  McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM, et al. REporting recommendations for tumour MARKer prognostic studies (REMARK). Br J Cancer. 2005;93:387-391.
  Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527.