Abstract
Objectives
Study Design and Setting
Results
Conclusion
Keywords
Key findings
- • In a sample of 163,129 randomized controlled trial publications we found that a more recent publication year, trial registration, mentioning of the CONsolidated Standards Of Reporting Trials (CONSORT) checklist, and a higher journal impact factor were consistently associated with a lower risk of questionable research practices.
What this adds to what was known?
- • We validated previously identified associations between indicators and questionable research practices and explored new indicators.
What is the implication?
- • Our results might inform future strategies to identify those randomized controlled trials at high risk of questionable research practices.
What should change now?
- • Editors, peer reviewers, and readers should be aware that certain characteristics of the author team, the journal, and the publication might be associated with questionable research practices.
1. Introduction
2. Methods
2.1 Identification of RCTs
2.2 Data collection of QRPs
- 1. Risk of bias: the probability of bias as determined using Robot Reviewer [21,27] for the domains random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment [21,22,23].
- 2. Modifications in primary outcome measures, based on comparing the first and final versions of the public trial registration records from ClinicalTrials.gov [24] (see the sketch after this list).
- 3. The ratio of the achieved sample size compared to what was planned.
- 4. Statistical discrepancy, for which we compared the reported P value with the actual P value of the intervention effect estimate calculated from other reported information such as the confidence interval.
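The comparison of registered primary outcomes (item 2) lends itself to automation. The sketch below is a minimal illustration, assuming two lists of free-text primary outcome measures taken from the first and final registration records; the function names and the normalization step are illustrative and not the authors' actual extraction script. Any textual change is flagged so that the record can be checked manually afterwards, in line with the sensitive flag-then-verify approach described below.

```python
# Minimal sketch, assuming free-text outcome lists from the first and final
# registration records; not the authors' actual extraction script.
def normalize(outcome: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(outcome.lower().split())

def flag_outcome_change(first_version, final_version):
    first = {normalize(o) for o in first_version}
    final = {normalize(o) for o in final_version}
    return {
        "added": sorted(final - first),    # complete outcome measures newly introduced
        "deleted": sorted(first - final),  # complete outcome measures removed
        "changed": first != final,         # any textual change triggers manual review
    }

# Example: the wording of the single primary outcome was edited between versions
print(flag_outcome_change(
    ["Change in HbA1c at 12 weeks"],
    ["Change in HbA1c at 24 weeks"],
))
```

Each questionable research practice is defined in more detail in the table below.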
Questionable research practice | Description
---|---
Risk of bias | The probability of bias for the Risk of Bias domains was extracted via open-source software provided by Robot Reviewer [21]. Robot Reviewer was developed to score bias for four domains of the Cochrane Risk of Bias tool: random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment [22]. Robot Reviewer assesses the probability that a study has bias rather than dichotomizing it into high or low risk of bias. The level of agreement between Robot Reviewer and human raters was similar for most domains (average human–human agreement 79% [range 71% to 85%], human–Robot Reviewer agreement 65% [range 39% to 91%]) [21,23].
Modifications in primary outcomes | Changes made to the primary outcome after the trial had started, as reported in the trial registration on ClinicalTrials.gov. Changes were first extracted automatically by comparing the first and final versions of the primary outcome as registered in the study protocols on ClinicalTrials.gov. Additions and deletions of complete outcome measures were extracted. The algorithm was highly sensitive to changes in content: if any textual change was present, the primary outcome was flagged as 'changed'. These flagged studies were subsequently checked manually to distinguish between substantive and trivial (e.g., typos) changes [24].
Ratio of achieved sample size compared to what was planned | We calculated the ratio of the actual sample size to the planned sample size based on the power calculation provided in the public trial registration records from ClinicalTrials.gov. This information could be extracted directly from the trial registration record. A manual check was performed for all publications and protocols where the ratio of enrolled to estimated participants was > 100, that is, more than 100 times as many participants enrolled as estimated.
Statistical discrepancy | Comparison of the reported P value with the actual P value of the intervention effect estimate calculated from other reported information. Based on the reported relative risk, odds ratio, or hazard ratio in combination with its 95% confidence interval, the P value was recomputed. This value was compared with the reported P value using a script by Georgescu and Wren [25]. For t-tests, Chi-square values, F-values, z-statistics, and correlations, the R package statcheck [26] was used to check the correct reporting of the P value. Inconsistent P values (defined as a difference ≥ 0.01) were marked. Every inconsistency in which the recomputed P value crossed the 0.05 level relative to the reported P value was labeled as a statistical discrepancy.
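To illustrate the recomputation step for effect estimates, the sketch below recovers a two-sided P value from a reported odds ratio and its 95% confidence interval under the usual normal approximation on the log scale, and applies the two flagging rules described above (an absolute difference of at least 0.01 and a crossing of the 0.05 level). The function names and tolerance argument are illustrative; the actual analyses relied on the script by Georgescu and Wren [25] and on statcheck [26].

```python
# Minimal sketch of the P value recomputation, assuming a normal approximation
# on the log scale; not the Georgescu and Wren script used in the study.
import math
from scipy.stats import norm

def recomputed_p(odds_ratio, ci_lower, ci_upper):
    """Two-sided P value implied by an odds ratio and its 95% confidence interval."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * norm.ppf(0.975))
    return 2 * norm.sf(abs(log_or / se))

def is_statistical_discrepancy(reported_p, odds_ratio, ci_lower, ci_upper, tol=0.01):
    """Flag P values that differ by >= tol AND cross the 0.05 significance level."""
    p = recomputed_p(odds_ratio, ci_lower, ci_upper)
    inconsistent = abs(p - reported_p) >= tol
    crosses_alpha = (p < 0.05) != (reported_p < 0.05)
    return inconsistent and crosses_alpha

# Example: OR 1.50 (95% CI 0.90 to 2.50) reported with P = 0.04;
# the recomputed P is about 0.12, so this record would be flagged.
print(is_statistical_discrepancy(0.04, 1.50, 0.90, 2.50))
```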
2.3 Data collection of indicators
Candidate indicators were grouped into three domains: the author team, the trial/publication, and the journal.
2.4 Statistical analyses
3. Results
3.1 Study flow

3.2 Components of questionable research practices
Questionable research practice | Value (median [IQR] or n/N [% (95% CI)]) | Number of publications for which this outcome was available
---|---|---
Probability of bias (as assessed by Robot Reviewer) | |
Probability of bias in randomization | 0.43 (0.18–0.59) | 163,129 |
Probability of bias in allocation concealment | 0.59 (0.40–0.71) | 163,129 |
Probability of bias in blinding of patients and personnel | 0.63 (0.40–0.75) | 163,129 |
Probability of bias in blinding of outcome assessment | 0.55 (0.44–0.64) | 163,129 |
Modifications in primary outcome in public registration | 3,615/16,349 (22.1% [95% CI 21.5–22.8]) | 16,349 |
Ratio of achieved compared to planned sample size | 1 (0.98–1.04) | 24,385 |
Statistical discrepancy | 370/21,230 (1.7% [95% CI 1.6–1.9]) | 21,230 |
3.3 Demographic and bibliometric indicators
3.4 Univariable analyses
3.5 Multivariable models with data available from the trial publication
3.5.1 Risk of bias
3.5.2 Modifications in the primary outcome
3.5.3 Ratio of achieved compared to planned sample size
3.5.4 Statistical discrepancy
3.6 Multivariable models restricted to data available upon submission to a journal (i.e., before trial publication)
Indicator | Probability of bias in randomization | Probability of bias in allocation concealment | Probability of bias in blinding of patients and personnel | Probability of bias in blinding of outcome assessment | Modifications in outcome | Ratio of achieved compared to target sample size | Statistical discrepancy
---|---|---|---|---|---|---|---
Gender of first author: male | −0.014 (−0.044; 0.016) | −0.011 (−0.038; 0.016) | −0.053 (−0.081; −0.024) | −0.008 (−0.025; 0.009) | 0.041 (−0.171; 0.254) | 0.009 (−0.028; 0.045) | 0.001 (−0.589; 0.591) |
Gender of last author: male | −0.023 (−0.055; 0.009) | −0.016 (−0.042; 0.010) | −0.064 (−0.093; −0.035) | −0.020 (−0.037; −0.002) | −0.069 (−0.301; 0.162) | −0.002 (−0.041; 0.037) | 0.156 (−0.439; 0.751) |
Proportion of female authors | −0.144 (−0.200; −0.088) | −0.053 (−0.103; −0.003) | 0.071 (0.019; 0.123) | −0.047 (−0.080; −0.014) | −0.105 (−0.602; 0.391) | −0.017 (−0.090; 0.055) | 0.847 (−0.495; 2.189) |
Number of authors | −0.009 (−0.012; −0.005) | −0.007 (−0.010; −0.004) | 0.002 (−0.002; 0.005) | −0.003 (−0.005; −0.001) | 0.011 (−0.012; 0.033) | 0.001 (−0.002; 0.004) | −0.003 (−0.074; 0.069) |
Continent of last author: Africa | −0.142 (−0.237; −0.048) | −0.122 (−0.202; −0.041) | 0.072 (−0.018; 0.163) | −0.003 (−0.060; 0.053) | −0.269 (−0.938; 0.400) | −0.003 (−0.087; 0.080) | −0.402 (−2.880; 2.076) |
Continent of last author: Asia | −0.018 (−0.049; 0.013) | 0.101 (0.074; 0.127) | 0.013 (−0.017; 0.042) | 0.011 (−0.007; 0.029) | −0.106 (−0.320; 0.107) | 0.015 (−0.015; 0.045) | −0.061 (−0.912; 0.789) |
Continent of last author: Middle and South America | 0.063 (−0.008; 0.135) | 0.056 (−0.004; 0.115) | −0.073 (−0.139; −0.006) | −0.068 (−0.109; −0.026) | 0.089 (−0.350; 0.529) | 0.013 (−0.053; 0.080) | −0.605 (−3.021; 1.812) |
Continent of last author: North America | 0.110 (0.083; 0.136) | 0.104 (0.082; 0.126) | −0.026 (−0.049; −0.002) | 0.003 (−0.012; 0.018) | 0.284 (0.141; 0.427) | −0.013 (−0.038; 0.012) | −0.071 (−0.589; 0.448) |
Continent of last author: Oceania | −0.217 (−0.275; −0.159) | −0.225 (−0.273; −0.177) | 0.008 (−0.041; 0.057) | −0.123 (−0.154; −0.092) | 0.371 (0.092; 0.651) | −0.032 (−0.082; 0.019) | 0.054 (−0.871; 0.979) |
Number of countries | 0.020 (0.011; 0.029) | 0.003 (−0.005; 0.011) | −0.044 (−0.052; −0.035) | 0.001 (−0.005; 0.006) | 0.030 (−0.018; 0.079) | 0.012 (0.005; 0.019) | 0.064 (−0.111; 0.240) |
H−index of first author | −0.001 (−0.001; −0.000) | −0.001 (−0.001; −0.000) | −0.003 (−0.004; −0.003) | −0.000 (−0.001; −0.000) | 0.005 (0.001; 0.009) | 0.001 (−0.000; 0.001) | −0.002 (−0.016; 0.012) |
H-index of last author | 0.001 (0.000; 0.001) | 0.000 (−0.000; 0.001) | −0.001 (−0.001; −0.000) | −0.001 (−0.001; −0.001) | 0.004 (0.001; 0.007) | −0.000 (−0.001; 0.000) | −0.001 (−0.013; 0.012) |
Academic age of last author: sqrt | 0.002 (−0.007; 0.010) | −0.000 (−0.007; 0.007) | 0.018 (0.010; 0.025) | 0.009 (0.004; 0.014) | −0.039 (−0.097; 0.018) | 0.000 (−0.009; 0.010) | −0.017 (−0.206; 0.173) |
Number of institutions: sqrt | −0.060 (−0.081; −0.039) | −0.066 (−0.085; −0.048) | 0.001 (−0.019; 0.021) | −0.018 (−0.030; −0.005) | 0.171 (0.015; 0.328) | −0.019 (−0.041; 0.002) | −0.196 (−0.644; 0.251) |
Percentage of positive words in abstract | 0.036 (−0.019; 0.090) | 0.043 (−0.004; 0.089) | 0.063 (0.013; 0.112) | 0.017 (−0.015; 0.048) | −0.125 (−0.533; 0.284) | −0.015 (−0.075; 0.045) | 0.355 (−0.400; 1.109) |
Percentage of negative words in abstract | 0.043 (−0.045; 0.131) | 0.008 (−0.067; 0.084) | 0.005 (−0.075; 0.086) | 0.047 (−0.004; 0.098) | −0.209 (−0.759; 0.341) | −0.023 (−0.121; 0.076) | 0.595 (−0.479; 1.668) |
Medical discipline | Appendix 2 | Appendix 2 | Appendix 2 | Appendix 2 | Appendix 2 | Appendix 2 | Appendix 2 |
Mentioning of CONSORT | −0.394 (−0.427; −0.361) | −0.318 (−0.346; −0.290) | 0.029 (−0.001; 0.058) | −0.114 (−0.133; −0.096) | 0.036 (−0.145; 0.217) | −0.019 (−0.045; 0.007) | −0.237 (−0.893; 0.419) |
Trial registration | −0.437 (−0.462; −0.412) | −0.417 (−0.438; −0.395) | −0.155 (−0.178; −0.133) | −0.193 (−0.207; −0.178) | Not applicable | Not applicable | −0.685 (−1.260; −0.110) |
3.7 Multivariable models with data available upon trial registration
3.8 Explained variance
4. Discussion
4.1 Comparison to previous literature
4.2 Recommendations for future research
4.3 Strengths and limitations
5. Conclusion
Supplementary data
- Appendix 1
- Appendix 2
References
- Fostering responsible research practices is a shared responsibility of multiple stakeholders. J Clin Epidemiol. 2018; 96: 143-146
- Reproducibility in science: improving the standard for basic and preclinical research. Circ Res. 2015; 116: 116-126
- Proceedings of the thirteenth conference on public opinion research. Public Opin Q. 1958; 22: 169-216
- Evidence on questionable research practices: the good, the bad, and the ugly. J Bus Psychol. 2016; 31: 323-338
- Degrees of freedom in planning, running, analyzing, and reporting psychological studies: a checklist to avoid p-hacking. Front Psychol. 2016; 7: 1832
- Ranking major and minor research misbehaviors: results from a survey among participants of four World Conferences on Research Integrity. Res Integr Peer Rev. 2016; 1: 17
- The European Code of Conduct for Research Integrity. 2017: 161. Available at: https://allea.org/code-of-conduct/. Date accessed: December 23, 2022
- Fostering Integrity in Research. The National Academies Press, Washington, DC; 2017
- Final rule for clinical trials registration and results information submission (42 CFR Part 11). Available at: https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission. Date accessed: November 11, 2021
- The relationship between endorsing reporting guidelines or trial registration and the impact factor or total citations in surgical journals. PeerJ. 2022; 10: e12837
- Meta-assessment of bias in science. Proc Natl Acad Sci U S A. 2017; 114: 3714-3719
- Evaluating the surgery literature: can standardizing peer-review today predict manuscript impact tomorrow? Ann Surg. 2009; 250: 152-158
- Factors associated with scientific misconduct and questionable research practices in health professions education. bioRxiv. 2018; 2: 74-82
- The methodological quality of 176,620 randomized controlled trials published between 1966 and 2018 reveals a positive trend but also an urgent need for improvement. PLoS Biol. 2021; 19: e3001162
- Evolution of poor reporting and inadequate methods over time in 20 920 randomised controlled trials included in Cochrane reviews: research on research study. BMJ. 2017; 357: j2490
- Code of conduct for responsible research. Available at: https://www.who.int/docs/default-source/wpro–-documents/regional-committee/nomination-regional-director/code-of-conduct/ccrr.pdf?sfvrsn=b2cb450_2&ua=1. Date: 2017. Date accessed: November 13, 2021
- Predicting questionable research practices in randomized clinical trials. Open Science Framework; 2018. Available at: https://osf.io/27f53/. Date accessed: November 13, 2021
- R: A language and environment for statistical computing [program]. R Foundation for Statistical Computing, Vienna, Austria; 2016
- Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA. 1996; 276: 637-639
- A comparative evaluation of gender detection methods. Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, Switzerland; 2016
- RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Inform Assoc. 2016; 23: 193-201
- The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011; 343: d5928
- Technology-assisted risk of bias assessment in systematic reviews: a prospective cross-sectional evaluation of the RobotReviewer machine learning tool. J Clin Epidemiol. 2018; 96: 54-62
- Clinical trial registration patterns and changes in primary outcomes of randomized clinical trials from 2002 to 2017. JAMA Intern Med. 2022; 182: 779-782
- Algorithmic identification of discrepancies between published ratios and their reported confidence intervals and P-values. Bioinformatics. 2018; 34: 1758-1766
- statcheck: extract statistics from articles and recompute p values. R package version 01 0. Available at: https://cran.r-project.org/web/packages/statcheck/index.html. Date: 2016. Date accessed: December 23, 2022
- Automating biomedical evidence synthesis: RobotReviewer. Proc Conf Assoc Comput Linguistics Meet. 2017: 7-12
- Gender-heterogeneous working groups produce higher quality science. PLoS One. 2013; 8: e79147
- Adequate statistical power in clinical trials is associated with the combination of a male first author and a female last author. eLife. 2018; 7: e34412
- Factors associated with converting scientific abstracts to published manuscripts. J Craniofac Surg. 2013; 24: 66-70
- Determinants of selective reporting: a taxonomy based on content analysis of a random selection of the literature. PLoS One. 2018; 13: e0188247
- Research misconduct: a report from a developing country. Iranian J Public Health. 2017; 46: 1374
- Clinical trial design and dissemination: comprehensive analysis of clinicaltrials.gov and PubMed data since 2005. BMJ. 2018; 361: k2130
- Use of positive and negative words in scientific PubMed abstracts between 1974 and 2014: retrospective analysis. BMJ. 2015; 351: h6467
- Are study and journal characteristics reliable indicators of “truth” in imaging research? Radiology. 2018; 287: 215-223
- The journal impact factor as a predictor of trial quality and outcomes: cohort study of hepatobiliary randomized clinical trials. Am J Gastroenterol. 2005; 100: 2431-2435
- The meaning of author order in medical research. J Investig Med. 2007; 55: 174-180
- Package ‘betareg’. R package; 2016. Available at: https://cran.r-project.org/web/packages/betareg/index.html. Date accessed: December 23, 2022
- Package ‘rms’. Vanderbilt Univ. (Available at)
- mice: multivariate imputation by chained equations in R. J Stat Softw. 2011; 45: 1-67
- Prevalence of research misconduct and questionable research practices: a systematic review and meta-analysis. Sci Eng Ethics. 2021; 27: 41
- Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci. 2012; 23: 524-532
- The prevalence of statistical reporting errors in psychology (1985–2013). Behav Res Methods. 2016; 48: 1205-1226
- Incongruence between test statistics and P values in medical papers. BMC Med Res Methodol. 2004; 4: 13
- Prevalence of responsible research practices among academics in The Netherlands. F1000Res. 2022; 11: 471
- Validity of the impact factor of journals as a measure of randomized controlled trial quality. J Clin Psychiatry. 2006; 67: 37-40
- A retrospective analysis of randomized controlled trials on traumatic brain injury: evaluation of CONSORT item adherence. Brain Sci. 2021; 11: 1504
- Educating for responsible research practice in biomedical sciences: towards learning goals. Sci Educ (Dordr). 2022; 31: 977-996
- Comparing machine and human reviewers to evaluate the risk of bias in randomized controlled trials. Res Synth Methods. 2020; 11: 484-493
- StatReviewer. Available at: https://www.ariessys.com/views-and-press/newsletter-archive/june-2016/statreviewer-presents-at-emug/. Date accessed: November 26, 2019
Article info
Publication history
Footnotes
Author contributions: Johanna A Damen: Conceptualization, Methodology, Formal analysis, Data curation, and Writing–Original Draft. Pauline Heus: Conceptualization, Methodology, and Writing–Review and Editing. Herm J Lamberink: Conceptualization, Methodology, Data curation, and Writing–Review and Editing. Joeri K Tijdink: Conceptualization, Methodology, Writing–Review and Editing, and Funding acquisition. Lex Bouter: Conceptualization, Methodology, and Writing–Review and Editing. Paul Glasziou: Conceptualization and Writing–Review and Editing. David Moher: Conceptualization and Writing–Review and Editing. Willem M Otte: Conceptualization, Methodology, Data curation, and Writing–Review and Editing. Christiaan H Vinkers: Conceptualization, Methodology, Writing–Review and Editing, Supervision, and Funding acquisition. Lotty Hooft: Conceptualization, Methodology, Writing–Review and Editing, and Supervision.
Data availability statement: The risk of bias characterization was done with large-batch customized Python scripts (Python version 3; https://github.com/wmotte/robotreviewer_prob). Data management and analyses used R (version 3.6.1). All data are available at https://github.com/wmotte/RCTQuality.
Funding: This work was supported by The Netherlands Organization for Health Research and Development (ZonMw) grant “Fostering Responsible Research Practices” (445001002) (C.V., J.T., and W.O.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Declarations of interest: none.
Copyright
User license: Creative Commons Attribution (CC BY 4.0)