Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Meibergdreef 9, 1105 AZ, Amsterdam, The Netherlands
Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Meibergdreef 9, 1105 AZ, Amsterdam, The Netherlands
Department of Pediatrics, Necker-Enfants Malades Hospital, Assistance Publique-Hôpitaux de Paris, Paris Descartes University, 149, rue de Sevres, 75015 Paris, France
Inserm, Obstetrical, Perinatal and Pediatric Epidemiology Research Team, Center for Epidemiology and Biostatistics (U1153), Paris Descartes University, 53, avenue de l'Observatoire, 75014 Paris, France
Dutch Cochrane Centre, Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, University Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Meibergdreef 9, 1105 AZ, Amsterdam, The Netherlands
Objectives
Informative journal abstracts are crucial for the identification and initial appraisal of studies. We aimed to evaluate the informativeness of abstracts of diagnostic accuracy studies.
Study Design and Setting
PubMed was searched for reports of studies that had evaluated the diagnostic accuracy of a test against a clinical reference standard, published in 12 high-impact journals in 2012. Two reviewers independently evaluated the information contained in included abstracts using 21 items deemed important based on published guidance for adequate reporting and study quality assessment.
Results
We included 103 abstracts. Crucial information on study population, setting, patient sampling, and blinding as well as confidence intervals around accuracy estimates were reported in <50% of the abstracts. The mean number of reported items per abstract was 10.1 of 21 (standard deviation 2.2). The mean number of reported items was significantly lower for multiple-gate (case–control type) studies, in reports in specialty journals, and for studies with smaller sample sizes and lower abstract word counts. No significant differences were found between studies evaluating different types of tests.
Conclusion
Many abstracts of diagnostic accuracy study reports in high-impact journals are insufficiently informative. Developing guidelines for such abstracts could help improve the transparency and completeness of reporting.
Key findings
•
The informativeness of many abstracts of diagnostic accuracy studies is suboptimal.
What this adds to what was known?
•
Reporting of information related to risk of bias and generalizability of study findings needs to be improved in particular.
•
Reporting guidelines for abstracts of randomized trials and systematic reviews have been developed, but similar guidelines are not available for abstracts of diagnostic accuracy studies.
What is the implication and what should change now?
•
Guidelines could be developed to facilitate writing informative and transparent abstracts of diagnostic accuracy studies.
Evaluating the validity of health research is only possible when study reports are sufficiently informative [
]. In response to increasing evidence of substandard reporting of biomedical studies, collaborative initiatives have led to the development of reporting guidelines in different fields of research, such as the Consolidated Standards of Reporting Trials (CONSORT) statement for randomized controlled trials [
]. Diagnostic accuracy studies evaluate how well a medical test identifies or rules out a target condition, as detected by a clinical reference standard. Study results are typically expressed in measures such as sensitivity and specificity. The Standards for Reporting of Diagnostic Accuracy (STARD) statement contains a checklist of 25 items that should be presented in all reports of diagnostic accuracy studies, covering key elements of study design and setting, selection of participants, execution and interpretation of tests, data analysis, and study results.
Unlike some other guidelines, such as those for reporting randomized controlled trials [
], STARD so far has not provided detailed guidance for writing journal abstracts. Readers, especially those in resource-constrained settings where free access to full study reports is limited, might base clinical decision making on the information provided in abstracts only. Clinicians, researchers, systematic reviewers, and policy makers need to assess and critically appraise large amounts of information in short periods of time to keep up to date. Abstracts play a crucial role in this process. Initially introduced in the 1960s [
], abstracts have especially gained importance in the past three decades because of the development of evidence-based medicine, the almost exponential increase in medical journals and publications, and the increased access to online libraries such as PubMed. To accommodate these changes, the structured abstract was introduced in 1987, and the great majority of biomedical journals have adopted it since then [
].
Incomplete, partial or even incorrect information in abstracts makes it difficult for readers to identify research questions, study methods, study results, and the implications of study findings. Despite undisputable improvements [
], deficiencies persist in the abstracts of randomized trial reports. Whether similar deficiencies exist in reports of diagnostic accuracy studies is unknown. Two previous studies evaluated the content of abstracts of such studies, but only for a small number of items and in specific fields of research [
]. We aimed to systematically evaluate the informativeness of abstracts of diagnostic accuracy studies published in 12 high-impact journals in 2012, by scoring whether essential methodological features and study results were reported.
2. Materials and methods
2.1 Literature search and selection of studies
We searched PubMed using a search filter with high sensitivity for diagnostic accuracy studies (“sensitivity AND specificity”[MH] OR specificit*[TW] OR “false negative”[TW] OR accuracy[TW]) [
]. We looked for study reports published in one of six general medical journals (Annals of Internal Medicine, Archives of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine) and six discipline-specific journals (Archives of Neurology, Clinical Chemistry, Circulation, Gut, Neurology, and Radiology) in 2012. These 12 journals were selected in line with previous evaluations, in which they were found to publish the largest number of diagnostic accuracy study reports among all journals with an impact factor over 4 [
]. The median impact factor of these journals in 2012 was 12.4 (range 6.3 to 51.7). As of 2012, eight of these journals clearly stated in their instructions to authors that they require adherence to STARD, and three only provided a reference to STARD. The same set of studies has been used previously to evaluate adherence to the STARD reporting guidelines [
]. Eligible were all articles that reported estimates of the accuracy of medical tests in humans, based on a comparison of index test results against a clinical reference standard. Two reviewers independently examined studies for inclusion; disagreements were resolved through discussion. First, all titles and abstracts were screened to identify potentially eligible articles. After this, the full text of potentially eligible articles was evaluated. In line with previous evaluations of STARD [
], only a randomly selected quarter of the potentially eligible articles published in Radiology was evaluated for inclusion because the number of diagnostic accuracy studies published in this journal was relatively large. We prepared a randomly ordered list of the potentially eligible articles from this journal and, using a random number generator in Excel, selected at least two articles from each month of the year, starting from the top of the list.
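For illustration, the sketch below shows how a comparable search could be submitted to PubMed through the NCBI E-utilities, and how a random quarter of the retrieved records could be drawn in Python rather than Excel. The journal and date restrictions and the sampling step are simplified assumptions for the example, not the exact procedure used in this study (which sampled at least two articles per month).

import random
import requests

# The accuracy filter quoted above, combined with an illustrative journal and
# publication-year restriction (assumed syntax; not the study's exact query).
filter_terms = ('"sensitivity and specificity"[MH] OR specificit*[TW] '
                'OR "false negative"[TW] OR accuracy[TW]')
query = f'({filter_terms}) AND "Radiology"[TA] AND 2012[DP]'

response = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmax": 1000, "retmode": "json"},
    timeout=30,
)
pmids = response.json()["esearchresult"]["idlist"]
print(f"{len(pmids)} records retrieved")

# Simplified stand-in for the Excel-based step: draw a random quarter of the
# retrieved records (the actual study sampled at least two articles per month).
random.seed(2012)
sampled = random.sample(pmids, k=max(1, len(pmids) // 4))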
For the current evaluation, we secondarily excluded studies if they did not report or mention at least one of the following measures of diagnostic accuracy in the abstract: sensitivity, specificity, likelihood ratios, predictive values, diagnostic odds ratio, accuracy, area under the receiver operating characteristic curve, or C index.
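For reference, all of these measures except the area under the receiver operating characteristic curve (which requires results across thresholds) can be derived from a single 2 × 2 cross-classification of index test results against the reference standard. The sketch below uses made-up counts purely for illustration.

# Hypothetical 2 x 2 table: index test result vs. clinical reference standard.
tp, fp = 90, 15          # true positives, false positives
fn, tn = 10, 185         # false negatives, true negatives

sensitivity = tp / (tp + fn)               # proportion of diseased with a positive test
specificity = tn / (tn + fp)               # proportion of non-diseased with a negative test
ppv = tp / (tp + fp)                       # positive predictive value
npv = tn / (tn + fn)                       # negative predictive value
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity
diagnostic_or = (tp * tn) / (fp * fn)      # diagnostic odds ratio (= LR+ / LR-)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}, "
      f"PPV {ppv:.2f}, NPV {npv:.2f}, LR+ {lr_positive:.1f}, "
      f"LR- {lr_negative:.2f}, DOR {diagnostic_or:.0f}, accuracy {accuracy:.2f}")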
2.2 Data extraction
We extracted the first author, journal, journal type (general vs. discipline-specific), study design [single-gate (cohort type) studies, which used one set of inclusion criteria, vs. multiple-gate (case–control type) studies, which used multiple sets of inclusion criteria] [
], and type of test under evaluation (imaging tests vs. laboratory tests vs. other types of tests). We also extracted the sample size (number of participants or biological specimens) as reported in the abstract and the word count (number of words used, excluding the title) of each included abstract. Two independent reviewers extracted all data; disagreements were resolved through discussion.
2.3 Informativeness of abstracts
A review team developed a list of items to evaluate the content of abstracts, focusing mostly on key elements related to study validity. The review team consisted of four researchers, all of them part of the STARD group (D.A.K., with 2 years of experience; J.F.C., with 4 years of experience; and L.H. and P.M.M.B., each with more than 10 years of experience in performing literature reviews of diagnostic accuracy studies). First, a longlist of 36 potentially relevant items was generated based on the STARD statement [
] (Appendix A at www.jclinepi.com). After this, each item on the longlist was discussed within the review team, and a subset of items deemed most relevant was selected based on general consensus. The list of items was then piloted and refined by all members of the review team based on an evaluation of 10 included abstracts.
The final list contains 21 items (Appendix B at www.jclinepi.com), focusing on study identification, rationale, objectives, methods for recruitment and testing, participant baseline characteristics, missing data, test results and reproducibility, estimates of diagnostic accuracy, and discussion of study findings, implications, and limitations.
Two authors independently evaluated each included abstract and scored each item as reported or not reported. We also established guidance on the interpretation of each item (Appendix B at www.jclinepi.com). Any discrepancies were resolved through discussion. If consensus could not be reached, the case was discussed with a third author, who made the final decision.
2.4 Analysis
We reported general characteristics of included studies as frequencies and percentages or as medians together with interquartile ranges. We counted the total number of reported items for each included abstract (range 0 to 21) and then calculated an overall mean together with standard deviation (SD) and range of the number of reported items across studies. For each item on the list, the number and percentage of abstracts reporting the information were calculated. Interreviewer agreement on the scoring of items was assessed by calculating the kappa statistic, excluding the 10 abstracts that were used to pilot and refine the list of items. We used univariate analyses (one-way ANOVA) to compare the mean number of items reported between journal types, study designs, test types, sample sizes, and abstract word counts; for the latter two, we used a median split. We also fitted a multiple linear regression model that included variables with a P < 0.10 on univariate analysis, to explore conditional associations with the number of items reported. Statistical analyses were performed using SPSS version 20 (IBM Corp., Armonk, NY, USA).
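The analyses were run in SPSS; as a rough illustration of the same steps for readers who prefer open-source tools, the Python sketch below reproduces the logic (item counts, interreviewer kappa, median splits, one-way ANOVA, and a linear regression) on simulated data. Variable names and values are made up for the example; this is not the authors' code.

import numpy as np
import pandas as pd
from scipy import stats
from sklearn.metrics import cohen_kappa_score
import statsmodels.formula.api as smf

# Simulated scoring data: one row per abstract, with the count of reported
# items and study-level covariates; column names are illustrative.
rng = np.random.default_rng(0)
n = 103
df = pd.DataFrame({
    "items_reported": rng.integers(6, 16, n),            # count of reported items (0-21)
    "journal_type": rng.choice(["general", "specialty"], n),
    "design": rng.choice(["single_gate", "multiple_gate"], n),
    "test_type": rng.choice(["imaging", "laboratory", "other"], n),
    "sample_size": rng.integers(20, 2000, n),
    "word_count": rng.integers(150, 400, n),
})

# Interreviewer agreement on item scoring (two reviewers, simulated binary scores).
reviewer1 = rng.integers(0, 2, 500)
reviewer2 = np.where(rng.random(500) < 0.9, reviewer1, 1 - reviewer1)
print("kappa:", cohen_kappa_score(reviewer1, reviewer2))

# Median splits for sample size and abstract word count.
df["large_sample"] = df["sample_size"] >= df["sample_size"].median()
df["long_abstract"] = df["word_count"] >= df["word_count"].median()

# One-way ANOVA: items reported by type of test under evaluation.
groups = [g["items_reported"].values for _, g in df.groupby("test_type")]
print(stats.f_oneway(*groups))

# Multiple linear regression on the number of items reported
# (all candidate covariates entered here, for illustration).
model = smf.ols(
    "items_reported ~ journal_type + design + large_sample + long_abstract",
    data=df,
).fit()
print(model.summary())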
3. Results
The literature search generated 600 records (Fig. 1). Selection based on titles and abstracts resulted in a Kappa of 0.67 [95% confidence interval (CI): 0.62, 0.73]; this was 0.77 (95% CI: 0.68, 0.88) for full-text selection and 0.63 (95% CI: 0.40, 0.86) for the final abstract selection. We included 103 articles reporting on the evaluation of the diagnostic accuracy of a medical test in their abstract. Characteristics of the included studies are provided in Table 1.
The kappa statistic for the scoring of items was 0.85 (95% CI: 0.83, 0.87). The mean number of items reported in the abstracts was 10.1 of 21 (SD 2.2; range 6 to 15). All abstracts reported more than five items on the list; 38% of the abstracts reported 11 items or more. No abstract reported more than 15 items (Fig. 2).
Fig. 2. Proportion of abstracts of diagnostic accuracy studies that reported at least the indicated number of items on the 21-item list. The dotted line indicates the percentage of abstracts reporting more than 50% of the evaluated items.
3.2 Factors associated with number of items reported
The mean number of reported items was significantly lower in abstracts published in specialty journals (9.6; SD 2.0) compared with general journals [12.2; SD 1.9; mean difference (MD) 2.6 (95% CI: 1.6, 3.6); P < 0.001], in articles reporting on multiple-gate studies (9.0; SD 2.0) compared with single-gate studies [10.6; SD 2.1; MD 1.5 (95% CI: 0.7, 2.4); P = 0.001], in abstracts of studies with sample sizes below the median (9.4; SD 1.9) compared with those above [10.8; SD 2.3; MD 1.4 (95% CI: 0.6, 2.3); P = 0.001], and in abstracts with a word count below the median (9.5; SD 2.3) compared with those above [10.6; SD 2.0; MD 1.1 (95% CI: 0.3, 2.0); P = 0.008] (Fig. 3). The number of items did not significantly differ according to the type of test under evaluation: 10.6 (SD 2.2) for imaging tests, 9.6 (SD 2.2) for laboratory tests, and 10.1 (SD 2.1) for other tests (P = 0.13).
Fig. 3. Number of items reported by subgroup (A = type of journal; B = study design; C = type of test; D = sample size; E = abstract word count). Each dot represents one study. The bold horizontal lines show the mean number of items reported in each subgroup. P-values are based on parametric testing.
In multiple-linear regression, the type of journal [adjusted mean difference (AMD) 1.9 (95% CI: 0.8, 3.0); P = 0.001], study design [AMD 1.0 (95% CI: 0.1, 1.8); P = 0.03], and sample size [AMD 0.9 (95% CI: 0.1, 1.7); P = 0.02] were significantly associated with the number of items reported, whereas word count was not [AMD 0.5 (95% CI: −0.3, 1.3); P = 0.22].
3.3 Item-specific reporting
The reporting of individual items on the list was highly variable (Table 2). Twelve of the 21 items were reported in less than half of the evaluated abstracts; only five items were reported in more than three-quarters of the abstracts.
Table 2. Items reported in the abstracts of diagnostic accuracy studies (N = 103); values are number (%) of abstracts reporting each item.

Title
Identify the article as a study of diagnostic accuracy in title: 51 (50)

Background and aims
Rationale for study/background: 47 (46)
Research question/aims/objectives: 86 (84)

Methods
Study population (at least one of following): 46 (45)
  a—inclusion/exclusion criteria: 15 (15)
  b—study setting: 33 (32)
  c—number of centers: 33 (32)
  d—study location: 17 (17)
Recruitment dates: 18 (18)
Patient sampling (consecutive vs. random sample): 11 (11)
Data collection (prospective vs. retrospective): 52 (51)
Study design (multiple gate vs. single gate): 97 (94)
Clinical reference standard: 59 (57)
Information on the index test (at least one of following): 103 (100)
  a—index test: 103 (100)
  b—technical specifications and/or commercial name: 72 (70)
  c—cutoffs, categories of results of index test: 36 (35)
Blinding (at least one of following): 17 (17)
  a—when interpreting the index test: 13 (13)
  b—when interpreting the reference standard: 6 (6)

Results
Study participants (at least one of following): 98 (95)
  a—number of participants: 98 (95)
  b—age of participants: 11 (11)
  c—gender of participants: 24 (23)
Number of indeterminate results/missing values: 6 (6)
Disease prevalence: 74 (72)
Data to construct 2 × 2 table: 22 (21)
Estimates of diagnostic accuracy (at least one of following): 96 (93)
  a—sensitivity and/or specificity: 67 (65)
  b—negative and/or positive predictive value: 20 (19)
  c—negative and/or positive likelihood ratio: 2 (2)
  d—area under the receiver operating characteristic curve/C statistic: 36 (35)
  e—diagnostic odds ratio: 0 (0)
  f—accuracy: 13 (13)
Confidence intervals around estimates of diagnostic accuracy
3.3.1 Title, background, and aims
Fifty percent of the abstracts announced the evaluation of a diagnostic test in the title, and 46% provided a rationale for this evaluation in the abstract's introduction. Research objectives, aims, or questions were lacking in 15% of the abstracts.
3.3.2 Methods
There was large variability in reporting for various aspects of the study methods. Key items that should inform the reader about which participants were eligible and how, where, and when they were recruited were rarely reported: the inclusion criteria (15%), study setting (32%), number of centers (32%), study location (17%), recruitment dates (18%), and patient sampling (11%) were all reported in less than one-third of the abstracts.
Reporting of elements related to the design of the study was better: 51% of the abstracts reported whether data were collected prospectively or retrospectively, and it was clear in 94% of the abstracts whether the article reported on a single-gate or a multiple-gate study.
The reference standard was described in 57% of the abstracts, but all reported the index test. Information on the index test often included some technical specifications (70%) but rarely included details on cutoffs and categories for test positivity (35%) or information on whether readers were blinded to the results of the reference standard or other clinical data (13%).
3.3.3 Results
All but five abstracts (95%) reported the number of participants included, but more specific information regarding demographic characteristics of participants, such as age and gender, was seldom provided (11% and 23%, respectively). Information on disease prevalence was reported by 72% of the abstracts, but the number of indeterminate or missing test results (6%), data to construct 2 × 2 tables (21%), and results on the reproducibility of the index test (e.g., by means of kappa values; 17%) were rarely reported. Estimates of diagnostic accuracy, most often sensitivity and specificity, were available in 93% of the abstracts, but only 26% provided CIs.
3.3.4 Discussion
All but five abstracts (95%) discussed the diagnostic accuracy of the index test under evaluation, but clear implications for future research (9%) and study limitations (3%) were rare.
4. Discussion
We systematically evaluated the informativeness of abstracts of diagnostic accuracy studies published in 12 high-impact journals in 2012 and observed important weaknesses in the information provided. Key features of study design and a useful description of study results are often lacking, making proper identification and initial critical appraisal of studies difficult, if not impossible.
We only evaluated studies published in high-impact journals. This selection may have produced an overestimate of the number of items typically reported, as it is conceivable that the quality of diagnostic accuracy abstracts is poorer in low-impact journals. Evaluations of full-text articles in other fields of health research have shown poorer reporting quality in low-impact journals [
], although this does not necessarily apply to abstracts. A minority of studies in our sample reported multiple results, not just diagnostic accuracy, in which case the abstract had to include information about these other study aims as well, within the journal's word limits. In the absence of proper prospective registration, it is difficult to identify the primary aims of these studies [
]. This was an exploratory analysis; the sample size was not calculated to detect differences between subgroups, and the results of subgroup analyses should be interpreted with caution.
Only a few previous studies have evaluated abstracts of diagnostic accuracy studies and only for specific tests or disciplines. Estrada et al. [
] examined 33 abstracts of studies evaluating diagnostic tests for trichomoniasis published between 1976 and 1998, with regard to patient selection and spectrum, verification of index test results, and blinding. None of the abstracts reported more than two of these four methodological criteria. Brazzelli et al. [
] examined determinants of later full publication of 160 abstracts of diagnostic accuracy studies presented at two international stroke conferences between 1995 and 2004. Although not their primary objective, they found that 65% did not report on type of data collection (prospective vs. retrospective), 76% did not report on blinding of test results, and 89% did not state whether interobserver agreement had been assessed, whereas only one study did not report the sample size. This is very similar to our results.
Our analyses focused on whether items were reported in the abstract, not on whether the abstract was an honest and balanced presentation of the study and its findings. Another review from our group demonstrated that about one in four abstracts of diagnostic accuracy studies is overoptimistic, with stronger conclusions in the abstract than in the full text, selective reporting of results, and discrepancies between the study aims and abstract conclusions, phenomena often referred to as "spin" [
]. One study evaluated 108 diagnostic accuracy studies on molecular research and graded all statements referring to the investigated test's clinical applicability, with the final grading based on the abstract. Almost all articles (96%) made statements that were definitely favorable or promising, and 56% overinterpreted the clinical applicability of their findings. Boutron et al. [
] showed that overoptimistic abstracts are also highly prevalent in reports of randomized trials.
Our list of items was developed to evaluate the informativeness of abstracts; it should not be considered as a proposal for a reporting guideline. We acknowledge that it may not be possible to report all 21 items within the word limits of a journal abstract. Guidelines for reporting of abstracts of randomized trials and systematic reviews have proposed 17 and 12 items, respectively [
].
We also acknowledge that some of the 21 items may be more important than others. Providing essential items of study design is crucial for abstracts of diagnostic accuracy studies because diagnostic accuracy is not a fixed test property but reflects the behavior of a test in a particular clinical context and setting. Diagnostic accuracy studies are also prone to multiple sources of bias, and the abstract can inform the reader whether these biases were avoided. If not, the reader may want to skip the article and look further for information on the test's accuracy.
Inclusion criteria, study setting, and participant sampling, insufficiently reported in most abstracts we evaluated, are essential to the reader because disease severity and patient spectrum are well-established sources of variation of diagnostic accuracy [
]. Disease prevalence, one of the most often reported items in our evaluation but still lacking in more than a quarter of abstracts, is a major determinant of the applicability of study findings to another clinical situation because, contrary to what clinicians usually think, diagnostic accuracy varies with disease prevalence [
]. Knowledge of the reference standard, not reported or unclear in almost half of the evaluated abstracts, is also crucial because the use of an inappropriate reference standard may lead to biased conclusions. Not providing CIs around estimates of accuracy, as was the case in three-quarters of the evaluated abstracts, could seriously mislead readers because the uncertainty of the estimates cannot be judged.
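As a concrete illustration of these two points, the sketch below (with made-up numbers) computes a Wilson 95% CI around a sensitivity estimate and shows how the positive predictive value of the same test would shift with disease prevalence if sensitivity and specificity were held fixed; the counts and prevalence values are purely illustrative.

from statsmodels.stats.proportion import proportion_confint

# Hypothetical study: 45 of 50 diseased participants test positive; specificity 0.90.
tp, n_diseased = 45, 50
sensitivity = tp / n_diseased
lower, upper = proportion_confint(tp, n_diseased, alpha=0.05, method="wilson")
print(f"sensitivity {sensitivity:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Predictive values shift with prevalence even when sensitivity and specificity
# are held constant (Bayes' theorem).
specificity = 0.90
for prevalence in (0.05, 0.20, 0.50):
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}")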
Poor reporting represents a waste of time and research resources [
]. Future scientific efforts could include the development of guidelines to facilitate writing sufficiently informative and transparent abstracts of diagnostic accuracy studies, as has been done for randomized trials and for systematic reviews [
]. We believe initiatives to improve reporting quality must be multistaged and multitarget. Increasing awareness about the need for informative, complete, and balanced reporting is one such element, and this applies to study authors, reviewers, editors, and readers. Titles and abstracts are not promotional material but form an essential part of honest reporting, facilitating the timely identification and initial appraisal of studies for those in need of evidence to guide clinical decisions.
Acknowledgments
The authors thank W. Annefloor van Enst, PhD for her contributions to the literature selection as part of the full-text evaluation of adherence to STARD.
Funding: J.F.C. is supported by an educational grant from Assistance Publique Hôpitaux de Paris (Année Médaille de l'Internat 2013). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the article.