Abstract
Objectives
Study Design and Setting
Results
Conclusion
Keywords
1. Introduction
Key findings
- When defining the key question(s) for assessing the quality of evidence, a clear distinction is needed between test accuracy and patient-important outcome(s) as the outcome of choice. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) criteria of "inconsistency," "imprecision," and "publication bias" were challenging to interpret and apply, as was the application of the criteria to comparative test accuracy evidence.

What this adds to what was known?
- Current publications on the GRADE for diagnostics approach explain the approach itself. In contrast, this article describes its practical application when rating a body of evidence such as a diagnostic test accuracy review. It outlines a number of real-life challenges and considerations a user of this approach may encounter and suggests how these can be addressed.

What is the implication and what should change now?
- Explicit guidance and worked examples illustrating the application of the GRADE criteria of inconsistency, imprecision, and publication bias would facilitate the use of the methodology when rating diagnostic test accuracy evidence. Guidance on translating a Quality Assessment of Diagnostic Accuracy Studies (QUADAS) 2 risk of bias and applicability assessment into the corresponding GRADE criteria of risk of bias and indirectness would also help users apply the GRADE approach.

2. Methods
Cochrane diagnostic test accuracy review | Description of review | Number of assessors |
---|---|---|
Optical coherence tomography (OCT) for detection of macular edema in patients with diabetic retinopathy | This review assessed the diagnostic accuracy of OCT for the detection of diabetic macular edema and/or its more severe form of clinically significant macular edema. The review included nine cohort studies. Unit of analyses in the included studies was the individual eye and not the patient. | 3 (R.A.M., M.W.L., and M.M.G.L.) |
Physical examination for lumbar radiculopathy due to disc herniation in patients with low back pain. | This review assessed tests performed during physical examination (alone or in combination) to identify radiculopathy due to lower lumbar disc herniation as established during imaging or surgery in patients with low back pain and sciatica. The review included 19 studies (16 cohort studies and 3 case–control studies), of which 1 study was conducted in a primary care setting. A variety of physical examination tests were used in the studies with the straight leg raising test or Lasègue test being the most frequent (15 studies). The included studies used different reference standards, with surgical findings or imaging (CT or MRI) being the most frequent ones (nine and six studies, respectively). | 6 (G.G., R.A.M., M.W.L., C.D., C.H., and R.J.P.M.S.) |
Rapid diagnostic tests (RDTs) for diagnosing uncomplicated Plasmodium falciparum malaria in endemic countries. | This review assessed the diagnostic accuracy of immunochromatography-based RDTs for detecting clinical P. falciparum malaria (symptoms suggestive of malaria plus P. falciparum parasitaemia detectable by microscopy) in persons living in malaria endemic areas who present to ambulatory health care facilities with symptoms of malaria and to identify which types and brands of commercial test best detect clinical P. falciparum malaria. The authors included 111 test evaluations from a total of 74 studies, of which 104 test evaluations were in comparison with microscopy, 2 test evaluations were in comparison with PCR-adjusted microscopy, and 5 studies compared RDTs with PCR only. All studies were consecutive patient series. | 6 (G.G., R.A.M., J.B., M.W.L., M.M.G.L., and C.D.) |
3. Results
Key issues identified | Observations |
---|---|
Key question formulation | Key question formulation was not an explicit step, and guidance on how key questions could be defined was also not explicit. Assessors whose key questions focused on patient-important outcomes made different judgments on evidence quality from assessors whose key questions focused on test accuracy as the outcome. |
GRADE domains | |
Risk of bias (RoB) | Assessors were unclear on how to judge QUADAS items labeled “unclear” |
Indirectness | (1) Issues on applicability of findings to patient population of interest (2) Test accuracy is inherently indirect evidence for patient outcomes, resulting in default downgrading of the quality |
Inconsistency | Assessors used different rationales for downgrading (eg, confidence interval overlap, unexplained heterogeneity, inconsistent use of test threshold positivity, and variable reference standard definitions) |
Imprecision | Assessors used different rationales for downgrading (eg, small study numbers, wide confidence intervals) |
Publication bias | Assessors were unclear on how to assess this |
Across all GRADE domains | Reviewers had to be conscious to not double downgrade on a single factor |
Additional points for comparative test review | (1) For an indirect comparison of two index tests, the quality of the assessment of test accuracy for each test needed to be assessed first and then the quality of the comparison. (2) When making the relative comparison, the score for each GRADE domain (eg, RoB, indirectness) was determined as the lower of the two scores for that domain for each index test compared with its reference standard. (3) The overall quality of evidence (for an indirect comparison of two index tests) was further downgraded by one level for indirectness. |
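
The three points on comparative test reviews describe a small procedure, which can be sketched in Python. This is a minimal illustration and not the assessors' tool. The numeric coding (0 for no serious concern, -1 for serious, -2 for very serious), the four-level scale starting at "high," and all function names are assumptions introduced here. "Lower of the two scores" is read as the less favorable rating.

```python
# Hypothetical sketch of the indirect-comparison rule described in the table.
# Coding assumption: 0 = not serious, -1 = serious, -2 = very serious.
DOMAINS = (
    "risk_of_bias",
    "indirectness",
    "inconsistency",
    "imprecision",
    "publication_bias",
)
LEVELS = ("very low", "low", "moderate", "high")


def combine_domain_ratings(test_a, test_b):
    """Point (2): per domain, keep the lower of the two index tests' ratings."""
    return {d: min(test_a[d], test_b[d]) for d in DOMAINS}


def overall_quality(ratings, extra_downgrade=0):
    """Start at 'high' (assumption) and subtract one level per downgrade."""
    index = (len(LEVELS) - 1) + sum(ratings.values()) - extra_downgrade
    return LEVELS[max(index, 0)]  # the scale cannot go below 'very low'


def indirect_comparison_quality(test_a, test_b):
    """Points (1) to (3): rate each test, combine, then downgrade once more."""
    combined = combine_domain_ratings(test_a, test_b)
    # Point (3): one further level off for the indirectness of the comparison.
    return combined, overall_quality(combined, extra_downgrade=1)


# Illustrative ratings only; these are not taken from any of the reviews.
no_concerns = {d: 0 for d in DOMAINS}
test_a = {**no_concerns, "risk_of_bias": -1}
test_b = dict(no_concerns)

combined, quality = indirect_comparison_quality(test_a, test_b)
print(combined["risk_of_bias"], quality)
```

With these illustrative inputs, the combined risk of bias rating is -1 and the comparison is rated "low": one level off for risk of bias and one for the indirect comparison.
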

| Test result | Study design | Factors that may decrease quality of evidence | | | | | Test property (95% CI) | Test result | Number per 1,000 tested for given prevalence of target condition¹ | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Risk of bias | Indirectness | Inconsistency | Imprecision | Publication bias | | | 58% | 82% | 98% |
| Sensitivity (TP + FN) | Eight historical cohort + one case–control | Serious² | Serious³ | Unclear on how to assess | No | Undetected | 0.92 (0.87, 0.95) | TPs | 534 (505, 551) | 751 (713, 779) | 902 (853, 931) |
| | | | | | | | | FNs | 46 (29, 75) | 66 (41, 107) | 78 (49, 127) |
| Specificity (FP + TN) | Eight historical cohort + one case–control | Serious² | Serious³ | Unclear on how to assess | Serious⁴ | Undetected | 0.28 (0.18, 0.40) | FPs | 284 (252, 344) | 130 (108, 148) | 14 (12, 16) |
| | | | | | | | | TNs | 118 (76, 168) | 50 (32, 72) | 6 (4, 8) |

- ¹ Prevalence was based on the range in included studies.
- ² Many items are unclear on the QUADAS assessment.
- ³ Studies not done in primary care setting. Very high prevalence of condition in included studies.
- ⁴ Very wide CI for specificity.
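
The "number per 1,000 tested" columns in the evidence profile follow from the pooled sensitivity and specificity and an assumed prevalence. The sketch below shows that arithmetic. The function name `per_1000` and the rounding to whole patients are assumptions introduced here, not part of the GRADE approach. Recomputing from the rounded estimates (0.92 and 0.28) reproduces most point estimates in the table, but the TP cell at 82% and the FP cell at 58% come out as 754 and 302 rather than the tabulated 751 and 284; this sketch cannot explain that difference.

```python
def per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected TP, FN, FP and TN counts among n people tested.

    prevalence is a proportion (e.g. 0.58); counts are rounded to whole
    people, which is an assumption about how the table was built.
    """
    with_condition = n * prevalence
    without_condition = n - with_condition
    tp = sensitivity * with_condition
    tn = specificity * without_condition
    return {
        "TP": round(tp),
        "FN": round(with_condition - tp),
        "FP": round(without_condition - tn),
        "TN": round(tn),
    }


# Pooled estimates and prevalences as reported in the evidence profile.
for prevalence in (0.58, 0.82, 0.98):
    print(prevalence, per_1000(0.92, 0.28, prevalence))
```

The same function applied to the confidence limits (for example, sensitivity 0.87 and 0.95) gives the interval bounds shown in parentheses in the table.
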
3.1 Question formulation
3.2 Issues in applying GRADE criteria in a single test review
3.2.1 Risk of bias
3.2.2 Indirectness
3.2.3 Inconsistency
3.2.4 Imprecision
3.2.5 Publication bias
3.2.6 General comments
3.3 Issues in applying the GRADE criteria in a comparative test review
4. Discussion
4.1 Summary of results
4.2 Areas for further development in the GRADE for diagnostics approach
4.3 Considerations for authors of primary studies and systematic reviews of diagnostic test accuracy
4.4 Other considerations
4.5 Limitations of this study
5. Conclusions
Appendix. Supplementary data
- Supplementary Table 1
Footnotes
☆This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
This work has been fully funded by the DECIDE Project which is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement number 258583.
Competing interests: The authors declare that they have no competing interests.