Abstract
Objective
Comparisons of the performance of multiple health care providers are often based on
hypothesis tests, with providers whose P-values fall below some critical threshold being identified as potentially extreme. Because
of the multiple testing involved, the classical P-value threshold of, say, 0.05 may not be strict enough, as it will tend
to produce too many “false positives.” However, we argue that the commonly used Bonferroni-corrected
threshold is in general too strict for the problem at hand. The purpose of this article
is to demonstrate a suitable alternative thresholding procedure that is already well
established in other fields.
Study Design and Setting
The suggested procedure involves control of an error measure called the “false discovery
rate” (FDR). We present a worked example involving a comparison of risk-adjusted mortality
rates following heart surgery in New York State hospitals during 2000–2002. It is
shown that the FDR critical threshold lines can be drawn on a “funnel plot,” providing
a simple graphical presentation of the results.
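
As a rough illustration of the difference between the two thresholds, the following sketch applies a Bonferroni correction and an FDR-controlling procedure to the same set of P-values. It assumes the Benjamini-Hochberg step-up procedure, which is one common way of controlling the FDR; the abstract does not state which procedure the article uses. The function names and the P-values are hypothetical and are not taken from the New York State data.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag a provider if its P-value is at most alpha / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q (assumed method).

    Sort the P-values, find the largest rank k such that p_(k) <= k * q / m,
    and flag the k providers with the smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical P-values for ten providers (illustrative only).
p_values = [0.0001, 0.004, 0.012, 0.019, 0.2, 0.45, 0.6, 0.74, 0.88, 0.95]

n_bonferroni = sum(bonferroni_reject(p_values))
n_fdr = sum(benjamini_hochberg_reject(p_values))
print("Flagged by Bonferroni:", n_bonferroni)
print("Flagged by Benjamini-Hochberg:", n_fdr)
```

With these made-up values the Bonferroni threshold is 0.05/10 = 0.005 for every provider, whereas the Benjamini-Hochberg threshold rises with the rank of the P-value (0.005, 0.010, 0.015, ...), so it can flag providers that Bonferroni does not.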
Results
The FDR procedure identified more providers as potentially extreme than the Bonferroni
correction, while maintaining control of an intuitively sensible error measure.
Conclusion
Control of the FDR offers a simple guideline for determining where to draw critical
thresholds when comparing multiple health care providers.
Article info
Publication history
Published online: October 24, 2007
Accepted: April 18, 2007
Copyright
© 2008 Elsevier Inc. Published by Elsevier Inc. All rights reserved.