Review Article | Volume 61, Issue 3, P232-240.e2, March 2008

Use of the false discovery rate when comparing multiple health care providers



      Objective

      Comparisons of the performance of multiple health care providers are often based on hypothesis tests, with providers whose P-values fall below some critical threshold being identified as potentially extreme. Because of the multiple testing involved, the classical P-value threshold of, say, 0.05 may not be considered strict enough, as it will tend to lead to too many “false positives.” However, we argue that the commonly used Bonferroni-corrected threshold is in general too strict for the problem in hand. The purpose of this article is to demonstrate a suitable alternative thresholding procedure that is already well established in other fields.
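
The trade-off described above can be written down in a few lines. This is an illustrative sketch only: the symbols m (number of providers tested), m_0 (number of providers that are truly not extreme), and \alpha (the per-test P-value threshold) are assumptions introduced here and are not notation taken from the article.

```latex
% Uncorrected testing: each of the m_0 "in control" providers is flagged
% with probability \alpha, so the expected number of false positives grows with m.
\mathbb{E}[\text{false positives}] = m_0\,\alpha \le m\,\alpha

% Bonferroni: test each provider at \alpha / m, which bounds the chance of
% even a single false positive by \alpha, at the cost of a very strict threshold.
\Pr(\text{at least one false positive}) \le m_0 \cdot \frac{\alpha}{m} \le \alpha
```

For a hypothetical m = 40 providers and \alpha = 0.05, the uncorrected threshold allows up to 40 × 0.05 = 2 expected false positives, whereas the Bonferroni threshold drops to 0.05 / 40 = 0.00125 per provider.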

      Study Design and Setting

      The suggested procedure involves control of an error measure called the “false discovery rate” (FDR). We present a worked example involving a comparison of risk-adjusted mortality rates following heart surgery in New York State hospitals during 2000–2002. It is shown that the FDR critical threshold lines can be drawn on a “funnel plot,” providing a simple graphical presentation of the results.
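
The abstract does not spell out which FDR-controlling procedure is used. As a minimal sketch, assuming the standard Benjamini-Hochberg step-up procedure, the comparison with a Bonferroni threshold might look like the following. The function names and the P-values are hypothetical and are not the New York State data.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag providers whose P-value is at or below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Flag providers using the Benjamini-Hochberg step-up rule.

    Sort the P-values, find the largest rank k with p_(k) <= k * alpha / m,
    and flag the k providers with the smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that satisfies the rule
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical P-values for 10 providers (illustration only).
p_values = [0.001, 0.004, 0.012, 0.019, 0.03, 0.2, 0.45, 0.6, 0.8, 0.95]

n_bonferroni = sum(bonferroni_reject(p_values))
n_fdr = sum(benjamini_hochberg_reject(p_values))
print(n_bonferroni, n_fdr)
```

With these made-up values the Bonferroni threshold (0.005) flags two providers, while the step-up rule flags four, because the comparison threshold rises with the rank of each P-value.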


      Results

      The FDR procedure identified more providers as potentially extreme than the Bonferroni correction, while maintaining control of an intuitively sensible error measure.


      Conclusion

      Control of the FDR offers a simple guideline for determining where to draw critical thresholds when comparing multiple health care providers.
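
One way to picture how a critical P-value threshold becomes a pair of lines on a funnel plot is sketched below. It assumes a two-sided test and a normal approximation to a proportion, which may differ from the method used in the article. The function name, the target rate, the volumes, and the cutoff `p_fdr` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def funnel_limits(target, volume, p_threshold):
    """Return (lower, upper) control limits for a proportion.

    Assumes a two-sided test and a normal approximation, so the limits are
    target +/- z * sqrt(target * (1 - target) / volume), clipped to [0, 1].
    """
    z = NormalDist().inv_cdf(1 - p_threshold / 2)
    half_width = z * sqrt(target * (1 - target) / volume)
    return max(0.0, target - half_width), min(1.0, target + half_width)


target_rate = 0.02   # hypothetical overall mortality rate
p_fdr = 0.02         # hypothetical data-dependent cutoff from an FDR procedure

# The limits narrow as the number of cases grows, giving the funnel shape.
for volume in (100, 500, 1000, 4000):
    lower, upper = funnel_limits(target_rate, volume, p_fdr)
    print(f"volume={volume:5d}  lower={lower:.4f}  upper={upper:.4f}")
```

Because an FDR procedure yields a single cutoff for the P-values in a given data set, the same function can draw the FDR lines, the Bonferroni lines, or the classical 0.05 lines simply by changing `p_threshold`.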



