Original Article | Journal of Clinical Epidemiology, Volume 58, Issue 5, P475-483, May 2005

Substantial effective sample sizes were required for external validation studies of predictive logistic regression models


      Background and Objectives

      A prediction model usually performs worse in external validation data than in the data used to develop it. We aimed to determine the effective sample sizes (i.e., numbers of events) at which relevant differences in model performance can be detected with adequate power.


      Methods

      We used a logistic regression model to predict the probability that residual masses of patients treated for metastatic testicular cancer contained only benign tissue. We performed standard power calculations and Monte Carlo simulations to estimate the number of events required to detect several types of model invalidity with 80% power at the 5% significance level.
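The Monte Carlo approach described above can be illustrated with a minimal sketch. It is not the authors' simulation: the normal distribution of the linear predictor (`lp_mean`, `lp_sd`), the function name `calibration_power`, and the choice of a score test for the calibration intercept are all assumptions made for illustration. Note also that `n` is the total number of patients, whereas the article counts events, so the output is not expected to reproduce the published figures.

```python
import math
import random


def expit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def calibration_power(n, odds_ratio, n_sim=1000, lp_mean=0.0, lp_sd=1.0, seed=0):
    """Estimate the power to detect systematically miscalibrated predictions.

    Assumptions (hypothetical, for illustration only):
      - the model's linear predictor is normally distributed in the
        validation population (lp_mean, lp_sd);
      - predicted odds are `odds_ratio` times the true odds;
      - invalidity is tested with a two-sided score test of a zero
        calibration intercept, using the predictions as a fixed offset.
    """
    rng = random.Random(seed)
    shift = math.log(odds_ratio)
    z_crit = 1.959964  # two-sided 5% significance level
    rejections = 0
    for _ in range(n_sim):
        score = 0.0  # sum of (observed outcome - predicted probability)
        info = 0.0   # variance of the score under the null hypothesis
        for _ in range(n):
            lp = rng.gauss(lp_mean, lp_sd)
            p_pred = expit(lp)          # what the model predicts
            p_true = expit(lp - shift)  # what actually generates the outcome
            y = 1.0 if rng.random() < p_true else 0.0
            score += y - p_pred
            info += p_pred * (1.0 - p_pred)
        if abs(score / math.sqrt(info)) > z_crit:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, calibration_power(n, odds_ratio=1.5, n_sim=500))
```

Setting `odds_ratio=1.0` gives the rejection rate when the model is valid, which should stay close to the nominal 5% level and serves as a sanity check on the test.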


      Results

      A validation sample with 111 events was required to detect that a model predicted probabilities that were too high, when predictions were on average 1.5 times too high on the odds scale. Detecting a decrease in the discriminative ability of the model, indicated by a decrease in the c-statistic from 0.83 to 0.73, required 81 to 106 events, depending on the specific scenario.
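The two quantities in the paragraph above can be made concrete with a short sketch. The function names `true_probability` and `c_statistic` are hypothetical helpers, not taken from the article. The first shows what "1.5 times too high on the odds scale" means for a single prediction; the second computes the c-statistic as the proportion of event/nonevent pairs in which the event received the higher predicted probability, counting ties as one half.

```python
def true_probability(predicted, odds_ratio):
    """Probability implied when the predicted odds are `odds_ratio` times too high."""
    predicted_odds = predicted / (1.0 - predicted)
    true_odds = predicted_odds / odds_ratio
    return true_odds / (1.0 + true_odds)


def c_statistic(predictions, outcomes):
    """Concordance (c) statistic for binary outcomes coded 1 (event) and 0 (nonevent)."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    nonevents = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(events) * len(nonevents))


if __name__ == "__main__":
    # A predicted probability of 0.60 has odds of 1.5; dividing the odds
    # by 1.5 gives odds of 1, i.e. a true probability of about 0.50.
    print(true_probability(0.60, 1.5))
    print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the sample size, which is acceptable for validation samples of a few hundred patients; a rank-based formula would be preferable for much larger data sets.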


      Conclusion

      We suggest a minimum of 100 events and 100 nonevents for external validation samples. Specific hypotheses may, however, require substantially higher effective sample sizes to obtain adequate power.



