Abstract
Background and Objectives
The performance of a prediction model is usually worse in external validation data
than in the development data. We aimed to determine at which effective sample
sizes (i.e., number of events) relevant differences in model performance can be detected
with adequate power.
Methods
We used a logistic regression model to predict the probability that residual masses
of patients treated for metastatic testicular cancer contained only benign tissue.
We performed standard power calculations and Monte Carlo simulations to estimate the
numbers of events that are required to detect several types of model invalidity with
80% power at the 5% significance level.
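To illustrate the simulation idea, the following is a minimal sketch of a Monte Carlo power estimate for one type of invalidity: predictions that are systematically too high by a fixed factor on the odds scale. It is not the authors' procedure. The distribution of the linear predictor (normal with mean 0 and standard deviation 1.5), the function name simulate_power, and the choice of test (a score-type z test comparing observed with expected event counts) are all assumptions made for illustration.

```python
import math
import random


def expit(x):
    """Inverse logit: convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_power(n, odds_factor=1.5, n_sim=1000, lp_sd=1.5, seed=0):
    """Estimate the power to detect systematic miscalibration.

    Hypothetical setup: true linear predictors are drawn from N(0, lp_sd^2),
    and the model under validation overstates the odds by `odds_factor`.
    Returns (estimated power, mean number of events per simulated sample).
    """
    rng = random.Random(seed)
    shift = math.log(odds_factor)   # miscalibration on the log-odds scale
    z_crit = 1.959964               # two-sided 5% critical value
    rejections = 0
    total_events = 0
    for _ in range(n_sim):
        observed = 0      # observed number of events
        expected = 0.0    # events expected under the (miscalibrated) model
        variance = 0.0    # variance of the event count if the model were valid
        for _ in range(n):
            lp_true = rng.gauss(0.0, lp_sd)
            observed += rng.random() < expit(lp_true)
            p_model = expit(lp_true + shift)
            expected += p_model
            variance += p_model * (1.0 - p_model)
        # Score-type z test of "observed equals expected" (assumed test choice)
        z = (observed - expected) / math.sqrt(variance)
        rejections += abs(z) > z_crit
        total_events += observed
    return rejections / n_sim, total_events / n_sim


if __name__ == "__main__":
    for n in (100, 200, 400):
        power, events = simulate_power(n, n_sim=500)
        print(f"n={n}: about {events:.0f} events, estimated power {power:.2f}")
```

Increasing n until the estimated power reaches 0.80 gives the required number of events under these assumed settings; the result depends on the assumed case mix and will not necessarily match the numbers reported in this study.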
Results
A validation sample with 111 events was required to detect that a model predicted
probabilities that were too high, when predictions were on average 1.5 times too high on the
odds scale. A decrease in discriminative ability of the model, indicated by a decrease
in the c-statistic from 0.83 to 0.73, required 81 to 106 events, depending on the specific
scenario.
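For readers less familiar with the c-statistic: it is the proportion of all event/nonevent pairs in which the patient with the event received the higher predicted probability, with ties counted as one half. The sketch below computes it by direct pair counting; the function name c_statistic and the example values are illustrative and not taken from the study.

```python
def c_statistic(predictions, outcomes):
    """Concordance (c) statistic for binary outcomes coded 0/1.

    Counts, over all event/nonevent pairs, how often the event has the
    higher predicted probability; ties contribute one half.
    """
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    nonevents = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


if __name__ == "__main__":
    # Four hypothetical patients: two nonevents followed by two events
    print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, so a drop from 0.83 to 0.73 means the model ranks noticeably fewer event/nonevent pairs correctly.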
Conclusion
We suggest a minimum of 100 events and 100 nonevents for external validation samples.
Specific hypotheses may, however, require substantially higher effective sample sizes
to obtain adequate power.
Article info
Publication history
Accepted: June 21, 2004
Copyright
© 2005 Elsevier Inc. Published by Elsevier Inc. All rights reserved.