Highlights
- When developing a clinical prediction model, penalization and shrinkage techniques are recommended to address overfitting.
- Some methodology articles suggest that penalization methods are a ‘carte blanche’ that resolves any issue of overfitting.
- We show that penalization methods can be unreliable, as their shrinkage and tuning parameters are often estimated with large uncertainty.
- Although penalization methods will, on average, improve on standard estimation methods, in a particular data set they are often unreliable.
- The most problematic data sets are those with small effective sample sizes and where the developed model has a Cox-Snell R2 far from 1, which is common for prediction models of binary and time-to-event outcomes.
- Penalization methods are best used when a sufficiently large development data set is available, as identified from sample size calculations that aim to minimize the potential for model overfitting and to precisely estimate key parameters.
- When the sample size is adequately large, any of the studied penalization or shrinkage methods can be used, as they should perform similarly to one another and better than unpenalized regression, unless the sample size is extremely large and R2app is large.
Abstract
Objectives
Study Design and Setting
Results
Conclusion
Keywords
Key Findings
- When developing a clinical prediction model, penalization techniques are recommended to address overfitting; however, they are not a ‘carte blanche’.

What this adds to what was known?
- Although penalization methods will, on average, improve on standard estimation methods, in a particular data set they can be unreliable, as their shrinkage and tuning parameters are often estimated with large uncertainty.
- The most problematic data sets are those with small effective sample sizes and where the developed model has a Cox-Snell R2 far from 1, which is common for prediction models of binary and time-to-event outcomes.

What is the implication and what should change now?
- Penalization methods are best used when a sufficiently large development data set is available, as identified from sample size calculations that aim to minimize the potential for model overfitting and to precisely estimate key parameters.
- When the sample size is adequately large, any of the studied penalization or shrinkage methods can be used, as they should perform similarly to one another and better than unpenalized regression, unless the sample size is extremely large and R2app is large.
1. Introduction
2. Methods
2.1 Shrinkage and penalization methods
Ŝ is applied to give a shrunken model of the form

linear predictor = α̂new + Ŝ × (β̂1X1 + β̂2X2 + … + β̂pXp),

where Ŝ is a shrinkage value between 0 and 1 used to uniformly adjust the predictor effects (β̂1, …, β̂p) estimated from standard maximum likelihood, and α̂new is the updated intercept, estimated after determining and applying Ŝ, to ensure that the calibration-in-the-large is correct (i.e., that the average predicted probability equals the overall proportion of observed events).
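The uniform shrinkage factor described above is typically estimated by bootstrapping, as in section 2.2. The following is a minimal numpy sketch (not the authors' code; the data set and effect sizes are simulated purely for illustration) for an OLS model: fit the model in each bootstrap sample, compute that model's linear predictor in the original data, and take the calibration slope as the shrinkage estimate, then average over bootstrap samples.

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_fit(X, y):
    # OLS with intercept; returns coefficients [alpha, beta_1, ..., beta_p]
    Xd = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

def bootstrap_shrinkage(X, y, n_boot=1000):
    """Bootstrap estimate of the uniform shrinkage factor S-hat:
    fit on each bootstrap sample, then regress the ORIGINAL outcome on
    that bootstrap model's linear predictor; the slope estimates S."""
    n = len(y)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)               # bootstrap resample
        coef_b = ols_fit(X[idx], y[idx])
        lp = coef_b[0] + X @ coef_b[1:]           # linear predictor in original data
        slopes[b] = ols_fit(lp.reshape(-1, 1), y)[1]  # calibration slope
    return slopes.mean(), np.percentile(slopes, [2.5, 97.5])

# Hypothetical small development data set: 7 predictors, n = 100
n, p = 100, 7
X = rng.normal(size=(n, p))
beta = np.array([0.5, 0.4, 0.0, 0.3, -0.2, 0.1, 0.0])
y = X @ beta + rng.normal(scale=2.0, size=n)

S_hat, ci = bootstrap_shrinkage(X, y)
coef = ols_fit(X, y)
beta_shrunk = S_hat * coef[1:]
# Re-estimate the intercept so calibration-in-the-large is correct
alpha_new = y.mean() - (X @ beta_shrunk).mean()
print(f"S-hat = {S_hat:.2f}, 95% bootstrap interval {ci[0]:.2f} to {ci[1]:.2f}")
```

The percentile interval around Ŝ is the quantity examined in section 3.1: with few patients per predictor parameter, it can be wide, so the single value of Ŝ actually applied may be far from the shrinkage truly needed.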
2.2 Examples to illustrate uncertainty of uniform shrinkage estimate
2.3 Simulation study to examine uncertainty of tuning parameter estimates
2.3.1 Scenarios
2.3.2 Data generation
2.3.3 Model development
2.3.4 Model validation
2.3.5 Summarizing simulation results
3. Results
3.1 Uncertainty in uniform shrinkage estimate: findings from applied examples
| Model | Outcome | Model equation derived using ordinary least squares estimation (i.e., before any shrinkage) | Patients/predictor parameters | R2app | Uniform shrinkage (Ŝ) estimate from 1,000 bootstrap samples (95% confidence interval) |
|---|---|---|---|---|---|
| A | Systolic blood pressure (SBP) (low CVD risk population) | 28.10 + 0.46∗SBP + 0.41∗DBP + 0.013∗BMI + 0.45∗age − 2.05∗sex − 17.81∗treat − 2.08∗smoker | 262/7 = 37 | 0.23 | 0.94 (0.77 to 1.18) |
| B | Systolic blood pressure (SBP) (high CVD risk population) | −12.69 + 0.94∗SBP + 0.21∗DBP − 0.001∗BMI + 0.06∗age + 1.72∗sex − 1.04∗treat + 0.17∗smoker | 253/7 = 36 | 0.56 | 0.98 (0.87 to 1.10) |
| C | ln(FEV) | −2.07 + 0.02∗age + 0.04∗height + 0.03∗sex + 0.05∗smoker | 654/4 = 164 | 0.81 | 1.00 (0.96 to 1.04) |
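The pattern in the table, where the bootstrap interval for Ŝ is wide for models A and B (about 37 patients per parameter) but narrow for model C (164 per parameter), can be reproduced on simulated data. This is a minimal numpy sketch under assumed data-generating values, not the authors' analysis, comparing interval widths at two sample sizes:

```python
import numpy as np

rng = np.random.default_rng(7)

def boot_shrinkage_interval(n, p=7, n_boot=500):
    """Width of the 95% bootstrap percentile interval for the uniform
    shrinkage factor of an OLS model with p predictors and n patients."""
    X = rng.normal(size=(n, p))
    beta = np.linspace(0.5, 0.1, p)             # arbitrary illustrative effects
    y = X @ beta + rng.normal(scale=2.0, size=n)
    Xd = np.column_stack([np.ones(n), X])
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        coef_b, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        lp = Xd @ coef_b                        # bootstrap model's predictions
        # calibration slope of the original outcome on those predictions
        A = np.column_stack([np.ones(n), lp])
        slopes[b] = np.linalg.lstsq(A, y, rcond=None)[0][1]
    lo, hi = np.percentile(slopes, [2.5, 97.5])
    return hi - lo

small_n_width = boot_shrinkage_interval(150)
large_n_width = boot_shrinkage_interval(1500)
print(f"interval width: n=150 -> {small_n_width:.2f}, n=1500 -> {large_n_width:.2f}")
```

With the larger sample size the interval tightens markedly, mirroring model C's 0.96 to 1.04 interval versus model A's 0.77 to 1.18.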

3.2 Importance of estimating shrinkage precisely: illustration using model A

3.3 Uncertainty in uniform shrinkage and penalized regression methods: findings from simulation study
3.3.1 Comparison of uncertainty in bootstrap and heuristic shrinkage estimates of Ŝ
3.3.2 Uncertainty in tuning parameter estimates and prediction model performance
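The type of uncertainty examined in this section can be seen directly: on a single fixed data set, the tuning parameter λ̂ selected by cross-validation depends on the random fold split, so repeated cross-validation runs may select different values. A minimal numpy sketch with hypothetical simulated data, using ridge regression with a closed-form fit to stand in for the penalized methods studied:

```python
import numpy as np

rng = np.random.default_rng(3)

def ridge_coef(X, y, lam):
    # Closed-form ridge solution; intercept handled by centering (unpenalized)
    Xc, yc = X - X.mean(0), y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def cv_lambda(X, y, lambdas, k=10, seed=0):
    """Pick lambda minimizing k-fold cross-validated squared error;
    the chosen value depends on the random fold assignment (seed)."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % k
    errs = []
    for lam in lambdas:
        sse = 0.0
        for f in range(k):
            tr, te = folds != f, folds == f
            b = ridge_coef(X[tr], y[tr], lam)
            pred = y[tr].mean() + (X[te] - X[tr].mean(0)) @ b
            sse += np.sum((y[te] - pred) ** 2)
        errs.append(sse)
    return lambdas[int(np.argmin(errs))]

# One fixed, small data set with weak signal (prone to overfitting)
n, p = 80, 10
X = rng.normal(size=(n, p))
y = X @ np.r_[0.3, 0.2, np.zeros(p - 2)] + rng.normal(scale=1.5, size=n)

lambdas = np.logspace(-2, 3, 40)
chosen = [cv_lambda(X, y, lambdas, seed=s) for s in range(20)]
print("lambda-hat values across 20 CV repeats:", sorted(set(np.round(chosen, 2))))
```

The spread of λ̂ across repeats on identical data is one manifestation of the tuning-parameter uncertainty that the simulation study quantifies (see also the reference on stabilizing the lasso against cross-validation variability).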

4. Discussion
4.1 Recommendations
4.2. Summary
Acknowledgments
Appendix A. Supplementary data
- Supplementary Material
- Appendix
References
- Prognosis Research in Healthcare: Concepts, Methods and Impact. Oxford University Press, Oxford, UK, 2019
- Clinical prediction models: a practical approach to development, validation, and updating. Springer, New York, 2009
- Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol. 1996; 58: 267-288
- Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970; 12: 55-67
- Regularization and variable selection via the elastic net.J R Stat Soc Ser B Stat Methodol. 2005; 67: 301-320
- Ridge regression: applications to nonorthogonal problems. Technometrics. 1970; 12: 69-82
- An evaluation of penalised survival methods for developing prognostic models with rare events.Stat Med. 2012; 31: 1150-1161
- How to develop a more accurate risk prediction model when there are few events.BMJ. 2015; 351: h3868
- Application of shrinkage techniques in logistic regression analysis: a case study.Stat Neerl. 2001; 55: 76-88
- Regression, prediction and shrinkage.J R Stat Soc Ser B Methodol. 1983; 45: 311-354
- Using regression models for prediction: shrinkage and regression to the mean.Stat Methods Med Res. 1997; 6: 167-183
- Shrinkage and penalized likelihood as methods to improve predictive accuracy.Stat Neerl. 2001; 55: 17-34
- Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets.Med Decis Making. 2001; 21: 45-56
- Regression shrinkage methods for clinical prediction models do not guarantee improved performance: simulation study.Stat Methods Med Res. 2020; (962280220921415)
- Calculating the sample size required for developing a clinical prediction model.BMJ. 2020; 368: m441
- Minimum sample size for developing a multivariable prediction model: Part II - binary and time-to-event outcomes.Stat Med. 2019; 38: 1276-1296
- Minimum sample size for developing a multivariable prediction model: Part I - continuous outcomes.Stat Med. 2019; 38: 1262-1275
- Predictive value of statistical models.Stat Med. 1990; 9: 1303-1325
- Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Springer, New York, 2015
- Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration.Ann Intern Med. 2015; 162: W1-W73
- The Analysis of Binary Data. 2nd ed. Chapman and Hall, London, 1989
- Estimating misclassification error with small samples via bootstrap cross-validation.Bioinformatics. 2005; 21: 1979-1986
- Stabilizing the lasso against cross-validation variability.Comput Stat Data Anal. 2014; 70: 198-211
- Meta-analysis of randomised trials with a continuous outcome according to baseline imbalance and availability of individual participant data.Stat Med. 2013; 32: 2747-2766
- Fundamentals of Biostatistics. 5th ed. Duxbury, Pacific Grove, CA, 1999
- Regularization paths for generalized linear models via coordinate descent.J Stat Softw. 2010; 33: 1-22
- Sample size considerations for the external validation of a multivariable prognostic model: a resampling study.Stat Med. 2016; 35: 214-226
- A calibration hierarchy for risk models was defined: from utopia to empirical data.J Clin Epidemiol. 2016; 74: 167-176
- The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models.Stat Med. 2019; 38: 4051-4065
- Assessing calibration of multinomial risk prediction models.Stat Med. 2014; 33: 2585-2596
Article info
Publication history
Footnotes
Conflicts of interest: None.
Funding: G.C. is supported by Cancer Research UK (program grant: C49297/A27294) and the NIHR Biomedical Research Centre, Oxford. K.S. is funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR). This publication presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. G.M. and R.R. are partially funded by the MRC-NIHR Methodology Research Program [grant number: MR/T025085/1].
Identification
Copyright
User license
Creative Commons Attribution (CC BY 4.0)