## Highlights

- When developing a clinical prediction model, penalization and shrinkage techniques are recommended to address overfitting.
- Some methodology articles suggest that penalization methods are a ‘carte blanche’ that resolves any issues with overfitting.
- We show that penalization methods can be unreliable, as their unknown shrinkage and tuning parameters are often estimated with large uncertainty.
- Although penalization methods will, on average, improve on standard estimation methods, in a particular data set they are often unreliable.
- The most problematic data sets are those with small effective sample sizes and where the developed model has a Cox-Snell ${R}^{2}$ far from 1, which is common for prediction models of binary and time-to-event outcomes.
- Penalization methods are best used when a sufficiently large development data set is available, as identified from sample size calculations to minimize the potential for model overfitting and to estimate key parameters precisely.
- When the sample size is adequately large, any of the studied penalization or shrinkage methods can be used, as they should perform similarly and better than unpenalized regression, unless the sample size is extremely large and ${R}_{app}^{2}$ is large.

## Abstract

### Objectives

### Study Design and Setting

### Results

### Conclusion

## Keywords

**What is new?**

- When developing a clinical prediction model, penalization techniques are recommended to address overfitting; however, they are not a ‘carte blanche’.

### Key Findings

- Although penalization methods will, on average, improve on standard estimation methods, in a particular data set they can be unreliable, as their unknown shrinkage and tuning parameters are often estimated with large uncertainty.
- The most problematic data sets are those with small effective sample sizes and where the developed model has a Cox-Snell ${R}^{2}$ far from 1, which is common for prediction models of binary and time-to-event outcomes.

### What this adds to what was known?

- Penalization methods are best used when a sufficiently large development data set is available, as identified from sample size calculations to minimize the potential for model overfitting and to estimate key parameters precisely.
- When the sample size is adequately large, any of the studied penalization or shrinkage methods can be used, as they should perform similarly and better than unpenalized regression, unless the sample size is extremely large and ${R}_{app}^{2}$ is large.

### What is the implication and what should change now?

## 1. Introduction

## 2. Methods

### 2.1 Shrinkage and penalization methods

where $S$ is a shrinkage value between 0 and 1, used to uniformly adjust the predictor effects $({\hat{\beta}}_{1},{\hat{\beta}}_{2},\dots)$ estimated from standard maximum likelihood, and ${\alpha}^{\ast}$ is the updated intercept, estimated after determining and applying $S$ to ensure that calibration-in-the-large is correct (i.e., that the mean predicted probability equals the overall proportion of observed events).

The tuning parameter(s) can be estimated using *K*-fold cross-validation, repeated *K*-fold cross-validation, or bootstrap *K*-fold cross-validation.
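To illustrate the cross-validation variability the paper examines, the sketch below (simulated data; `scikit-learn`'s `LogisticRegressionCV`, where the penalty is parameterized as $C = 1/\lambda$, is an assumed tool, not the paper's software) repeats 5-fold cross-validation with different fold splits and records the selected penalty each time:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 150, 10  # small effective sample size, as in the problematic scenarios
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))

# Repeat 5-fold cross-validation with different random fold splits:
# the selected penalty can vary between repetitions, showing that the
# tuning parameter itself is estimated with uncertainty
chosen = []
for seed in range(10):
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    model = LogisticRegressionCV(Cs=20, cv=cv, penalty="l2", max_iter=5000)
    model.fit(X, y)
    chosen.append(model.C_[0])  # selected C (= 1/lambda)

print(min(chosen), max(chosen))  # the spread reflects tuning uncertainty
```

Repeated or bootstrap cross-validation averages over this fold-to-fold variability, which is why those variants are sometimes preferred in small data sets.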

### 2.2 Examples to illustrate uncertainty of uniform shrinkage estimate

Candidate predictors included body mass index (BMI, kg/m${}^{2}$), age (years), sex (female = 0, male = 1), current smoker (yes = 1, no = 0), and antihypertensive treatment (yes = 1, no = 0).

### 2.3 Simulation study to examine uncertainty of tuning parameter estimates

#### 2.3.1 Scenarios

The total sample size ranged from *N* = 100 to *N* = 1,000 (in steps of 100), corresponding to events-per-parameter values of 2.5 (for *N* = 100; 50 outcome events) to 25 (for *N* = 1,000; 500 outcome events). The scenarios are a pragmatic choice, to cover a range of events per parameter and to differ from those examined elsewhere.
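The events-per-parameter values above follow directly from the stated event counts: with half of the participants experiencing the outcome and 50 events corresponding to 2.5 events per parameter, 20 predictor parameters are implied. A small check of that arithmetic (the parameter count of 20 is inferred, not stated explicitly):

```python
# Events per parameter (EPP) across the simulation grid: outcome
# prevalence is 50%, and 50 events / 2.5 EPP implies 20 parameters
n_params = 20
grid = {}
for n in range(100, 1001, 100):
    events = n // 2              # 50% prevalence
    grid[n] = events / n_params  # events per parameter

print(grid[100], grid[1000])  # 2.5 and 25.0, matching the text
```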

#### 2.3.2 Data generation

#### 2.3.3 Model development

#### 2.3.4 Model validation

A large validation data set (*N* = 5,000; 2,500 outcome events) was created using the same data-generating procedure, giving a much larger effective sample size than the recommended 100 to 250 outcome events for validating a prediction model. Predictive performance was summarized by ${R}^{2}$, calibration-in-the-large, and the calibration slope.

#### 2.3.5 Summarizing simulation results

## 3. Results

### 3.1 Uncertainty in uniform shrinkage estimate: findings from applied examples

| Model | Outcome | Model equation derived using ordinary least squares estimation (i.e., before any shrinkage) | Number of patients/predictor parameters | ${R}_{app}^{2}$ | Uniform shrinkage ($S$) estimate from 1,000 bootstrap samples (95% confidence interval) |
|---|---|---|---|---|---|
| A | Systolic blood pressure (SBP) (low CVD risk population) | 28.10 + 0.46∗SBP + 0.41∗DBP + 0.013∗BMI + 0.45∗age − 2.05∗sex − 17.81∗treat − 2.08∗smoker | 262/7 = 37 | 0.23 | 0.94 (0.77 to 1.18) |
| B | Systolic blood pressure (SBP) (high CVD risk population) | −12.69 + 0.94∗SBP + 0.21∗DBP − 0.001∗BMI + 0.06∗age + 1.72∗sex − 1.04∗treat + 0.17∗smoker | 253/7 = 36 | 0.56 | 0.98 (0.87 to 1.10) |
| C | ln(FEV) | −2.07 + 0.02∗age + 0.04∗height + 0.03∗sex + 0.05∗smoker | 654/4 = 164 | 0.81 | 1.00 (0.96 to 1.04) |
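The bootstrap procedure behind the $S$ column can be sketched as follows (simulated data with dimensions loosely matching model A, not the paper's data): refit the model in each bootstrap sample, then regress the original outcomes on that model's linear predictor; the slope is one bootstrap estimate of $S$, and the percentile interval across replicates gives the confidence interval.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 262, 7  # dimensions as for model A in the table (data are simulated)
X = rng.normal(size=(n, p))
y = X @ rng.normal(scale=0.2, size=p) + rng.normal(size=n)

def ols(X, y):
    """Ordinary least squares fit, returning [intercept, coefficients]."""
    Xc = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xc, y, rcond=None)[0]

B, shrink = 1000, []
for _ in range(B):
    idx = rng.integers(0, n, n)           # bootstrap resample
    b = ols(X[idx], y[idx])               # refit in bootstrap sample
    # Regress the ORIGINAL outcomes on the bootstrap model's linear
    # predictor: the slope is one bootstrap estimate of S
    lp = np.column_stack([np.ones(n), X]) @ b
    shrink.append(np.polyfit(lp, y, 1)[0])

S_hat = np.mean(shrink)
lo, hi = np.percentile(shrink, [2.5, 97.5])
print(round(S_hat, 2), round(lo, 2), round(hi, 2))
```

With a modest ${R}_{app}^{2}$ and only 37 patients per parameter, the interval around $S$ is wide, mirroring the uncertainty reported for model A in the table.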

### 3.2 Importance of estimating shrinkage precisely: illustration using model A

### 3.3 Uncertainty in uniform shrinkage and penalized regression methods: findings from simulation study

#### 3.3.1 Comparison of uncertainty in bootstrap and heuristic shrinkage estimates of $\mathit{S}$

#### 3.3.2 Uncertainty in tuning parameter estimates and prediction model performance

## 4. Discussion

Although penalization methods will improve on standard estimation methods *on average*, in a particular data set they can be unreliable. The most problematic data sets are those with small effective sample sizes and where the developed model has an ${R}_{app}^{2}$ far from 1, which is common for prediction models of binary and time-to-event outcomes.

### 4.1 Recommendations

### 4.2 Summary

## Acknowledgments

## Appendix A. Supplementary data

- Supplementary Material

- Appendix

## References

- Prognosis Research in Healthcare: Concepts, Methods and Impact. Oxford University Press, Oxford, UK; 2019.
- Clinical prediction models: a practical approach to development, validation, and updating. Springer, New York; 2009.
- Regression shrinkage and selection via the lasso. *J R Statist Soc B.* 1996; 58: 267-288.
- Ridge regression: biased estimation for nonorthogonal problems. *Technometrics.* 1970; 12: 55-67.
- Regularization and variable selection via the elastic net. *J R Stat Soc Ser B Stat Methodol.* 2005; 67: 301-320.
- Ridge regression: applications to nonorthogonal problems. *Technometrics.* 1970; 12: 69-82.
- An evaluation of penalised survival methods for developing prognostic models with rare events. *Stat Med.* 2012; 31: 1150-1161.
- How to develop a more accurate risk prediction model when there are few events. *BMJ.* 2015; 351: h3868.
- Application of shrinkage techniques in logistic regression analysis: a case study. *Stat Neerl.* 2001; 55: 76-88.
- Regression, prediction and shrinkage. *J R Stat Soc Ser B Methodol.* 1983; 45: 311-354.
- Using regression models for prediction: shrinkage and regression to the mean. *Stat Methods Med Res.* 1997; 6: 167-183.
- Shrinkage and penalized likelihood as methods to improve predictive accuracy. *Stat Neerl.* 2001; 55: 17-34.
- Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. *Med Decis Making.* 2001; 21: 45-56.
- Regression shrinkage methods for clinical prediction models do not guarantee improved performance: simulation study. *Stat Methods Med Res.* 2020; 962280220921415.
- Calculating the sample size required for developing a clinical prediction model. *BMJ.* 2020; 368: m441.
- Minimum sample size for developing a multivariable prediction model: Part II - binary and time-to-event outcomes. *Stat Med.* 2019; 38: 1276-1296.
- Minimum sample size for developing a multivariable prediction model: Part I - continuous outcomes. *Stat Med.* 2019; 38: 1262-1275.
- Predictive value of statistical models. *Stat Med.* 1990; 9: 1303-1325.
- Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Springer, New York; 2015.
- Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. *Ann Intern Med.* 2015; 162: W1-W73.
- The Analysis of Binary Data. 2nd ed. Chapman and Hall, London; 1989.
- Estimating misclassification error with small samples via bootstrap cross-validation. *Bioinformatics.* 2005; 21: 1979-1986.
- Stabilizing the lasso against cross-validation variability. *Comput Stat Data Anal.* 2014; 70: 198-211.
- Meta-analysis of randomised trials with a continuous outcome according to baseline imbalance and availability of individual participant data. *Stat Med.* 2013; 32: 2747-2766.
- Fundamentals of Biostatistics. 5th ed. Duxbury, Pacific Grove, CA; 1999.
- Regularization paths for generalized linear models via coordinate descent. *J Stat Softw.* 2010; 33: 1-22.
- Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. *Stat Med.* 2016; 35: 214-226.
- A calibration hierarchy for risk models was defined: from utopia to empirical data. *J Clin Epidemiol.* 2016; 74: 167-176.
- The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models. *Stat Med.* 2019; 38: 4051-4065.
- Assessing calibration of multinomial risk prediction models. *Stat Med.* 2014; 33: 2585-2596.

## Article info

### Publication history

### Footnotes

Conflicts of interest: None.

Funding: G.C. is supported by Cancer Research UK (program grant: C49297/A27294) and the NIHR Biomedical Research Centre, Oxford. K.S. is funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR). This publication presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. G.M. and R.R. are partially funded by the MRC-NIHR Methodology Research Program [grant number: MR/T025085/1].

### Identification

### Copyright

### User license

Creative Commons Attribution (CC BY 4.0)

### Permitted

- Read, print & download
- Redistribute or republish the final article
- Text & data mine
- Translate the article
- Reuse portions or extracts from the article in other works
- Sell or re-use for commercial purposes

Elsevier's open access license policy