Abstract
Objective
Study Design and Setting
Results
Conclusion
Keywords
Key findings
- Using data on patients hospitalized with heart failure in the Canadian province of Ontario and a previously derived clinical prediction model, we found that several strategies to quantify model performance showed similar overall results, with moderate variation in center-specific performance.
- Ninety-five percent prediction intervals for a new hospital-specific c-statistic were moderately wide in each of the two time periods.

What this adds to what was known?
- Bootstrap correction for optimism resulted in a similar overall estimate of model performance as a leave-one-hospital-out approach, in which each hospital was used once for model validation.
- Random-effects meta-analysis provided insight into the variability of center-specific performance measures as an indication of the geographic transportability of a prediction model when the focus is on within-center performance of the model.

What is the implication and what should change now?
- Appropriate statistical methods should be used to quantify the geographic and temporal portability of clinical prediction models.
- Validation studies of clinical prediction models should clearly state whether the overall validity of a model is reported or whether transportability is addressed by assessing geographic or temporal variability in performance.
1. Introduction
2. Methods
2.1 Data sources
2.2 Heart failure mortality prediction model
2.3 Measures of model performance
2.4 Statistical methods for assessing geographic and temporal validity
| Method | Description |
| --- | --- |
| **Methods that ignore temporal and geographic variation** | |
| Apparent performance | Model performance is assessed in the sample in which it was developed. No adjustment is made for the model having been optimized to fit the sample used for derivation and validation. |
| Optimism-corrected performance | The model is derived in a bootstrap sample and applied to the overall sample to provide an estimate of model optimism. The average optimism is computed over a large number of bootstrap samples and subtracted from the estimate of apparent performance. |
| **Geographic transportability** | |
| Internal–external: leave-one-hospital-out (pooled) | Data from one hospital are withheld and the model is derived using data from the remaining hospitals. The model is then applied to subjects from the withheld hospital to obtain predicted probabilities for each of them. This process is repeated so that each hospital is excluded once from the derivation sample. Model performance is then determined in the pooled sample consisting of the predictions for each subject when that subject's hospital was excluded from the derivation sample. |
| Internal–external: leave-one-hospital-out (meta-analysis) | As above, but rather than estimating performance on the pooled sample, the hospital-specific estimates of model performance are combined using a random-effects meta-analysis. |
| **Temporal transportability (model estimated in phase 1 and applied in phase 2)** | |
| Fixed-effects regression model | The model contains a fixed intercept and fixed effects for all covariates (as in all the models described previously). The model is derived in phase 1 and validated in phase 2. |
| Mixed-effects regression model | The model contains hospital-specific random intercepts and fixed effects for all covariates. The model is derived in phase 1 and validated in phase 2. |
| Case-mix adjusted performance | The model is developed in phase 1 and applied to subjects in phase 2. Using the predicted probability of the outcome, outcomes are simulated for each subject in phase 2. Model performance is then assessed using the simulated outcomes and the predicted probabilities. This process is repeated 1,000 times to obtain a stable estimate of model performance. |
| **Simultaneous geographic and temporal portability** | |
| Leave-one-hospital-out temporally (meta-analysis) | Data from one hospital are withheld. The model is derived using phase 1 data from the remaining hospitals and then validated in the excluded hospital using data from phase 2. The process is repeated so that each hospital is used once for model validation. The hospital-specific estimates of performance are then pooled using a random-effects meta-analysis. |
| Leave-one-hospital-out temporally (pooled) | Data from one hospital are withheld. The model is derived using phase 1 data from the remaining hospitals and then applied to the excluded hospital using data from phase 2. The process is repeated so that each hospital is used once for model validation. The estimated probabilities of the outcome are pooled across all patients at all hospitals and the c-statistic is calculated. |
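The leave-one-hospital-out (pooled) procedure described above can be sketched as follows. This is a minimal illustration on simulated data with a single predictor; the number of hospitals, sample sizes, and regression coefficients are invented for the example and are not the study's values.

```python
import numpy as np

rng = np.random.default_rng(1)

def c_statistic(y, p):
    """c-statistic: probability that a random event has a higher
    predicted risk than a random non-event (ties count one-half)."""
    pos, neg = p[y == 1], p[y == 0]
    d = pos[:, None] - neg[None, :]
    return ((d > 0).sum() + 0.5 * (d == 0).sum()) / (len(pos) * len(neg))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

# Simulated multicenter data: 10 hospitals, one continuous predictor.
n_hosp, n_per = 10, 200
hosp = np.repeat(np.arange(n_hosp), n_per)
x = rng.normal(size=n_hosp * n_per)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.2 * x))))
X = np.column_stack([np.ones_like(x), x])

# Each hospital is withheld once; its subjects receive predictions from a
# model fit to the remaining hospitals, and all predictions are pooled.
pooled_p = np.empty_like(x)
for h in range(n_hosp):
    train = hosp != h
    beta = fit_logistic(X[train], y[train])
    pooled_p[~train] = 1.0 / (1.0 + np.exp(-X[~train] @ beta))

print("pooled leave-one-hospital-out c-statistic:",
      round(c_statistic(y, pooled_p), 3))
```

The meta-analytic variant would instead compute `c_statistic` within each withheld hospital and pool those hospital-specific estimates.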
2.4.1 Model reproducibility: bootstrap estimates of optimism-corrected performance
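The bootstrap optimism correction can be sketched as follows, assuming the c-statistic as the performance measure. The sample size, number of bootstrap replicates, and coefficients are illustrative assumptions, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(2)

def c_statistic(y, p):
    """Probability that a random event outranks a random non-event."""
    pos, neg = p[y == 1], p[y == 0]
    d = pos[:, None] - neg[None, :]
    return ((d > 0).sum() + 0.5 * (d == 0).sum()) / (len(pos) * len(neg))

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x))))
X = np.column_stack([np.ones(n), x])

# Apparent performance: model evaluated in its own derivation sample.
beta = fit_logistic(X, y)
apparent = c_statistic(y, 1.0 / (1.0 + np.exp(-X @ beta)))

# Optimism: (performance in the bootstrap sample) minus (performance of the
# bootstrap model in the original sample), averaged over replicates.
B, optimism = 200, 0.0
for _ in range(B):
    idx = rng.integers(0, n, n)
    b = fit_logistic(X[idx], y[idx])
    c_boot = c_statistic(y[idx], 1.0 / (1.0 + np.exp(-X[idx] @ b)))
    c_orig = c_statistic(y, 1.0 / (1.0 + np.exp(-X @ b)))
    optimism += (c_boot - c_orig) / B

corrected = apparent - optimism
print(f"apparent {apparent:.3f}, optimism-corrected {corrected:.3f}")
```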
2.4.2 Estimates of temporal transportability
2.4.3 Assessing geographic portability of the model
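A random-effects meta-analysis of hospital-specific c-statistics, with a 95% prediction interval for the c-statistic in a new hospital, can be sketched as below. This sketch assumes the Hanley-McNeil variance approximation and the DerSimonian-Laird heterogeneity estimator; the simulated case-mix differences and all numeric settings are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

def c_statistic(y, p):
    pos, neg = p[y == 1], p[y == 0]
    d = pos[:, None] - neg[None, :]
    return ((d > 0).sum() + 0.5 * (d == 0).sum()) / (len(pos) * len(neg))

def c_variance(c, n1, n0):
    """Hanley-McNeil approximation to the variance of a c-statistic."""
    q1, q2 = c / (2.0 - c), 2.0 * c * c / (1.0 + c)
    return (c * (1 - c) + (n1 - 1) * (q1 - c * c)
            + (n0 - 1) * (q2 - c * c)) / (n1 * n0)

# Hospital-specific c-statistics: hospitals differ in case-mix (predictor
# spread), which induces real heterogeneity in within-center discrimination.
n_hosp, n_per = 10, 300
c_i, v_i = np.empty(n_hosp), np.empty(n_hosp)
for h in range(n_hosp):
    x = rng.normal(scale=rng.uniform(0.6, 1.6), size=n_per)
    p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x)))
    y = rng.binomial(1, p_true)
    c_i[h] = c_statistic(y, p_true)  # model taken as known, for brevity
    v_i[h] = c_variance(c_i[h], int(y.sum()), int((1 - y).sum()))

# DerSimonian-Laird random-effects pooling.
w = 1.0 / v_i
c_fe = (w * c_i).sum() / w.sum()
Q = (w * (c_i - c_fe) ** 2).sum()
tau2 = max(0.0, (Q - (n_hosp - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
w_re = 1.0 / (v_i + tau2)
c_re = (w_re * c_i).sum() / w_re.sum()
se_re = np.sqrt(1.0 / w_re.sum())

# Approximate 95% prediction interval for a new hospital, using a t quantile
# with k - 2 = 8 df (t_{0.975,8} = 2.306), per Higgins et al.
half = 2.306 * np.sqrt(tau2 + se_re ** 2)
print(f"pooled c {c_re:.3f}, 95% PI ({c_re - half:.3f}, {c_re + half:.3f})")
```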
2.4.4 Simultaneous geographic and temporal transportability
2.4.5 Effects of changes in case-mix on temporal variation in model performance
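The case-mix adjusted (benchmark) performance described in the methods can be sketched as follows: outcomes are simulated from the phase-2 predictions themselves, so the model is correct by construction and the resulting c-statistic reflects only the phase-2 case-mix. Sample sizes, the case-mix shift, and the number of replicates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def c_statistic(y, p):
    pos, neg = p[y == 1], p[y == 0]
    d = pos[:, None] - neg[None, :]
    return ((d > 0).sum() + 0.5 * (d == 0).sum()) / (len(pos) * len(neg))

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

# Phase 1: derivation sample. Phase 2: later sample with a shifted case-mix.
n = 1000
x1 = rng.normal(size=n)
y1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x1))))
x2 = rng.normal(loc=0.3, scale=1.3, size=n)

beta = fit_logistic(np.column_stack([np.ones(n), x1]), y1)
p2 = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x2)))

# Simulate phase-2 outcomes from the predictions and average the resulting
# c-statistics over replicates (the paper uses 1,000 replicates).
reps = 200
benchmark = np.mean([c_statistic(rng.binomial(1, p2), p2) for _ in range(reps)])
print(f"case-mix benchmark c-statistic in phase 2: {benchmark:.3f}")
```

Comparing this benchmark with the c-statistic observed on real phase-2 outcomes separates a case-mix effect from genuine deterioration of the coefficients.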
3. Results
3.1 Reproducibility
| Method | Phase 1 | Phase 2 |
| --- | --- | --- |
| **Reproducibility (performance in different patients from the same population)** | | |
| Apparent performance | 0.747 | 0.747 |
| Optimism-corrected performance | 0.745 | 0.745 |
| Leave-one-hospital-out (pooled) | 0.745 | 0.745 |
| Leave-one-hospital-out (meta-analysis of model performance) | 0.752 | 0.754 |
| **Temporal transportability (estimate in phase 1 and apply in phase 2)** | | |
| No hospital-specific random effects (model contained a fixed intercept and fixed effects for the predictor variables) | | 0.745 |
| With hospital-specific random effects (model contained hospital-specific intercepts and fixed effects for the predictor variables) | | 0.745 |
| Case-mix adjusted performance | | 0.746 |
| **Simultaneous geographic and temporal transportability** | | |
| Model estimated in 89 hospitals in phase 1 and then applied to the excluded hospital in phase 2 (meta-analytic pooling of performance estimates) ("leave-one-hospital-out [meta-analysis]") | | 0.753 |
| Model estimated in 89 hospitals in phase 1 and then applied to the excluded hospital in phase 2 ("leave-one-hospital-out [pooled]") | | 0.745 |
3.2 Geographic transportability



3.3 Temporal transportability

3.4 Simultaneous geographic and temporal transportability
4. Discussion

Supplementary data
- Online Appendix
Footnotes
Conflicts of interest: None.
Funding: This study was supported by the Institute for Clinical Evaluative Sciences (ICES), which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The opinions, results, and conclusions reported in this article are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred. This research was supported by an operating grant from the Canadian Institutes of Health Research (CIHR) (MOP 86508). P.C.A. is supported in part by a Career Investigator award from the Heart and Stroke Foundation. D.S.L. is supported by a Clinician-Scientist award from the CIHR and by the Ted Rogers Chair in Heart Function Outcomes. E.W.S. and D.v.K. are supported in part by a U award (U01NS086294, value of personalized risk information). D.v.K. and Y.V. are supported in part by the Netherlands Organisation for Scientific Research (grant 917.11.383). The Enhanced Feedback for Effective Cardiac Treatment (EFFECT) data used in the study were funded by a CIHR Team Grant in Cardiovascular Outcomes Research. These data sets were linked using unique, encoded identifiers and analyzed at ICES.