Original Article | Volume 98, P133-143, June 2018

Poor performance of clinical prediction models: the harm of commonly applied methods

Published: November 23, 2017. DOI: https://doi.org/10.1016/j.jclinepi.2017.11.013

      Abstract

      Objective

      To evaluate limitations of common statistical modeling approaches in deriving clinical prediction models and explore alternative strategies.

      Study Design and Setting

      A previously published model predicted the likelihood of having a mutation in germline DNA mismatch repair genes at the time of diagnosis of colorectal cancer. This model was based on a cohort where 38 mutations were found among 870 participants, with validation in an independent cohort with 35 mutations. The modeling strategy included stepwise selection of predictors from a pool of over 37 candidate predictors and dichotomization of continuous predictors. We simulated this strategy in small subsets of a large contemporary cohort (2,051 mutations among 19,866 participants) and made comparisons to other modeling approaches. All models were evaluated according to bias and discriminative ability (concordance index, c) in independent data.
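
      The concordance index used for evaluation can be illustrated with a minimal sketch. For a binary outcome, c is the proportion of (event, non-event) pairs in which the event receives the higher score, with ties counted as one half. The function name `concordance_index`, the toy data, and the cutoff are assumptions made for illustration; none of them come from the study.

```python
from itertools import product


def concordance_index(scores, outcomes):
    """Return c for a binary outcome (1 = event, 0 = non-event).

    c is the proportion of (event, non-event) pairs in which the event
    has the higher score; tied pairs count as one half.
    """
    event_scores = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevent_scores = [s for s, y in zip(scores, outcomes) if y == 0]
    if not event_scores or not nonevent_scores:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for e, n in product(event_scores, nonevent_scores):
        if e > n:
            total += 1.0
        elif e == n:
            total += 0.5
    return total / (len(event_scores) * len(nonevent_scores))


# Hypothetical toy data, not taken from the study.
predictor = [1, 2, 3, 4, 5, 6]
mutation = [0, 0, 1, 0, 1, 1]

c_continuous = concordance_index(predictor, mutation)

# Dichotomize at an arbitrary cutoff (> 3), for illustration only.
dichotomized = [1 if x > 3 else 0 for x in predictor]
c_dichotomized = concordance_index(dichotomized, mutation)

print(f"c, continuous predictor:   {c_continuous:.3f}")
print(f"c, dichotomized predictor: {c_dichotomized:.3f}")
```

      On this toy data, dichotomizing the predictor creates ties and lowers c from about 0.89 to about 0.67. The numbers only show the mechanism and say nothing about the size of the effect in the cohorts described here.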

      Results

      We found over 50% bias for five of six originally selected predictors, unstable model specification, and poor performance at validation (median c = 0.74). A small validation sample hampered stable assessment of performance. Model prespecification based on external knowledge and using continuous predictors led to better performance (c = 0.836 and c = 0.852 with 38 and 2,051 events, respectively).

      Conclusion

      Prediction models perform poorly if based on small numbers of events and developed with common but suboptimal statistical approaches. Alternative modeling strategies to best exploit available predictive information need wider implementation, with collaborative research to increase sample sizes.
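
      To give a sense of scale for "small numbers of events", the sketch below divides the event counts reported in this abstract by the size of the candidate pool. The helper name `events_per_candidate` is hypothetical. The abstract states "over 37" candidate predictors, so 37 is used as a lower bound and the ratios are upper bounds; applying the same pool to the large cohort is an assumption made for comparison only.

```python
def events_per_candidate(n_events, n_candidates):
    """Number of outcome events available per candidate predictor."""
    if n_candidates <= 0:
        raise ValueError("n_candidates must be positive")
    return n_events / n_candidates


# Event counts are taken from the abstract; 37 is a lower bound on the
# candidate pool ("over 37"), so both ratios are upper bounds.
CANDIDATES_LOWER_BOUND = 37

ratio_small = events_per_candidate(38, CANDIDATES_LOWER_BOUND)
ratio_large = events_per_candidate(2051, CANDIDATES_LOWER_BOUND)

print(f"38 events:    at most {ratio_small:.1f} events per candidate")
print(f"2,051 events: at most {ratio_large:.1f} events per candidate")
```

      With 38 events, the development cohort offers roughly one event per candidate predictor, whereas 2,051 events would offer about 55.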
