Journal of Clinical Epidemiology
Volume 57, Issue 11 , Pages 1138-1146, November 2004

Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality

  • Peter C. Austin

      Affiliations

    • Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada
    • Department of Public Health Sciences, University of Toronto, McMurrich Bldg, 4th Floor, 12 Queen's Park Crescent West, Toronto, Ontario, M5S 1A8 Canada
    • Department of Health Policy, Management and Evaluation, University of Toronto, McMurrich Bldg, 2nd Floor, 12 Queen's Park Crescent West, Toronto, Ontario, M5S 1A8 Canada
    • Corresponding author. Tel.: 416-480-6131; fax: 416-480-6048.
  • Jack V. Tu

      Affiliations

    • Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada
    • Department of Public Health Sciences, University of Toronto, McMurrich Bldg, 4th Floor, 12 Queen's Park Crescent West, Toronto, Ontario, M5S 1A8 Canada
    • Department of Health Policy, Management and Evaluation, University of Toronto, McMurrich Bldg, 2nd Floor, 12 Queen's Park Crescent West, Toronto, Ontario, M5S 1A8 Canada
    • Clinical Epidemiology and Health Care Research Program, Sunnybrook & Women's College Health Sciences Centre, 2075 Bayview Ave, Toronto, Ontario, M4N 3M5 Canada
    • Division of General Internal Medicine, Sunnybrook & Women's College Health Sciences Centre, 2075 Bayview Ave, Toronto, Ontario, M4N 3M5 Canada

Accepted 14 April 2004.

Abstract 

Objectives

Automated variable selection methods are frequently used to determine the independent predictors of an outcome. The objective of this study was to determine the reproducibility of logistic regression models developed using automated variable selection methods.

Study design and setting

An initial set of 29 candidate variables was considered for predicting mortality after acute myocardial infarction (AMI). We drew 1,000 bootstrap samples from a dataset of 4,911 patients admitted to hospital with an AMI. In each bootstrap sample, logistic regression models predicting 30-day mortality were developed using backward elimination, forward selection, and stepwise selection. We then compared the agreement among the three model selection methods and the agreement across the 1,000 bootstrap samples.
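To make the resampling design concrete, here is a minimal Python sketch of one arm of it: backward elimination repeated over bootstrap samples. It is not the authors' code. It assumes simulated data (5 standard-normal candidates, 300 hypothetical patients and 30 resamples, rather than 29 candidates, 4,911 patients and 1,000 resamples), Wald-test elimination with an assumed 0.05 retention threshold, and a pure-Python Newton-Raphson fit so that only the standard library is needed. All function names (`fit_logistic`, `backward_elimination`, `bootstrap_selection`, `simulate`) are hypothetical.

```python
import math
import random

# Sketch only: simulated data, Wald-test backward elimination, alpha = 0.05 (assumed).

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Maximum likelihood by Newton-Raphson; returns coefficients and their covariance."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        cov = invert(info)  # inverse Fisher information at the current estimate
        step = [sum(cov[j][k] * grad[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, cov

def backward_elimination(rows, y, candidates, alpha=0.05):
    """Drop the variable with the largest Wald p-value until every p <= alpha."""
    kept = list(candidates)
    while kept:
        X = [[1.0] + [r[v] for v in kept] for r in rows]  # leading 1.0 = intercept
        beta, cov = fit_logistic(X, y)
        # Two-sided Wald p-value for each non-intercept coefficient.
        pvals = [math.erfc(abs(beta[j + 1]) / math.sqrt(2.0 * cov[j + 1][j + 1]))
                 for j in range(len(kept))]
        worst = max(range(len(kept)), key=lambda j: pvals[j])
        if pvals[worst] <= alpha:
            break
        kept.pop(worst)
    return frozenset(kept)

def simulate(n, coefs, intercept, rng):
    """Hypothetical cohort: standard-normal covariates, binary outcome."""
    rows, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        eta = intercept + sum(b * v for b, v in zip(coefs, x))
        y.append(1 if rng.random() < sigmoid(eta) else 0)
        rows.append(x)
    return rows, y

def bootstrap_selection(rows, y, n_boot, rng, alpha=0.05):
    """Repeat backward elimination in n_boot resamples drawn with replacement."""
    n, p = len(y), len(rows[0])
    selected = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        selected.append(backward_elimination([rows[i] for i in idx],
                                             [y[i] for i in idx], range(p), alpha))
    return selected

rng = random.Random(2004)
# Hypothetical true effects: one strong, one moderate, one weak, two null.
rows, y = simulate(300, [1.0, 0.5, 0.2, 0.0, 0.0], -1.0, rng)
models = bootstrap_selection(rows, y, 30, rng)
print(len(set(models)), "distinct models in", len(models), "bootstrap samples")
```

Forward and stepwise selection would replace `backward_elimination` with a loop that adds variables (and, for stepwise, re-checks and removes them). Comparing the returned sets across resamples is what exposes instability.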

Results

Across the 1,000 bootstrap samples, backward elimination identified 940 unique models for predicting mortality. Similar results were obtained for forward and stepwise selection. Three variables were identified as independent predictors of mortality in all bootstrap samples. Over half of the candidate prognostic variables were identified as independent predictors in fewer than half of the bootstrap samples.
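Both summaries reported in the Results, the number of distinct models and how often each candidate was retained, are simple tallies over the per-sample selected sets. The sketch below uses a hand-made toy input (five hypothetical bootstrap samples over candidates `x1` to `x5`, not data from the study). `summarize_selections` is a hypothetical helper.

```python
from collections import Counter

def summarize_selections(selected, candidates):
    """Count distinct selected models and each candidate's inclusion proportion."""
    n_boot = len(selected)
    distinct = len({frozenset(s) for s in selected})
    counts = Counter(v for s in selected for v in set(s))
    freq = {v: counts[v] / n_boot for v in candidates}
    return distinct, freq

# Toy input: variables retained in five hypothetical bootstrap samples.
toy = [{"x1", "x2"}, {"x1", "x2", "x3"}, {"x1", "x2"}, {"x1", "x3"}, {"x1", "x2", "x4"}]
candidates = ["x1", "x2", "x3", "x4", "x5"]
distinct, freq = summarize_selections(toy, candidates)
always = [v for v in candidates if freq[v] == 1.0]     # retained in every sample
under_half = [v for v in candidates if freq[v] < 0.5]  # retained in fewer than half
print(distinct, always, under_half)  # -> 4 ['x1'] ['x3', 'x4', 'x5']
```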

Conclusion

Automated variable selection methods result in models that are unstable and not reproducible. The variables selected as independent predictors are sensitive to random fluctuations in the data.

Keywords: Regression models, Multivariate analysis, Variable selection, Logistic regression, Acute myocardial infarction, Epidemiology


PII: S0895-4356(04)00111-8

doi:10.1016/j.jclinepi.2004.04.003
