Importance of events per independent variable in proportional hazards regression analysis II. Accuracy and precision of regression estimates

  • Peter Peduzzi
    Affiliations
    Cooperative Studies Program Coordinating Center, West Haven Veterans Affairs Medical Center, West Haven, Connecticut, USA

    Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, USA
  • John Concato
    Correspondence
    Address correspondence and reprint requests to: John Concato, M.D., M.S., M.P.H., Medical Service/111GIM, West Haven VAMC, 950 Campbell Ave., West Haven, CT 06516.
    Affiliations
    Medical Service, West Haven Veterans Affairs Medical Center, West Haven, Connecticut, USA

    Department of Medicine (Clinical Epidemiology Unit), Yale University School of Medicine, New Haven, Connecticut, USA
  • Alvan R. Feinstein
    Affiliations
    Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, USA

    Department of Medicine (Clinical Epidemiology Unit), Yale University School of Medicine, New Haven, Connecticut, USA
  • Theodore R. Holford
    Affiliations
    Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, USA

      Abstract

      The analytical effect of the number of events per variable (EPV) in a proportional hazards regression analysis was evaluated using Monte Carlo simulation techniques for data from a randomized trial containing 673 patients and 252 deaths, in which seven predictor variables had an original significance level of p < 0.10. The 252 deaths and 7 variables correspond to 36 events per variable analyzed in the full data set.
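      The EPV arithmetic above can be checked with a minimal sketch. The function names `events_per_variable` and `required_events` are hypothetical helpers introduced for illustration, not part of the paper, and the target EPV in the last line is an arbitrary example value.

```python
def events_per_variable(n_events: int, n_variables: int) -> float:
    """Ratio of outcome events (here, deaths) to candidate predictor variables."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_events / n_variables


def required_events(target_epv: int, n_variables: int) -> int:
    """Number of events implied by a target EPV for a fixed set of variables."""
    return target_epv * n_variables


# Full data set described in the abstract: 252 deaths, 7 predictor variables.
full_epv = events_per_variable(252, 7)  # 36.0

# Illustrative only: events implied by an example target EPV with 7 variables.
events_at_example_epv = required_events(10, 7)  # 70
```

      Note that the 673 patients do not enter the ratio; EPV depends only on the number of events and the number of variables.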
      Five hundred simulated analyses were conducted for these seven variables at EPVs of 2, 5, 10, 15, 20, and 25. For each simulation, a random exponential survival time was generated for each of the 673 patients, and the simulated results were compared with their original counterparts. As EPV decreased, the regression coefficients became more biased relative to the true value; the 90% confidence limits about the simulated values did not have a coverage of 90% for the original value; large-sample properties did not hold for variance estimates from the proportional hazards model; and the Z statistics used to test the significance of the regression coefficients lost validity under the null hypothesis.
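      One way such a simulation step might look is sketched below. The abstract states only that exponential survival times were generated for each patient; everything else here is an assumption made for illustration: the function name `simulate_survival`, the covariate values, the coefficients in `betas`, the baseline hazard, and in particular the use of censoring at the time of the last required death to reach a target EPV. The paper's actual procedure may differ.

```python
import math
import random


def simulate_survival(covariates, betas, baseline_hazard, epv, rng):
    """Return (time, died) pairs containing exactly epv * len(betas) deaths.

    Survival times are exponential with rate baseline_hazard * exp(x . beta),
    which satisfies the proportional hazards assumption.
    """
    n_events = epv * len(betas)
    if n_events > len(covariates):
        raise ValueError("target number of events exceeds sample size")

    times = []
    for x in covariates:
        linear_predictor = sum(b * xi for b, xi in zip(betas, x))
        rate = baseline_hazard * math.exp(linear_predictor)
        times.append(rng.expovariate(rate))

    # ASSUMPTION: follow-up stops at the n_events-th death, so every later
    # patient is censored at that time. The source does not specify this.
    cutoff = sorted(times)[n_events - 1]
    return [(min(t, cutoff), t <= cutoff) for t in times]


rng = random.Random(0)
# Hypothetical data: 673 patients, 7 binary predictors, made-up coefficients.
covariates = [[rng.randint(0, 1) for _ in range(7)] for _ in range(673)]
betas = [0.5, -0.3, 0.2, 0.0, 0.4, -0.1, 0.3]

data = simulate_survival(covariates, betas, baseline_hazard=0.1, epv=10, rng=rng)
n_deaths = sum(died for _, died in data)  # 70 deaths, i.e. EPV = 10
```

      Repeating this step 500 times per EPV level and fitting a proportional hazards model to each simulated data set would then allow the fitted coefficients to be compared with the values used to generate the data.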
      Although a single boundary level for avoiding problems is not easy to choose, the value of EPV = 10 seems most prudent. Below this value for EPV, the results of proportional hazards regression analyses should be interpreted with caution because the statistical model may not be valid.