Journal of Clinical Epidemiology | Original Article | Volume 79, pp. 112-119, November 2016

Simple and multiple linear regression: sample size considerations

  • James A. Hanley
    Corresponding author. Tel.: +1 514 398 6270; fax: +1 514 398 4503.
    Department of Epidemiology, Biostatistics and Occupational Health, McGill University, 1020 Pine Avenue West, Montreal, Quebec H3A 1A2, Canada

      Abstract

      Objective

      The “two subjects per variable” (2SPV) rule of thumb suggested in the article by Austin and Steyerberg offers an opportunity to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression.

      Study Design and Setting

      This article distinguishes two major uses of regression models that imply very different sample size considerations, neither of which is served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing “exposure” (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice: it addresses Y levels for individuals with different covariate patterns or “profiles,” focusing on the profile-specific (mean) Y levels themselves and estimating them via linear compounds of regression coefficients and covariates.
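      For orientation, the two targets can be written alongside their standard ordinary least squares variances. These are well-known textbook results, offered here as an assumption about the kind of formulae the abstract refers to, not reproduced from the full text. Notation (ours): σ is the residual standard deviation of Y, n the sample size, s²_{X_j} the variance of covariate X_j (divisor n), R²_j the R² from regressing X_j on the other covariates, x̄ and S the mean vector and covariance matrix (divisor n) of the p covariates, and x_0 a covariate profile.

```latex
% Genre 1 (etiological): one coefficient, adjusted for the other covariates
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{n \, s_{X_j}^2 \, (1 - R_j^2)}

% Genre 2 (clinical): mean Y for profile x_0, a linear compound of the coefficients
\hat\mu(x_0) = \hat\beta_0 + \sum_{k=1}^{p} \hat\beta_k x_{0k},
\qquad
\operatorname{Var}\{\hat\mu(x_0)\} = \frac{\sigma^2}{n}\left[1 + (x_0 - \bar{x})^\top S^{-1} (x_0 - \bar{x})\right]
```

      With a single covariate (p = 1, R²_j = 0) these reduce to the familiar simple-regression results σ²/(n s²_X) for the slope and σ²[1/n + (x_0 − x̄)²/(n s²_X)] for the fitted mean at x_0.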

      Results and Conclusion

      By drawing on long-established closed-form variance formulae that underlie the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres.
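      Solving the standard variance expressions for n gives back-of-envelope sample sizes for each genre. The sketch below is illustrative only: it assumes σ, the covariate SD, R²_j and the profile's squared Mahalanobis distance D² are guessed in advance, uses normal-theory standard errors (ignoring the t correction and the degrees of freedom used by the p coefficients), and all function names are hypothetical rather than taken from the article.

```python
import math


def se_coefficient(sigma: float, sd_x: float, n: int, r2_others: float = 0.0) -> float:
    """SE of one adjusted slope: sigma / (sd_x * sqrt(n * (1 - R_j^2))).

    sd_x uses divisor n; r2_others is R_j^2 from regressing X_j on the
    other covariates (0 for simple regression).
    """
    return sigma / (sd_x * math.sqrt(n * (1.0 - r2_others)))


def n_for_coefficient(sigma: float, sd_x: float, target_se: float,
                      r2_others: float = 0.0) -> int:
    """Smallest n with SE(beta_j) <= target_se (genre 1, etiological)."""
    if not 0.0 <= r2_others < 1.0:
        raise ValueError("r2_others must lie in [0, 1)")
    n = (sigma / (target_se * sd_x)) ** 2 / (1.0 - r2_others)
    # Subtract a tiny tolerance so floating-point noise just above an
    # integer does not bump the answer up by one.
    return math.ceil(n - 1e-9)


def n_for_profile_mean(sigma: float, target_se: float, mahalanobis_sq: float) -> int:
    """Smallest n with SE(mu_hat(x0)) <= target_se (genre 2, clinical).

    Rearranges Var(mu_hat(x0)) = (sigma^2 / n) * (1 + D^2), where D^2 is the
    squared Mahalanobis distance of profile x0 from the covariate means.
    """
    if mahalanobis_sq < 0.0:
        raise ValueError("mahalanobis_sq must be non-negative")
    n = (sigma / target_se) ** 2 * (1.0 + mahalanobis_sq)
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Hypothetical inputs: residual SD 10, covariate SD 5.
    print(n_for_coefficient(10.0, 5.0, target_se=0.5))                  # 16
    print(n_for_coefficient(10.0, 5.0, target_se=0.5, r2_others=0.75))  # 64
    print(n_for_profile_mean(10.0, target_se=1.0, mahalanobis_sq=0.0))  # 100
    print(n_for_profile_mean(10.0, target_se=1.0, mahalanobis_sq=3.0))  # 400
```

      In this sketch, collinearity (R²_j = 0.75) quadruples the n needed for the same precision on one coefficient, whereas for a profile mean the covariates enter only through D²; averaged over the sample's own profiles, D² (with divisor-n covariance) equals p exactly.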

      References

        • Austin P.C., Steyerberg E.W. The number of subjects per variable required in linear regression analyses. J Clin Epidemiol. 2015; 68: 627-636
        • Steyerberg E.W. Clinical prediction models: a practical approach to development, validation, and updating. Springer-Verlag, New York, NY; 2009
        • Andropoulos D.B., Bent S.T., Skjonsby B., Stayer S.A. The optimal length of insertion of central venous catheters for pediatric patients. Anesth Analg. 2001; 93: 883-886
        • Szeto C., Kost K., Hanley J.A., Roy A., Christou N. A simple method to predict pretracheal tissue thickness to prevent accidental decannulation in the obese. Otolaryngol Head Neck Surg. 2010; 143: 223-229
        • Old Faithful Geyser Streaming Webcam, 2016. Available at http://www.nps.gov/features/yell/webcam/oldFaithfulStreaming.html. Accessed July 22, 2016.
        • National Park Service. NPS Yellowstone Geysers: app for smart phones, 2016. Available at https://www.nps.gov/yell/learn/news/14089.htm. Accessed July 22, 2016.
        • Hanley J.A., Saarela O., Stephens D.A., Thalabard J.C. hGH isoform differential immunoassays applied to blood samples from athletes: decision limits for anti-doping testing. Growth Horm IGF Res. 2014; 24: 205-215
        • Harrell F.E. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. 2nd ed. Springer, Cham; 2015
        • Pearson K., Lee A. On the laws of inheritance in man: I. Inheritance of physical characters. Biometrika. 1903; 2: 357-462
        • Weisberg S. Applied linear regression. Wiley, New York; 1980
        • Healthy calculators. Available at http://www.healthycalculators.com/childrens-height-predictor.php. Accessed July 22, 2016.
        • Vittinghoff E., McCulloch C.E. Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epidemiol. 2007; 165: 710-718 (Epub 2006 Dec 20)
        • Hanley J.A., Moodie E.E.M. Sample size, precision and power calculations: a unified approach. J Biomet Biostat. 2011; 5: 2
        • Marill K.A. Advanced statistics: linear regression, part II: multiple linear regression. Acad Emerg Med. 2004; 11: 94-102
        • Hocking R.R., Pendleton O.J. The regression dilemma. Commun Stat Theory Methods. 1983; 12: 497-527
        • Hocking R.R. Methods and applications of linear models: regression and the analysis of variance. 3rd ed. John Wiley & Sons, Hoboken, New Jersey; 2013
        • Cook R.D., Weisberg S. Applied regression including computing and graphics. Wiley, New York; 1999