Original Article | Volume 60, Issue 3, P250-255, March 2007

A cluster-adjusted sample size algorithm for proportions was developed using a beta-binomial model

  • G.T. Fosgate
    Corresponding author. Tel.: 979-845-3203; fax: 979-847-8981.
    Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843-4458, USA
Published: September 29, 2006



      Objective

      The objective of the paper was to design a computer algorithm to calculate sample sizes for estimating proportions incorporating clustered sampling units using a beta-binomial model when information concerning the intraclass correlation is not available.

      Study Design and Setting

      A computer algorithm was written in FORTRAN and evaluated for a hypothetical sample size situation.


      Results

      The developed algorithm was able to incorporate clustering in estimated sample sizes through the specification of a beta distribution to account for within-cluster correlation. In a hypothetical example, the usual normal approximation method for estimation of a proportion ignoring the clustered sampling design resulted in a calculated sample size of 107, whereas the developed algorithm suggested that 208 sampling units would be necessary.
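The contrast between the two calculations can be illustrated with a minimal Python sketch. It is not the author's FORTRAN algorithm: it shows only the usual normal approximation, n = z^2 p(1 - p) / d^2, followed by a generic inflation by a design effect. The function names and the inputs in the usage lines are hypothetical; the abstract does not state the proportion, precision, or confidence level behind the reported sample size of 107.

```python
import math
from statistics import NormalDist


def normal_approx_sample_size(p, d, confidence=0.95):
    """Sample size to estimate a proportion p within +/- d,
    ignoring clustering (simple random sampling assumed)."""
    # Two-sided standard normal quantile for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


def inflate_for_clustering(n_srs, design_effect):
    """Multiply an unclustered sample size by a design effect (>= 1)."""
    return math.ceil(n_srs * design_effect)


# Hypothetical inputs, chosen only for illustration.
n_unadjusted = normal_approx_sample_size(p=0.5, d=0.05)
n_adjusted = inflate_for_clustering(n_unadjusted, design_effect=2.0)
print(n_unadjusted, n_adjusted)
```

For scale, the two sample sizes reported in the hypothetical example above differ by a factor of 208/107, or about 1.94.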


      Conclusion

      When data are correlated, it is important to incorporate cluster adjustment in sample size calculations for epidemiologic studies that estimate disease burden and other population proportions, even when information concerning the intraclass correlation is not available. Beta-binomial models can be used to account for clustering, and design effects can be estimated by generating beta distributions that encompass within-cluster correlation.
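The link between a beta distribution and a design effect can be sketched with standard beta-binomial relationships. This is an illustration under stated assumptions, not the published algorithm: cluster-level prevalence is assumed to follow Beta(alpha, beta), every cluster is assumed to contribute the same number of sampling units, and the function names and numeric inputs are hypothetical.

```python
def beta_icc(alpha, beta):
    """Intraclass correlation implied by a Beta(alpha, beta)
    distribution of cluster-level proportions."""
    return 1.0 / (alpha + beta + 1.0)


def beta_params(mean, icc):
    """Beta parameters that give the requested mean proportion
    and intraclass correlation (0 < mean < 1, 0 < icc < 1)."""
    total = (1.0 - icc) / icc  # equals alpha + beta
    return mean * total, (1.0 - mean) * total


def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


# Hypothetical example: Beta(2, 7) prevalence, 11 units per cluster.
rho = beta_icc(2, 7)
deff = design_effect(11, rho)
print(rho, deff)
```

A more dispersed beta distribution (smaller alpha + beta) implies a larger intraclass correlation and therefore a larger design effect for the same cluster size.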



