Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review

  • M. Sanni Ali
    Affiliations
    Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, the Netherlands

    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
  • Rolf H.H. Groenwold
    Affiliations
    Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, the Netherlands

    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
  • Svetlana V. Belitser
    Affiliations
    Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, the Netherlands
  • Wiebe R. Pestman
    Affiliations
    Catholic University of Leuven, Research unit for Quantitative Psychology and Individual Differences, Leuven, Belgium
  • Arno W. Hoes
    Affiliations
    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
  • Kit C.B. Roes
    Affiliations
    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
  • Anthonius de Boer
    Affiliations
    Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, the Netherlands
  • Olaf H. Klungel
    Correspondence
    Corresponding author. Tel.: +31 6 288 31 313; fax: +31 30 253 91 66.
    Affiliations
    Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, the Netherlands

    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
Published: November 26, 2014. DOI: https://doi.org/10.1016/j.jclinepi.2014.08.011

      Abstract

      Objectives

      To assess the current practice of propensity score (PS) analysis in the medical literature, particularly the assessment and reporting of balance on confounders.

      Study Design and Setting

      A PubMed search identified studies using PS methods from December 2011 through May 2012. For each article included in the review, information was extracted on important aspects of the PS analysis, such as the type of PS method used, variable selection for the PS model, and assessment of balance.
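
      The PS model estimates each patient's probability of receiving treatment given the covariates selected for it. The abstract does not say how the reviewed studies fitted this model, so the following is only a minimal sketch, assuming a logistic regression fitted by Newton-Raphson in plain Python with a fixed number of iterations and no convergence or separation checks. The function names (fit_propensity_score, _solve, _expit) and the toy data are hypothetical; a real analysis would normally rely on an established statistics package.

```python
import math


def _expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_propensity_score(X, t, iterations=25):
    """Fit logit P(T = 1 | X) by Newton-Raphson; return each subject's estimated PS.

    X: list of covariate rows (the covariates selected for the PS model)
    t: list of 0/1 treatment indicators
    Hypothetical helper: fixed iteration count, no convergence or separation checks.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend an intercept column
    n, p = len(rows), len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        ps = [_expit(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Score X'(t - ps) and information X'WX with W = diag(ps * (1 - ps))
        score = [sum((t[i] - ps[i]) * rows[i][j] for i in range(n)) for j in range(p)]
        info = [[sum(ps[i] * (1 - ps[i]) * rows[i][j] * rows[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        step = _solve(info, score)  # Newton step: (X'WX)^-1 X'(t - ps)
        beta = [b + s for b, s in zip(beta, step)]
    return [_expit(sum(b * v for b, v in zip(beta, r))) for r in rows]


# Hypothetical toy data: one binary covariate (e.g. a comorbidity indicator)
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
t = [1, 0, 0, 0, 1, 1, 1, 0]
print([round(e, 2) for e in fit_propensity_score(X, t)])
# -> [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
```

      With a single binary covariate the model is saturated, so each fitted PS equals the observed share of treated patients at that covariate level (1 of 4 and 3 of 4 in the toy data).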

      Results

      Among the 296 articles included in the review, variable selection for the PS model was explicitly reported in 102 studies (34.4%). Covariate balance was checked and reported in 177 studies (59.8%). P-values were the statistic most commonly used to report balance (125 of 177, 70.6%). The standardized difference and graphical displays were reported in 45 (25.4%) and 11 (6.2%) of these articles, respectively. Matching on the PS was the most commonly used approach to control for confounding (68.9%), followed by covariate adjustment for the PS (20.9%), PS stratification (13.9%), and inverse probability of treatment weighting (IPTW, 7.1%); these percentages sum to more than 100% because some articles used more than one approach. Balance was checked more often in articles using PS matching (70.6%) and IPTW (71.4%).
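
      The standardized difference expresses the difference in a covariate's mean between treatment groups in units of the pooled standard deviation, so its expected value does not depend on sample size, whereas the p-value for the same imbalance shrinks as the sample grows. The following Python sketch is a minimal illustration under stated assumptions: it uses an uncorrected weighted variance, IPTW weights of 1/PS for treated and 1/(1 - PS) for untreated patients, and a large-sample z-test as the p-value for comparison. All function names, the toy data and the hard-coded PS values are hypothetical.

```python
import math


def weighted_mean_var(x, w):
    """Weighted mean and uncorrected weighted variance (sum of w*(x - mean)^2 / sum of w)."""
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, var


def _arm(x, t, w, arm):
    """Covariate values and weights for one arm (arm = 1 treated, 0 untreated)."""
    rows = [(xi, wi) for xi, ti, wi in zip(x, t, w) if ti == arm]
    return [xi for xi, _ in rows], [wi for _, wi in rows]


def standardized_difference(x, t, w=None):
    """(Weighted) mean difference divided by the pooled SD sqrt((var1 + var0) / 2)."""
    if w is None:
        w = [1.0] * len(x)
    m1, v1 = weighted_mean_var(*_arm(x, t, w, 1))
    m0, v0 = weighted_mean_var(*_arm(x, t, w, 0))
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)


def iptw_weights(ps, t):
    """Inverse probability of treatment weights: 1/PS if treated, 1/(1 - PS) if not."""
    return [1 / e if ti == 1 else 1 / (1 - e) for e, ti in zip(ps, t)]


def z_test_p_value(x, t):
    """Two-sided large-sample z-test p-value for an unweighted difference in means."""
    ones = [1.0] * len(x)
    x1, w1 = _arm(x, t, ones, 1)
    x0, w0 = _arm(x, t, ones, 0)
    m1, v1 = weighted_mean_var(x1, w1)
    m0, v0 = weighted_mean_var(x0, w0)
    z = (m1 - m0) / math.sqrt(v1 / len(x1) + v0 / len(x0))
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy data: one binary confounder x and treatment t
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 0, 0, 0, 1, 1, 1, 0]
ps = [0.25] * 4 + [0.75] * 4  # hypothetical PS: share treated within each level of x

print(round(standardized_difference(x, t), 3))  # before weighting: 1.155
print(abs(standardized_difference(x, t, iptw_weights(ps, t))) < 1e-9)  # after IPTW: True
print(z_test_p_value(x, t) > z_test_p_value(x * 10, t * 10))  # True: same imbalance, smaller p
```

      Replicating the same data ten times leaves the standardized difference unchanged but lowers the p-value, which is one commonly cited reason to prefer the standardized difference for balance checks; a cutoff such as 0.1 is often used as a rule of thumb.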

      Conclusion

      The execution and reporting of covariate selection and balance assessment are far from optimal. Recommendations for reporting PS analyses are provided to allow better appraisal of the validity of PS-based studies.
