A Gentle Introduction to Instrumental Variables

Open Access. Published: July 06, 2022. DOI: https://doi.org/10.1016/j.jclinepi.2022.06.022

      Abstract

      Instrumental variables (IV) analysis is a central strategy for identifying causal effects in the absence of randomized experiments. Clinicians and epidemiologists may find the intuition of IV easy to grasp by comparison to randomized experiments. Randomization is an ideal IV because treatment is assigned randomly, and hence unaffected by everything else. IV methods in nonexperimental settings mimic a randomized experiment by using a source of “as good as” random variation in treatment instead. The main challenge with IV designs is to find IVs that are as good as randomization. Discovering potential IVs requires substantive knowledge and an understanding of design principles. Moreover, IV methods recover causal effects for a subset of the population who take treatment when induced by the IV. Sometimes these estimates are informative; at other times their relevance is questionable. We provide an introduction to IV methods in clinical epidemiology. First, we introduce the main principles and assumptions. Second, we present practical examples based on Mendelian randomization and provider preference and refer to other common IVs in health. Third, practical steps in IV analysis are presented. Fourth, the promise and perils of IV methods are discussed. Finally, we suggest further readings.

      1. Mimicking a randomized experiment

      Clinicians and epidemiologists strive to improve health through interventions. Decisions on which treatments or policies to pursue require causal knowledge [
      • Westreich D.
      Epidemiology by Design: A Causal Approach to the Health Sciences.
      ]. Estimating causal effects demands comparable treatment and control groups (i.e., exchangeability) [
      • Hernán M.A.
      • Robins J.M.
      Causal Inference: What If.
      ]. While comparable groups are expected by design in randomized controlled trials (RCTs), observational studies are often challenged by confounding bias. As most observational methods can only adjust for measured confounders, ruling out unmeasured confounding is often unrealistic. A common issue in clinical epidemiology is confounding by indication, where treatment decisions are based on potentially unmeasured patient characteristics such as disease severity [
      • Hernán M.A.
      • Robins J.M.
      Causal Inference: What If.
      ]. Figure 1A presents a data-generating process, a directed acyclic graph (DAG) [
      • Hernán M.A.
      • Robins J.M.
      Causal Inference: What If.
      ], for this scenario. Here the interest lies in the effect of a treatment, D, on an outcome, Y, but any causal interpretation is precluded by unmeasured patient characteristics, U. Instrumental variables (IV) are appealing as these methods can provide causal effects from observational data even with unmeasured confounding [
      • Huntington-Klein N.
      The Effect: An Introduction to Research Design and Causality.
      ].
      Fig. 1. Directed acyclic graphs for confounding bias and instrumental variables. (A) presents a DAG where the effect of D on Y is biased by unmeasured confounding, U. The total association between D and Y consists of a causal (D→Y) and a confounding (D←U→Y) component. As U is unmeasured, the confounding back-door path (D←U→Y) cannot be blocked by conditioning and the effect of D on Y remains confounded [
      • Elwert F.
      Graphical Causal Models.
      ]. (B) presents a DAG where Z is a valid IV for the effect of D on Y. Instead of aiming to close the confounding back-door path through conditioning, Z isolates causal covariation in D and Y due to Z and ignores confounded covariation. By only using this causal covariation, IV analysis identifies causal effects of D on Y for people whose value on D is determined by Z.
      Conceptually, an IV can be compared to randomization in RCTs. IV methods, like RCTs, depend on random variation in treatment for comparable groups. But in contrast to RCTs with investigator-led randomization, IV methods instead exploit a source of as good as random variation in treatment and are thus considered quasi-experimental designs [
      • Huntington-Klein N.
      The Effect: An Introduction to Research Design and Causality.
      ]. The main challenge is to find a credible IV which is as good as random in allocating people to treatment.
      In Figure 1B, Z is an IV for the effect of D on Y. IV relies on three main conditions. A valid IV, Z, must (i) predict treatment status (“relevance”), (ii) only affect Y through D (“exclusion”), and (iii) be as good as randomly assigned (“independence”). These conditions are met as there is a causal path Z→D and no open paths between Z and Y except through D.
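      The logic of Figure 1B can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a linear data-generating process with a constant treatment effect of 2, standard normal Z and U, and hypothetical function names (`cov`, `simulate`). Under those assumptions, the slope from regressing Y on D absorbs the confounding path through U, while the ratio cov(Z, Y) / cov(Z, D) uses only the variation in D induced by Z.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def simulate(n=20000, true_effect=2.0, seed=1):
    """Draw (Z, D, Y) from an assumed linear version of the DAG in Fig. 1B."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument, drawn independently of U (independence)
        ui = rng.gauss(0, 1)  # unmeasured confounder
        di = zi + ui + rng.gauss(0, 1)  # Z -> D <- U (relevance)
        # Z enters Y only through D (exclusion); U also affects Y directly.
        yi = true_effect * di + 2.0 * ui + rng.gauss(0, 1)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


z, d, y = simulate()
naive = cov(d, y) / cov(d, d)  # slope of Y on D: includes the D <- U -> Y path
iv = cov(z, y) / cov(z, d)     # IV ratio: only variation in D induced by Z
print(f"naive slope: {naive:.2f}  IV estimate: {iv:.2f}  true effect: 2.00")
```

      With these assumed coefficients the naive slope should land well above the true effect, whereas the IV ratio should be close to it, up to sampling noise.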
      The randomization indicator in a double-blind placebo-controlled trial is a valid IV that meets conditions (i)-(iii) by design. Randomization increases the probability of receiving treatment among people assigned to treatment, while exclusion and independence are expected by double-blindness and random assignment of Z [
      • Hernán M.A.
      • Robins J.M.
      Causal Inference: What If.
      ]. With observational data, however, researchers must combine creativity, knowledge of the field, and design principles to find potential IVs.
      A fourth condition (iv) of identical or homogeneous treatment effects is required to estimate the average treatment effect (ATE). In health settings, however, effects are often heterogeneous (i.e., vary across people), which requires an alternative fourth condition of monotonicity (i.e., Z only affects D in one direction). IV methods only exploit variation in treatment induced by the IV. Hence, under conditions (i)-(iii) and monotonicity, IV methods retrieve the local average treatment effect (LATE), which is the average treatment effect for people whose treatment was determined by the IV (“compliers”). Compliers are expected to consist of comparable treatment and control groups and thus link IV to the RCTs we mimic [
      • Angrist J.D.
      Empirical Strategies in Economics: Illuminating the Path from Cause to Effect.
      ].
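      To make the LATE concrete, the following sketch simulates a randomized encouragement with a binary Z and binary D. Everything in it is an illustrative assumption rather than part of the article: the shares of compliers, always-takers, and never-takers, their effects and baselines, and the helper names. There are no defiers, so monotonicity holds by construction. The ratio of the difference in mean Y to the difference in mean D across the two arms of Z (often called the Wald estimator) should then recover the compliers' effect, not the always-takers' effect.

```python
import random

# Assumed population (hypothetical numbers): no defiers, so monotonicity holds.
SHARES = {"complier": 0.5, "always": 0.2, "never": 0.3}
EFFECT = {"complier": 1.0, "always": 3.0, "never": 0.0}     # heterogeneous effects
BASELINE = {"complier": 0.0, "always": 2.0, "never": -1.0}  # differs by type: confounding


def simulate_trial(n=60000, seed=7):
    """Return a list of (z, d, y) tuples under the assumed compliance types."""
    rng = random.Random(seed)
    kinds = list(SHARES)
    weights = [SHARES[k] for k in kinds]
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)  # randomly assigned instrument
        kind = rng.choices(kinds, weights=weights)[0]
        if kind == "always":
            d = 1
        elif kind == "never":
            d = 0
        else:
            d = z  # compliers take treatment only when induced by Z
        y = BASELINE[kind] + EFFECT[kind] * d + rng.gauss(0, 1)
        rows.append((z, d, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_estimate(rows):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])."""
    itt_y = mean(y for z, d, y in rows if z == 1) - mean(y for z, d, y in rows if z == 0)
    itt_d = mean(d for z, d, y in rows if z == 1) - mean(d for z, d, y in rows if z == 0)
    return itt_y / itt_d


def naive_difference(rows):
    """E[Y|D=1] - E[Y|D=0], ignoring how people ended up treated."""
    return mean(y for z, d, y in rows if d == 1) - mean(y for z, d, y in rows if d == 0)


rows = simulate_trial()
late = wald_estimate(rows)
naive_diff = naive_difference(rows)
print(f"Wald (LATE) estimate: {late:.2f}  naive treated-vs-untreated: {naive_diff:.2f}")
```

      In this setup the Wald estimate should sit near the assumed complier effect of 1, while the naive treated-versus-untreated contrast mixes in baseline differences between compliance types and is much larger.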
      Only condition (i) can be verified empirically, while conditions (ii)-(iv) must be assumed [
      • Hernán M.A.
      • Robins J.M.
      Causal Inference: What If.
      ]. Applications of IV methods therefore strongly depend on building a case for the validity of the IV design based on substantive knowledge, logic, and empirical justifications [
      • Huntington-Klein N.
      The Effect: An Introduction to Research Design and Causality.
      ].
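      Because relevance is the one condition that can be checked against data, a first-stage regression of D on Z is a natural starting point. The sketch below is an illustration under stated assumptions, not the article's procedure: it computes the first-stage F statistic for a single instrument from the squared correlation between Z and D, and the simulated "strong" and "weak" instruments use made-up coefficients. A commonly cited rule of thumb, which does not come from this article, treats a first-stage F below about 10 as a warning sign of a weak instrument.

```python
import random


def first_stage_f(z, d):
    """F statistic for the slope in a regression of D on a single instrument Z.

    With one regressor, F equals r^2 * (n - 2) / (1 - r^2), where r is the
    sample correlation between Z and D.
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_d = sum(d) / n
    s_zz = sum((a - mean_z) ** 2 for a in z)
    s_dd = sum((b - mean_d) ** 2 for b in d)
    s_zd = sum((a - mean_z) * (b - mean_d) for a, b in zip(z, d))
    r_squared = s_zd ** 2 / (s_zz * s_dd)
    return r_squared * (n - 2) / (1 - r_squared)


def simulate_first_stage(strength, n=2000, seed=3):
    """Binary instrument; `strength` is an assumed first-stage coefficient."""
    rng = random.Random(seed)
    z = [rng.randint(0, 1) for _ in range(n)]
    d = [strength * zi + rng.gauss(0, 1) for zi in z]
    return z, d


f_strong = first_stage_f(*simulate_first_stage(strength=0.5))
f_weak = first_stage_f(*simulate_first_stage(strength=0.01))
print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

      A large F speaks only to relevance; it says nothing about exclusion, independence, or monotonicity, which still have to be argued on substantive grounds.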

      2. Proposed IVs

      An increasingly popular type of IV analysis is Mendelian randomization (MR), which uses genetic variation associated with a treatment of interest as an IV. MR exploits the randomization of the genome at conception [
      • Pingault J.B.
      • O’Reilly P.F.
      • Schoeler T.
      • Ploubidis G.B.
      • Rijsdijk F.
      • Dudbridge F.
      Using genetic data to strengthen causal inference in observational research.
      ]. For example, MR has been used to estimate causal effects of alcohol consumption on cardiovascular health. RCTs are infeasible and conventional observational studies are hampered, e.g., by reverse causation due to lower alcohol consumption in people with poor health and confounding bias induced by other health and social characteristics.
      Specifically, the rs671-A allele in the aldehyde dehydrogenase 2 (ALDH2) gene, which is involved in alcohol metabolism and prevalent in Asian populations, has been used as an IV [
      • Cho Y.
      • Shin S.
      • Won S.
      • Relton C.L.
      • Davey Smith G.
      • Shin M.
      Alcohol intake and cardiovascular risk factors: a Mendelian randomisation study.
      ]. Carriers consume less alcohol on average compared to noncarriers, likely due to adverse reactions (e.g., nausea) [
      • Cho Y.
      • Shin S.
      • Won S.
      • Relton C.L.
      • Davey Smith G.
      • Shin M.
      Alcohol intake and cardiovascular risk factors: a Mendelian randomisation study.
      ]. Cho et al. [
      • Cho Y.
      • Shin S.
      • Won S.
      • Relton C.L.
      • Davey Smith G.
      • Shin M.
      Alcohol intake and cardiovascular risk factors: a Mendelian randomisation study.
      ] use this IV in a Korean sample: the share of variation in alcohol use caused by the allele variant (i.e., to which individuals are genetically randomized) is used to estimate the effect of alcohol consumption on cardiovascular health. All other causes of alcohol use are by design excluded from the estimated effect. Results show that alcohol consumption increases blood pressure and the risk of hypertension. IV methods here contribute to a topic where the evidence is mixed and causal knowledge is crucial.
      Several nongenetic IVs have been proposed in health research. Provider preference IVs (PP IVs) are also increasingly used [
      • Widding-Havneraas T.
      • Chaulagain A.
      • Lyhmann I.
      • Zachrisson H.D.
      • Elwert F.
      • Markussen S.
      • et al.
      Preference-based instrumental variables in health research rely on important and underreported assumptions: a systematic review.
      ]. PP IV designs assume that variation in clinicians’ treatment preference for similar patients induces random variation in patients’ treatment status. PP IVs have been used for treatment effects in medical specialities such as cancer, cardiology, and psychiatry [
      • Widding-Havneraas T.
      • Chaulagain A.
      • Lyhmann I.
      • Zachrisson H.D.
      • Elwert F.
      • Markussen S.
      • et al.
      Preference-based instrumental variables in health research rely on important and underreported assumptions: a systematic review.
      ]. A considerable literature exists on methodological concerns in MR, PP IV, and other IVs in health [
      • Hernán M.A.
      • Robins J.M.
      Causal Inference: What If.
      ,
      • Pingault J.B.
      • O’Reilly P.F.
      • Schoeler T.
      • Ploubidis G.B.
      • Rijsdijk F.
      • Dudbridge F.
      Using genetic data to strengthen causal inference in observational research.
      ,
      • Glymour M.M.
      • Swanson S.A.
      Instrumental Variables and Quasi-Experimental Approaches.
      ]. For more proposed IVs, including distance to provider, day of hospital admission, and calendar time, see, e.g., Brookhart et al. [
      • Brookhart M.A.
      • Rassen J.A.
      • Schneeweiss S.
      Instrumental variable methods in comparative safety and effectiveness research.
      ] and Glymour and Swanson [
      • Glymour M.M.
      • Swanson S.A.
      Instrumental Variables and Quasi-Experimental Approaches.
      ].

      3. Practical steps

      Researchers should start by considering the estimand of interest and whether conditions (i)-(iv) are likely to hold [
      • Lundberg I.
      • Johnson R.
      • Stewart B.M.
      What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.
      ]. Second, data availability is key. IV analysis usually requires large datasets. Several databases include genetic data suitable for MR analyses (see, e.g., overview in Davies et al. [
      • Davies N.M.
      • Holmes M.V.
      • Davey Smith G.
      Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians.
      ]). For nongenetic IV designs, large health surveys, cohort studies, or administrative data are well-suited. Third, preregistration of statistical analyses on platforms such as Open Science Framework can improve overall transparency and credibility. Fourth, with data, relevance can be empirically verified while plausibility of exclusion, independence, and monotonicity can be assessed by combining substantive knowledge and falsification tests [
      • Labrecque J.
      • Swanson S.A.
      Understanding the Assumptions Underlying Instrumental Variable Analyses: a Brief Review of Falsification Strategies and Related Tools.
      ]. Finally, reporting guidelines make it easier for researchers, clinicians, and other readers to evaluate validity and interpret findings [
      • Skrivankova V.W.
      • Richmond R.C.
      • Woolf B.A.
      • Yarmolinsky J.
      • Davies N.M.
      • Swanson S.A.
      • et al.
      Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization: The STROBE-MR Statement.
      ]. Guidelines have been developed for MR analyses [
      • Skrivankova V.W.
      • Richmond R.C.
      • Woolf B.A.
      • Yarmolinsky J.
      • Davies N.M.
      • Swanson S.A.
      • et al.
      Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization: The STROBE-MR Statement.
      ], and Swanson & Hernán [
      • Swanson S.A.
      • Hernán M.A.
      Commentary: How to Report Instrumental Variable Analyses (Suggestions Welcome).
      ] and Brookhart et al. [
      • Brookhart M.A.
      • Rassen J.A.
      • Schneeweiss S.
      Instrumental variable methods in comparative safety and effectiveness research.
      ] are helpful for IV in general.

      4. Promise and perils

      Becoming familiar with IV methods forces any researcher to explicitly consider risks associated with confounding bias in any nonexperimental study. Because IVs rely on observational data, as opposed to experiments, they may also have stronger external validity. Combining IV methods with existing databases can give causal estimates for long-term outcomes, whereas RCTs require time to pass. Yet IV methods are not a panacea for causal inference with observational data. Credible IVs are rare and the methodological literature is vast. The main concern with IV designs is that the unverifiable assumption of no unmeasured confounding between D and Y is replaced with other unprovable assumptions (e.g., no unmeasured confounding between Z and Y) [
      • Hernán M.A.
      • Robins J.M.
      Causal Inference: What If.
      ,
      • Huntington-Klein N.
      The Effect: An Introduction to Research Design and Causality.
      ]. In sum, IV designs are an attractive solution to the key issue of unmeasured confounding that haunts causal inference from observational data. Applications of IV methods, nonetheless, must consider the relevance of estimates and address strong assumptions.

      Acknowledgments

      Tarjei Widding-Havneraas is supported by funding from the Western Norway Regional Health Authority (912197) and Research Council Norway (288585/IAR). Henrik Daae Zachrisson's contribution was supported by funding from the European Research Council Consolidator Grant ERC-CoG-2018 EQOP [grant number 818425].

      References

        • Westreich D.
        Epidemiology by Design: A Causal Approach to the Health Sciences.
        Oxford University Press, New York, NY, 2019
        • Hernán M.A.
        • Robins J.M.
        Causal Inference: What If.
        Chapman & Hall/CRC, Boca Raton, 2020
        • Elwert F.
        Graphical Causal Models.
        in: Morgan S.L. Handbook of Causal Analysis for Social Research. Springer, Dordrecht, 2013: 245-273
        • Huntington-Klein N.
        The Effect: An Introduction to Research Design and Causality.
        Chapman and Hall/CRC, 2022
        • Angrist J.D.
        Empirical Strategies in Economics: Illuminating the Path from Cause to Effect.
        National Bureau of Economic Research, Cambridge, 2022
        • Pingault J.B.
        • O’Reilly P.F.
        • Schoeler T.
        • Ploubidis G.B.
        • Rijsdijk F.
        • Dudbridge F.
        Using genetic data to strengthen causal inference in observational research.
        Nat Rev Genet. 2018; 19: 566-580
        • Cho Y.
        • Shin S.
        • Won S.
        • Relton C.L.
        • Davey Smith G.
        • Shin M.
        Alcohol intake and cardiovascular risk factors: a Mendelian randomisation study.
        Sci Rep. 2015; 5: 18422
        • Widding-Havneraas T.
        • Chaulagain A.
        • Lyhmann I.
        • Zachrisson H.D.
        • Elwert F.
        • Markussen S.
        • et al.
        Preference-based instrumental variables in health research rely on important and underreported assumptions: a systematic review.
        J Clin Epidemiol. 2021; 139: 269-278
        • Glymour M.M.
        • Swanson S.A.
        Instrumental Variables and Quasi-Experimental Approaches.
        in: Lash T.L. VanderWeele T.J. Haneuse S. Rothman K.J. Modern Epidemiology. Wolters Kluwer, New York, NY, 2021: 677-709
        • Brookhart M.A.
        • Rassen J.A.
        • Schneeweiss S.
        Instrumental variable methods in comparative safety and effectiveness research.
        Pharmacoepidemiol Drug Saf. 2010; 19: 537-554
        • Lundberg I.
        • Johnson R.
        • Stewart B.M.
        What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.
        Am Soc Rev. 2021; 86: 532-565
        • Davies N.M.
        • Holmes M.V.
        • Davey Smith G.
        Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians.
        BMJ. 2018; 362: k601
        • Labrecque J.
        • Swanson S.A.
        Understanding the Assumptions Underlying Instrumental Variable Analyses: a Brief Review of Falsification Strategies and Related Tools.
        Curr Epidemiol Rep. 2018; 5: 214-220
        • Skrivankova V.W.
        • Richmond R.C.
        • Woolf B.A.
        • Yarmolinsky J.
        • Davies N.M.
        • Swanson S.A.
        • et al.
        Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization: The STROBE-MR Statement.
        JAMA. 2021; 326: 1614-1621
        • Swanson S.A.
        • Hernán M.A.
        Commentary: How to Report Instrumental Variable Analyses (Suggestions Welcome).
        Epidemiology. 2013; 24: 370-374