
Effect Modifiers and Statistical Tests for Interaction in Randomized Trials

  • Robin Christensen
    Affiliations
    Section for Biostatistics and Evidence-Based Research, the Parker Institute, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark

    Research Unit of Rheumatology, Department of Clinical Research, University of Southern Denmark, Odense University Hospital, Denmark
  • Martijn J.L. Bours
    Affiliations
    Department of Epidemiology, GROW – School for Oncology and Developmental Biology, Maastricht University, Maastricht, The Netherlands
  • Sabrina M. Nielsen
    Corresponding author.
    Affiliations
    Section for Biostatistics and Evidence-Based Research, the Parker Institute, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark

    Research Unit of Rheumatology, Department of Clinical Research, University of Southern Denmark, Odense University Hospital, Denmark

      Abstract

      Statistical analyses of randomized controlled trials (RCTs) yield a causally valid estimate of the overall treatment effect, that is, the contrast between the outcomes in two randomized treatment groups, commonly accompanied by a confidence interval. In addition, trial investigators may want to examine whether the observed treatment effect varies across patient subgroups (also called ‘heterogeneity of treatment effects’), i.e. whether the treatment effect is modified by the value of a variable assessed at baseline. The statistical approach for evaluating such potential effect modifiers is a test for statistical interaction, which assesses whether the treatment effect varies across levels of the effect modifier. In this article, we provide a concise and nontechnical explanation of the use of simple statistical tests for interaction to identify effect modifiers in RCTs. We explain how to calculate the test for interaction by hand, illustrated with a simulated dataset of 1,000 imaginary participants.

      1. Background

      Randomized controlled trials (RCTs) are considered the gold standard for evaluating a treatment's effectiveness because of their high internal validity when appropriately conducted. The goal of randomization is to balance both observed and unobserved participant characteristics between two (or more) randomly allocated treatment groups. Thus, the RCT design allows causal effects of treatments to be estimated because confounding will generally not be an issue [Little and Rubin, 2000]. Usually, statistical analyses of RCTs yield an estimate of the overall treatment effect (say, $E_{\text{overall}}$), which is the contrast between the outcomes in two treatment groups, commonly accompanied by a confidence interval.
      RCTs can also have good external validity if they are based on real-life populations that are relevant for the intervention, treat the control group with an acceptable standard of care, and report outcomes that are meaningful. An ideal trial in this regard enrolls patients with a broad range of background characteristics, for example, disease severity, age, sex, race, and prior therapies. Following the primary analyses estimating the overall treatment effect, $E_{\text{overall}}$, the trial investigators may want to examine whether the observed treatment effect varies across patient subgroups (also called ‘heterogeneity of treatment effects’). In such cases we are interested in examining whether the treatment effect is modified by the value of another variable (i.e. the effect modifier) [Kent et al., 2020]. The statistical approach for evaluating potential effect modifiers is a test for statistical interaction [Altman and Bland, 2003].
      Investigating heterogeneity of treatment effects in an RCT is important for understanding, interpreting, and translating the trial's findings, and consequently for determining whether there is an appropriate patient subpopulation for treatment use. Evidence of effect modification therefore helps to delineate the applicability of an intervention, showing in whom the treatment is most likely to work, and thus informs the RCT's external validity. In this article, we provide a concise and nontechnical explanation of simple statistical tests for interaction to identify effect modifiers in RCTs.

      2. Definition

      Statistical tests for interaction are often referred to as subgroup analyses, a term that covers any comparison of the effect between treatment groups (net benefit) across subsets (i.e. subgroups) of patients with specific characteristics that could be relevant effect modifiers. Usually, subgroup analyses investigate subgroups defined by a factor measured before or at baseline, such as sex (males vs. females). Subgroup analyses can be misleading if they are based on data-driven hypotheses, employ inappropriate statistical methods, or fail to account for multiple testing [Schandelmaier et al., 2020]. As outlined in a tutorial on subgroup analysis in confirmatory trials [Alosh et al., 2017], one should distinguish among three categories of subgroup analysis: (i) exploratory analyses, which search for differential responses in early clinical trial data or in clinical trials that failed to establish treatment efficacy in the intended population; (ii) supportive analyses, which investigate the consistency of the treatment effect across subgroups in a clinical trial that has established treatment efficacy in its intended overall population; and (iii) inferential analyses, which aim to establish treatment efficacy in a pre-defined targeted subgroup and/or in the overall population.
      The subgroups of interest should be defined, preferably a priori, and the baseline variable under consideration needs to precede treatment in time. In the simplest case, the baseline factor is a covariate with only two levels (e.g. male vs. female participants), leading to two subgroups (e.g. subgroup 1: males, subgroup 2: females). To compare the treatment effects observed in the two subgroups, the first step is to estimate the treatment effect (i.e. net benefit) within each subgroup in separate analyses ($E_1$ and $E_2$, respectively). Next, a test for statistical interaction comparing the two subgroups can be calculated by hand from the subgroup treatment effects ($E_1$ and $E_2$) and their corresponding standard errors ($SE_{E_1}$ and $SE_{E_2}$) [Altman and Bland, 2003]:
      • Difference between subgroup effects: $d = E_1 - E_2$
      • Standard error of $d$: $SE_d = \sqrt{SE_{E_1}^2 + SE_{E_2}^2}$
      • Test statistic for the z-test: $z = d / SE_d$
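The three formulas above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the helper names `mean_difference` and `interaction_test` are hypothetical, the subgroup effect is assumed to be a difference in means with an unpooled standard error, and the outcome values in the usage lines are made up purely for illustration. The two-sided p-value uses the identity P(|Z| >= |z|) = erfc(|z|/sqrt(2)) for a standard normal Z.

```python
import math
import statistics


def mean_difference(treated, control):
    """Treatment effect within one subgroup: mean(treated) - mean(control).

    Assumption for this sketch: the standard error is the unpooled
    sqrt(s_t^2 / n_t + s_c^2 / n_c).
    """
    effect = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return effect, se


def interaction_test(e1, se1, e2, se2):
    """Compare two independent subgroup effects (Altman-Bland approach).

    Returns d = E1 - E2, SE_d = sqrt(SE1^2 + SE2^2), z = d / SE_d and the
    two-sided p-value from the standard normal distribution.
    """
    d = e1 - e2
    se_d = math.sqrt(se1 ** 2 + se2 ** 2)
    z = d / se_d
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return d, se_d, z, p


# Hypothetical outcomes (illustration only, not trial data).
e1, se1 = mean_difference([12, 15, 11, 14], [6, 8, 5, 7])  # subgroup 1
e2, se2 = mean_difference([9, 10, 8, 11], [7, 9, 6, 8])    # subgroup 2
d, se_d, z, p = interaction_test(e1, se1, e2, se2)
print(f"d = {d:.2f}, SE_d = {se_d:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The same `interaction_test` call works for any pair of independent subgroup estimates with their standard errors, including values read off a forest plot.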
      The p-value is obtained by comparing the absolute (non-negative) value of z to the standard normal distribution; this tests the null hypothesis that, in the population, the difference between subgroups ($d$) is zero. For effect measures on a multiplicative scale (such as the risk ratio, hazard ratio, or odds ratio), as opposed to the additive scale (such as the risk difference), the analyses should be performed on log-transformed estimates with the corresponding standard errors; exponentiating the resulting difference gives the ratio of the two subgroup ratios [Altman and Bland, 2003]. Importantly, effect modification may be present on one scale but not on another, and opinions conflict on which scale to use [Doi et al., 2020]. The European Medicines Agency (EMA) recommends using the scale on which the endpoint is commonly analyzed, and presenting supplementary analyses on the complementary scale where inconsistency is observed [European Medicines Agency, 2019].
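For ratio measures, the same calculation can be carried out on the log scale. The sketch below assumes that each subgroup's 95% confidence interval was computed symmetrically on the log scale, so that SE(log ratio) = (log upper - log lower)/(2 × 1.96); the helper names `log_estimate_from_ci` and `ratio_interaction` and the risk ratios in the usage line are hypothetical.

```python
import math

Z_CRIT = 1.96  # two-sided 95% critical value, as used in the text


def log_estimate_from_ci(ratio, lower, upper):
    """Log ratio and its SE, recovered from a 95% CI symmetric on the log scale."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2 * Z_CRIT)


def ratio_interaction(r1, ci1, r2, ci2):
    """Compare two subgroup ratios (e.g. risk ratios) on the log scale.

    Returns the ratio of ratios r1 / r2 with its 95% CI, the z statistic
    and the two-sided p-value.
    """
    log1, se1 = log_estimate_from_ci(r1, *ci1)
    log2, se2 = log_estimate_from_ci(r2, *ci2)
    d = log1 - log2  # difference of logs = log(r1 / r2)
    se_d = math.sqrt(se1 ** 2 + se2 ** 2)
    z = d / se_d
    p = math.erfc(abs(z) / math.sqrt(2))
    # Back-transform the difference and its confidence limits by exponentiation.
    ci = (math.exp(d - Z_CRIT * se_d), math.exp(d + Z_CRIT * se_d))
    return math.exp(d), ci, z, p


# Hypothetical subgroup risk ratios with 95% CIs (illustration only).
ror, ci, z, p = ratio_interaction(0.60, (0.45, 0.80), 0.90, (0.70, 1.16))
print(f"ratio of RRs = {ror:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}), "
      f"z = {z:.2f}, p = {p:.3f}")
```

Working on the log scale turns the ratio of two ratios into a simple difference, so the additive formulas can be reused unchanged before back-transforming.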

      3. Application

      Forest plots are useful for presenting the results of subgroup analyses graphically. Preferably, the bold vertical reference line should be drawn at the overall treatment effect (i.e. $E_{\text{overall}}$) rather than at the null (i.e. ‘no effect’), to guide correct interpretation of heterogeneity of treatment effects across subgroups. Fig. 1 illustrates an example based on a simulated dataset of 1,000 imaginary participants (randomized 1:1); the data were generated to yield a standardized mean difference corresponding to a statistically significant, moderate overall treatment effect of $E_{\text{overall}}$ = 5.00 (95% CI: 3.73 to 6.27) units. To this dataset we deliberately added a contextual factor (CF 1) that creates two subgroups with different magnitudes of treatment effect ($E_1$ = 8.00 and $E_2$ = 2.00 units, respectively). The standard errors can be calculated from the upper confidence limits shown in the figure: $SE_{E_1} = (9.75 - 8.00)/1.96 = 0.89$ and $SE_{E_2} = (3.78 - 2.00)/1.96 = 0.91$. From these values we can test the interaction and estimate the difference between the subgroups (with confidence interval). The test for interaction:
      $$d = 8.00 - 2.00 = 6.00$$

      $$SE_d = \sqrt{0.89^2 + 0.91^2} = 1.273$$

      $$z = \frac{6.00}{1.273} = 4.71$$


      Fig. 1. Forest plot showing the results of subgroup analyses based on a simulated dataset on 1,000 imaginary participants. The outcome is based on continuous data. The bold vertical line indicates the overall treatment effect, and the dashed line indicates no effect.
      Referred to a table of the standard normal distribution, a z-value of 4.71 gives p < 0.001. The estimated interaction effect is d = 6.00 units, with 95% confidence interval 6.00 ± 1.96 × 1.273 (i.e. 95% CI 3.50 to 8.50). The data thus provide evidence of effect modification, indicating that the treatment effect is significantly stronger in CF 1-positive than in CF 1-negative trial participants. The other contextual factors shown in Fig. 1 were computer-generated completely at random, so any apparent effect modification across CF 2, CF 3, …, CF 7 reflects chance alone (a well-known caveat of multiple testing without an a priori hypothesis).
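The hand calculation for CF 1 can be reproduced in a few lines. This sketch takes the estimates and upper 95% limits quoted in the text for Fig. 1 and assumes symmetric confidence intervals; `se_from_ci` is a hypothetical helper. Because it does not round the standard errors to 0.89 and 0.91 before combining them, its intermediate values may differ from the hand calculation in the third decimal.

```python
import math

Z_CRIT = 1.96  # two-sided 95% critical value


def se_from_ci(estimate, upper):
    """SE from a symmetric 95% CI: (upper limit - estimate) / 1.96."""
    return (upper - estimate) / Z_CRIT


# CF 1 subgroup effects and upper 95% limits, as quoted in the text for Fig. 1.
e1, upper1 = 8.00, 9.75
e2, upper2 = 2.00, 3.78

se1, se2 = se_from_ci(e1, upper1), se_from_ci(e2, upper2)
d = e1 - e2
se_d = math.sqrt(se1 ** 2 + se2 ** 2)
z = d / se_d
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
lower_ci, upper_ci = d - Z_CRIT * se_d, d + Z_CRIT * se_d

print(f"SE1 = {se1:.2f}, SE2 = {se2:.2f}, d = {d:.2f}, SE_d = {se_d:.3f}")
print(f"z = {z:.2f}, p = {p:.1e}, 95% CI {lower_ci:.2f} to {upper_ci:.2f}")
```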

      4. Pointers

      Altman and Bland presented this simple approach in a BMJ Statistics Note, “Interaction revisited: the difference between two estimates”, in 2003 [Altman and Bland, 2003]. The approach is transparent and feasible whenever we want to compare two estimated quantities, such as means (Fig. 1) or proportions (Fig. 2), each with its standard error.
      Fig. 2. Forest plot showing the results of subgroup analyses based on a simulated dataset on 1,000 imaginary participants. The outcome is based on dichotomous data. The bold vertical line indicates the overall treatment effect, and the dashed line indicates no effect.
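For proportions, each subgroup effect can be taken as a risk difference with its usual large-sample (Wald) standard error, after which the interaction test is unchanged. The event counts below are hypothetical and are not the data behind Fig. 2; `risk_difference` is a hypothetical helper.

```python
import math


def risk_difference(events_t, n_t, events_c, n_c):
    """Risk difference (treated minus control) with its Wald standard error."""
    p_t, p_c = events_t / n_t, events_c / n_c
    rd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return rd, se


# Hypothetical event counts per arm (illustration only).
rd1, se1 = risk_difference(90, 250, 50, 250)  # subgroup 1
rd2, se2 = risk_difference(70, 250, 60, 250)  # subgroup 2

d = rd1 - rd2                          # difference in risk differences
se_d = math.sqrt(se1 ** 2 + se2 ** 2)
z = d / se_d
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
print(f"RD1 = {rd1:.3f}, RD2 = {rd2:.3f}, d = {d:.3f}, z = {z:.2f}, p = {p:.3f}")
```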
      Although the approach is simple to apply, subgroup effects should be investigated with great care and interpreted cautiously. Most trials are not powered to detect subgroup differences, but reporting the results anyway allows future meta-analyses to examine them across several trials and thereby achieve sufficient power. Currently, there is no explicit, standard list of factors to be investigated for effect modification in trials. However, investigators may initially take inspiration from the U.S. Food and Drug Administration (FDA), which requires effectiveness data to be analyzed by sex, age, and racial subgroups.
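The power argument can be made concrete with two standard normal-approximation formulas: the approximate power of the two-sided interaction test for a true difference d with standard error SE_d, and fixed-effect (inverse-variance) pooling of interaction estimates from several trials. This is a sketch under those assumptions; the function names and all inputs are hypothetical, and a real meta-analysis would also need to consider between-trial heterogeneity.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def approx_power(d, se_d, z_crit=1.96):
    """Approximate power of the two-sided z-test for interaction (alpha = 0.05).

    Ignores the negligible chance of rejecting in the opposite direction.
    """
    return normal_cdf(abs(d) / se_d - z_crit)


def pool_fixed_effect(estimates, ses):
    """Inverse-variance (fixed-effect) pooled interaction estimate and its SE."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical: four trials, each estimating an interaction with SE = 1.3 units,
# and an assumed true interaction of 2.0 units.
ses = [1.3, 1.3, 1.3, 1.3]
single_power = approx_power(2.0, ses[0])
_, pooled_se = pool_fixed_effect([2.0, 1.5, 2.5, 2.0], ses)
pooled_power = approx_power(2.0, pooled_se)
print(f"power in one trial:  {single_power:.2f}")
print(f"power after pooling: {pooled_power:.2f}")
```

Pooling k trials with equal standard errors shrinks the standard error by a factor of sqrt(k), which is why interaction estimates that are individually underpowered can still be informative in a meta-analysis.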

      Appendix. Supplementary materials

      References

        • Little R.J., Rubin D.B. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annu Rev Public Health. 2000; 21: 121-145
        • Kent D.M., et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement. Ann Intern Med. 2020; 172: 35-45
        • Altman D.G., Bland J.M. Interaction revisited: the difference between two estimates. BMJ. 2003; 326: 219
        • Schandelmaier S., Briel M., Varadhan R., Schmid C.H., Devasenapathy N., Hayward R.A., et al. Development of the Instrument to assess the Credibility of Effect Modification Analyses (ICEMAN) in randomized controlled trials and meta-analyses. CMAJ. 2020; 192: E901-E906
        • Alosh M., Huque M.F., Bretz F., D'Agostino R.B. Sr. Tutorial on statistical considerations on subgroup analysis in confirmatory clinical trials. Stat Med. 2017; 36: 1334-1360
        • Doi S.A., Furuya-Kanamori L., Xu C., Lin L., Chivese T., Thalib L. Questionable utility of the relative risk in clinical research: a call for change to practice. J Clin Epidemiol. 2020
        • European Medicines Agency (EMA). Guideline on the investigation of subgroups in confirmatory clinical trials. London, United Kingdom: European Medicines Agency (EMA); 2019