Effective dose 50 method as the minimal clinically important difference: Evidence from depression trials

Objective Previous research on the minimal clinically important difference (MCID) for depression and anxiety is based on population averages. The present study aimed to identify the MCID across the spectrum of baseline severity. Study Design and Settings The present analysis used secondary data from 2 randomized controlled trials for depression (n = 1,122) to calibrate the Global Rating of Change (GRC) with the PHQ–9 and GAD–7. The MCID was defined as the change in scores corresponding to a 50% probability of patients "feeling better", given their baseline severity, referred to as Effective Dose 50 (ED50). Results MCID estimates depended on baseline severity and ranged from no change for very mild severity up to 14 points (52%) on the PHQ–9 and up to 10 points (48%) on the GAD–7 for very high severity. The average MCID estimates were 3.7 points (23%) and 3.3 points (28%) for the PHQ–9 and GAD–7, respectively. Conclusion The ED50 method generates MCID estimates across the spectrum of baseline severity, offering greater precision at the cost of greater complexity relative to population average estimates. This has important implications for treatment evaluation and clinical practice, where users can tailor the MCID to specific populations according to their baseline severity.

Baseline dependency is a well-established problem: the change needed to feel better varies according to baseline severity, with patients who have more symptoms commonly requiring greater changes to experience a subjective improvement. [1][2][3] Various methods have been proposed to address this problem, the most prominent being effect sizes, statistical control, proportionate change, and MCID categories. 1 Mean change methods, broadly speaking, encompass approaches that examine the mean change at different levels of the GRC. Two of the originally developed methods examine the mean change amongst those who feel slightly better, or calculate the difference in mean change between those who feel slightly better and those who feel about the same. 4,5 These methods do not take baseline dependency into account. To account for baseline dependency, effect sizes can be estimated. 6,7 While effect sizes are useful for comparative purposes, they have been criticised for providing little clinical information and being difficult to interpret. 8 Statistical control can be implemented in models to reduce the effect of baseline severity. 1 However, Copay et al. report that, because extreme scores are assumed to result from error or chance, the true variation is masked, even though medical patients might be expected to have higher symptom scores. 1 Proportionate change expresses the change in a patient's symptoms as a percentage of their baseline score. While this approach is beneficial because it allows comparisons between measures, it may increase the association with baseline scores when patients with high symptom scores show small changes. 1 A further approach is to provide multiple MCID estimates for categories of patients, grouped by levels of the measure. 1 However, these categories can be somewhat arbitrary, and the approach forgoes the simplicity of a single MCID figure. 1
Previous analyses have shown that proportionate scores in depression and anxiety provide a good rule of thumb for those with moderately severe symptoms, but do not fully account for baseline dependency at all ranges and may additionally require categorization of the MCID by baseline severity category (mild, moderate, severe). 2,3 Our previous and present analyses additionally show that adjustment for baseline severity, either as an interaction term in linear models or by using percent change, is insufficient to fully account for baseline dependency across the scale. 2 Our argument is that, if it is not possible to fully capture baseline dependency and produce one universal MCID with the methods above, it may be worthwhile to have a more detailed and precise approach that, by definition, fully captures baseline severity, to be used alongside the existing rule of thumb.
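To make the baseline-dependency argument concrete, the arithmetic of proportionate change can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the 50% threshold and the three baseline scores are hypothetical values chosen only to show how one fixed percentage translates into very different absolute changes on the PHQ-9 (scored 0 to 27).

```python
def proportionate_change(baseline: float, follow_up: float) -> float:
    """Percentage reduction in symptoms relative to the baseline score.

    Positive values indicate improvement (the score fell);
    negative values indicate deterioration.
    """
    if baseline <= 0:
        raise ValueError("proportionate change is undefined for a baseline of 0")
    return 100.0 * (baseline - follow_up) / baseline


def absolute_change_for_percent(baseline: float, percent: float) -> float:
    """Absolute score reduction implied by a percentage threshold at this baseline."""
    return baseline * percent / 100.0


# HYPOTHETICAL 50% threshold applied at three illustrative PHQ-9 baselines.
HYPOTHETICAL_THRESHOLD = 50.0
for baseline in (6, 14, 24):
    points = absolute_change_for_percent(baseline, HYPOTHETICAL_THRESHOLD)
    print(f"baseline {baseline:>2}: a {HYPOTHETICAL_THRESHOLD:.0f}% reduction is {points:.1f} points")
```

A single percentage therefore demands 3 points from a baseline of 6 but 12 points from a baseline of 24. Whether that proportional scaling matches what patients actually notice at each level is an assumption of the proportionate approach, rather than something it estimates.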

Modelling Approach
The MCID is a concept rather than a mathematically defined quantity. As such, multiple methods have been proposed to estimate it. Some of the most common include mean change, linear regression, and Receiver Operating Characteristic (ROC) curves. 1,7,9 A comprehensive review of MCID approaches can be found elsewhere. 1,7,9 As noted above, mean change methods do not account for baseline dependency, and standardised differences (effect sizes) have been criticised for providing little clinical information and being difficult to interpret. [4][5][6][7][8][9] Linear regression allows the mean changes to be adjusted for baseline severity to account for baseline dependency. 1,7,9,10 However, previous research and the current analyses show that baseline adjustment is insufficient to fully account for baseline dependency in depression and anxiety across the spectrum. 2,3 ROC curves identify the point of optimal sensitivity and specificity between two groups of GRC responses (i.e., those who feel better vs. those who do not feel better) to denote the MCID. 1,2,9,10 However, this estimate can be unstable: adding or deleting a few data points can change the ROC curve, and thus the relative balance of sensitivity and specificity.
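For comparison, the ROC approach can be sketched as choosing the change score that maximises Youden's J (sensitivity + specificity - 1) for separating patients who feel better from those who do not. This is a minimal sketch under stated assumptions: Youden's J is one common optimality criterion, not necessarily the one used in the cited studies, and the toy data are invented. It also illustrates the instability noted above, since deleting a single observation moves the cutoff.

```python
from typing import Optional, Sequence, Tuple


def youden_cutoff(change: Sequence[float], better: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    Patients are classified as 'better' when change >= cutoff; every observed
    change value is tried as a candidate cutoff, and ties keep the lowest one.
    """
    n_pos = sum(better)
    n_neg = len(better) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one patient in each GRC group")
    best_cut: Optional[float] = None
    best_j = float("-inf")
    for cut in sorted(set(change)):
        tp = sum(1 for c, b in zip(change, better) if b and c >= cut)
        tn = sum(1 for c, b in zip(change, better) if not b and c < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    assert best_cut is not None
    return best_cut, best_j


# Invented toy data: change in scores and whether the patient reported feeling better.
change = [0, 1, 2, 3, 4, 5, 6, 7]
better = [False, False, False, True, False, True, True, True]
print(youden_cutoff(change, better))  # (3, 0.75)

# Deleting one 'better' patient (change = 3) moves the cutoff from 3 to 5.
print(youden_cutoff(change[:3] + change[4:], better[:3] + better[4:]))  # (5, 1.0)
```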
We propose adopting the ED50 as a new means of estimating the MCID. The ED50 is frequently used in drug safety and pharmacotherapy research to identify the minimum effective therapeutic dose. 11 In drug prescribing, for example, the ED50 serves as a guideline for clinicians to identify the smallest effective dose of medication. 11 For the purposes of this analysis, the MCID is defined as the change in scores associated with a 50% probability of feeling better. The ED50 has clear face validity as an MCID metric, as it marks the threshold beyond which patients are more likely to feel better than not. Because the ED50 is based on a model derived from all the data, it is less susceptible to the limitations of the ROC analysis and mean change methods. Using this approach, we further address the problem of baseline dependency above and beyond covariate adjustment in generalised linear models (GLMs). By incorporating baseline severity additively in a generalised additive mixed model (GAMM), we provide an MCID for each level of baseline severity. This addresses the limitation that covariate adjustment alone does not appear to fully account for baseline dependency in depression and anxiety. The method also allows other thresholds to be identified (e.g., ED25 and ED75) to examine the probabilities associated with different changes, a granularity that other approaches do not provide. This gives end users the flexibility to select the level of certainty in treatment response that they require.
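Formally, if a fitted model gives the probability of feeling better as p(Δ, b) for a change in scores Δ at baseline severity b, the ED50 at baseline b is the Δ solving p(Δ, b) = 0.5 (equivalently, where the fitted logit equals 0); ED25 and ED75 solve p = 0.25 and p = 0.75. The sketch below finds these thresholds by bisection for any predictor that increases with Δ. It is a hedged illustration, not the authors' code: the GAMM is replaced by a hypothetical logistic predictor with invented coefficients, so the printed numbers are not MCID estimates.

```python
import math
from typing import Callable

# (change in scores, baseline score) -> predicted probability of feeling better
Predictor = Callable[[float, float], float]


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def effective_dose(predict: Predictor, baseline: float, p: float = 0.5,
                   lo: float = -27.0, hi: float = 27.0, tol: float = 1e-6) -> float:
    """Change in scores at which P(feel better | change, baseline) equals p.

    Bisection; assumes the predicted probability increases with change and
    that the target probability is reached within [lo, hi] (default: the
    range of possible PHQ-9 changes).
    """
    if predict(lo, baseline) > p or predict(hi, baseline) < p:
        raise ValueError("target probability not reached within [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if predict(mid, baseline) < p:
            lo = mid  # probability still below target: threshold lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0


def hypothetical_predictor(change: float, baseline: float) -> float:
    # HYPOTHETICAL logistic surface standing in for a fitted GAMM;
    # the coefficients 0.4 and 0.3 are invented for illustration.
    return inv_logit(0.4 * (change - 0.3 * baseline))


for b in (5, 15, 25):
    ed25, ed50, ed75 = (effective_dose(hypothetical_predictor, b, p) for p in (0.25, 0.5, 0.75))
    print(f"baseline {b}: ED25={ed25:.2f}  ED50={ed50:.2f}  ED75={ed75:.2f}")
```

For this linear-in-change predictor the ED50 has the closed form 0.3 × baseline, which is a convenient check on the bisection. A smooth GAMM surface generally has no closed form, which is why a numerical root-finder is used here.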

1.3a Inclusion of all GRC responses
A common approach to estimating the MCID is to examine the mean change amongst those who feel slightly better, or the difference in mean change between those who feel slightly better and those who feel about the same. 4,5 In CoBalT, patients were asked how they felt compared with the last assessment and could respond "I feel better", "I feel about the same", or "I feel worse". 12 In PANDA, patients were asked at all time points how they felt compared with when they were last seen, with fixed response options: "I feel a lot better", "I feel slightly better", "I feel about the same", "I feel slightly worse", and "I feel a lot worse". 13 Only one of the RCTs therefore provided a fine-grained breakdown, and we were limited by the data available. This approach to estimating mean change is useful, but it has two limitations. Where only one or two of several fine-grained GRC categories are of interest, it discards much of the data; where the categories are coarse (as in CoBalT), the estimates can be inflated by the inclusion of people who felt very much better.
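The two classic anchor-based estimators described above can be sketched as follows, using the PANDA-style five-level GRC labels. This is an illustrative sketch only: the function names and toy data are hypothetical and are not drawn from either trial. It also makes visible how much data each estimator discards: on the toy data, the first uses 3 of 10 observations and the second 6 of 10.

```python
from statistics import mean
from typing import Dict, List, Sequence

SLIGHTLY_BETTER = "I feel slightly better"
ABOUT_THE_SAME = "I feel about the same"


def group_changes(changes: Sequence[float], grc: Sequence[str]) -> Dict[str, List[float]]:
    """Collect change scores by GRC response."""
    groups: Dict[str, List[float]] = {}
    for change, response in zip(changes, grc):
        groups.setdefault(response, []).append(change)
    return groups


def mcid_mean_change(changes: Sequence[float], grc: Sequence[str]) -> float:
    """MCID as the mean change among those who feel slightly better."""
    return float(mean(group_changes(changes, grc)[SLIGHTLY_BETTER]))


def mcid_difference(changes: Sequence[float], grc: Sequence[str]) -> float:
    """MCID as mean change (slightly better) minus mean change (about the same)."""
    groups = group_changes(changes, grc)
    return float(mean(groups[SLIGHTLY_BETTER])) - float(mean(groups[ABOUT_THE_SAME]))


# Invented toy data (change = baseline - follow-up; positive = improvement).
changes = [8, 6, 4, 3, 5, 1, 0, 2, -2, -5]
grc = (["I feel a lot better"] * 2 + [SLIGHTLY_BETTER] * 3 + [ABOUT_THE_SAME] * 3
       + ["I feel slightly worse", "I feel a lot worse"])
print(mcid_mean_change(changes, grc))  # 4.0
print(mcid_difference(changes, grc))   # 3.0
```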
When estimating from statistical models (GLM or GAMM), it is preferable to include all observations to reduce bias and increase the precision of the model. Examining only subgroups of patients can lead to erroneous results. 14 We therefore included all GRC responses in the present analysis. The analysis nonetheless targets the minimal point, because it models the probability of feeling better as a function of baseline severity and change in symptoms, rather than comparing mean changes stratified by the GRC. From this model, we estimate the MCID as a threshold (a lower bound, if you will) at which the chance of feeling better reaches 50% or greater. Unlike the categorical mean change approach, this threshold is relatively robust to the inclusion of those with a wide range of GRC ratings.

1.3b Exclusion of time as a model effect
We found a statistically significant effect of time on the proportion of people feeling better at follow-up 1. However, our interest is in the MCID estimates, and when we examined the effect of time on the ED50 we found only marginal differences, which were of little practical importance. We therefore excluded time from the final model for pragmatic reasons: future users will want to select an MCID without having to decide which follow-up period is closest to theirs. Unless there is strong evidence that time makes a practically significant difference to the MCID, there is little benefit to the end user in adding time.
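Why a statistically significant time effect can leave the ED50 nearly unchanged is easiest to see in a simplified model. Assume, purely for illustration, a logistic model that is linear in the change score Δ, with baseline severity b and a follow-up indicator t; the coefficients β₀ to β₃ are hypothetical, and the actual analysis uses a smooth GAMM surface rather than this linear form:

$$
\operatorname{logit} \Pr(\text{better} \mid \Delta, b, t) = \beta_0 + \beta_1 \Delta + \beta_2 b + \beta_3 t
\;\Longrightarrow\;
\mathrm{ED50}(b, t) = -\frac{\beta_0 + \beta_2 b + \beta_3 t}{\beta_1},
\qquad
\mathrm{ED50}(b, 1) - \mathrm{ED50}(b, 0) = -\frac{\beta_3}{\beta_1}.
$$

Under this assumption, time moves the ED50 by |β₃/β₁| points. β₃ can differ reliably from zero while this ratio stays small, for example when the probability of feeling better rises steeply with each point of change (large β₁). This is one way a significant effect of time on the probability of feeling better can coexist with practically negligible differences in the ED50.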

1.3c Pooling of studies and exclusion of study as a model term
We pooled data from two RCTs. This has the benefit of higher precision, as there are more observations per level of baseline severity. At follow-up 1, we found a statistically significant effect of study on the probability of feeling better, driven by differing baseline severities: PANDA had fewer observations at the very high end of scores and CoBalT fewer at the low end. 10,11 As such, MCID estimates for the low end of scores from CoBalT at follow-up 1 alone would be unreliable, as would estimates for the high end from PANDA alone. The effect of study disappeared after follow-up 1, further suggesting that the initial difference reflects the different baseline characteristics of the RCTs and will therefore have little practical importance for the MCID estimates. Pooled data are preferable because they provide greater coverage of baseline scores at follow-up 1, and the model produces a weighted average in which, at a given baseline severity, most of the weight comes from one study (the one with more observations at that level). As with time, the effects of study on the MCID estimates were of little practical importance. We therefore excluded study from the final model. This is advantageous because future users will want to select an MCID for their study without having to decide whether it more closely resembles one of the two trials; the result is more generalisable in this way.
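The weighting argument can be illustrated with the simplest possible pooled estimate: the proportion feeling better at a single baseline score, pooled across two studies. This is a deliberately simplified sketch. The counts are hypothetical, and a GAMM borrows strength across neighbouring scores rather than pooling raw proportions, but the principle that the study with more observations at a given baseline dominates the estimate carries over.

```python
from typing import List, Sequence, Tuple

Counts = Tuple[int, int]  # (number feeling better, number observed)


def pooled_proportion(counts: Sequence[Counts]) -> float:
    """Proportion feeling better, pooled across studies.

    Equal to the average of the per-study proportions weighted by sample size.
    """
    total = sum(n for _, n in counts)
    if total == 0:
        raise ValueError("no observations at this baseline score")
    return sum(b for b, _ in counts) / total


def study_weights(counts: Sequence[Counts]) -> List[float]:
    """Share of the pooled estimate contributed by each study."""
    total = sum(n for _, n in counts)
    return [n / total for _, n in counts]


# HYPOTHETICAL counts at one low baseline score; study A is sparse here.
study_a: Counts = (1, 4)
study_b: Counts = (30, 60)
print(pooled_proportion([study_a, study_b]))  # 0.484375
print(study_weights([study_a, study_b]))      # [0.0625, 0.9375]
```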

1.3d Inclusion of treatment and control groups
We include both treatment and control groups in the present analyses for the purposes of generalisability. Our primary aim is to identify a change in scores that is noticeable to patients, but we are agnostic to how this change is produced. We assume that the relationship between changes in symptoms and the GRC is stable and does not vary by treatment. We have no reason to believe that different treatments require a different MCID: if a patient's PHQ-9 score changes by a given amount, they should be as likely to notice this difference regardless of whether it was brought about by SSRIs, CBT or placebo/natural recovery. From our perspective (for the purposes of this analysis), the treatment is simply a means of inducing change.

1.3e No further covariate adjustment
While adjusting for other covariates is technically possible, we were unable to implement this in the current analyses. First, there is a sample size consideration, as we are stratifying by each level of baseline severity. Adjusting for additional covariates would require much larger samples, which would only be feasible through the analysis of electronic healthcare records or the pooling of a very large number of clinical trials. Unfortunately, we are limited by the data available to us. There are also pragmatic issues with further covariate adjustment. To produce generic MCID estimates, the covariates would have to be fixed at certain values to estimate the ED50, introducing an array of assumptions that may not hold across all patients. Alternatively, an MCID could be estimated for each patient individually. However, this would first require those data to be available, placing the burden of additional data collection on patients and clinicians/researchers. This may be difficult in clinical practice because of time constraints, and also in clinical research where these measures may otherwise not be of interest. In addition, the calculation would be too extensive to print in any format and would require an online resource.

Supplementary Material B: Correlation between the Global Rating of Change and Change in Questionnaire
Supplementary Material B. Spearman rank correlation coefficients between change in symptoms and the categorical Global Rating of Change, stratified by study and follow-up
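The Spearman coefficients summarised in Supplementary Material B correlate change scores with the ordinal GRC. A self-contained sketch of the computation follows: it takes the Pearson correlation of average ranks, which handles the many ties an ordinal GRC produces. It assumes GRC categories are coded as integers with higher values meaning greater improvement, and change computed as baseline minus follow-up; the toy data are invented. In practice, scipy.stats.spearmanr computes the same tie-corrected coefficient.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


# Invented toy data; GRC coded 1 = worse, 2 = about the same, 3 = better (assumed coding).
change = [-3, 0, 2, 5, 9, 12]
grc = [1, 2, 2, 3, 3, 3]
print(round(spearman(change, grc), 3))  # 0.926
```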

Supplementary Material D: 95% Confidence Intervals of Generalised Additive Mixed Models
The following graphs present slices through the smooth surface plots at the mild, moderate, and severe thresholds for each outcome questionnaire. The predicted values are presented on the logit scale, with corresponding 95% confidence intervals, to assess variability at each level of change. Limits were set to the maximum obtainable change for each level of baseline severity.
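The construction of these plots can be sketched as follows. The limits follow from the score ranges (PHQ-9: 0 to 27; GAD-7: 0 to 21): with change defined as baseline minus follow-up, a patient at baseline b can improve by at most b points and deteriorate by at most (maximum score - b) points. Pointwise 95% intervals are formed on the logit scale as estimate ± 1.96 standard errors and can be back-transformed to probabilities. The fitted logit and standard error used below are hypothetical placeholders for values a fitted GAMM would supply.

```python
import math
from typing import Tuple

MAX_SCORE = {"PHQ-9": 27, "GAD-7": 21}  # questionnaire score ranges start at 0


def change_limits(measure: str, baseline: int) -> Tuple[int, int]:
    """Most negative and most positive obtainable change (baseline - follow-up)."""
    top = MAX_SCORE[measure]
    if not 0 <= baseline <= top:
        raise ValueError(f"baseline must lie in 0..{top} for {measure}")
    # Worst case: follow-up at the scale maximum; best case: follow-up of 0.
    return baseline - top, baseline


def logit_ci(estimate: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Pointwise interval on the logit scale (95% for the default z = 1.96)."""
    return estimate - z * se, estimate + z * se


def inv_logit(x: float) -> float:
    """Back-transform a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


print(change_limits("PHQ-9", 5))   # (-22, 5)
print(change_limits("GAD-7", 15))  # (-6, 15)

# HYPOTHETICAL fitted logit and standard error at one (baseline, change) point.
lower, upper = logit_ci(0.0, 0.5)
print(round(inv_logit(lower), 3), round(inv_logit(upper), 3))  # 0.273 0.727
```

The asymmetry of the limits explains why mild baselines have a long negative tail of possible deteriorations: at a PHQ-9 baseline of 5, changes down to -22 are possible but rarely observed, so the intervals there rest on few data.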
The confidence intervals widen slightly towards the extreme ends of change, particularly when baseline severity is low and large deteriorations (strongly negative changes) occur, as it was rare for patients with such mild symptoms to deteriorate drastically. This is unlikely to affect the ED50 estimates, which concern improvements (changes above 0), where the confidence intervals are visibly narrower.