Abstract
Keywords
Key points
- Summary of Findings tables provide succinct presentations of evidence quality and magnitude of effects.
- Summarizing the findings of continuous outcomes presents special challenges to interpretation that become daunting when individual trials use different measures for the same construct.
- The most commonly used approach to providing pooled estimates for different measures, presenting results in standard deviation units, has limitations related to both statistical properties and interpretability.
- Potentially preferable alternatives include presenting results in the natural units of the most popular measure, transforming into a binary outcome and presenting relative and absolute effects, presenting the ratio of the means of intervention and control groups, and presenting results in preestablished minimally important difference units.
1. Introduction
- 1. investigators have all used the same measure that is familiar to the target audiences
- 2. investigators have all used the same or very similar measures that are less familiar to the target audiences
- 3. investigators have used different measures
2. Options when investigators have all used the same measure that is familiar to the target audiences
Patients, interventions, comparators | Participants (studies), follow-up | Quality of the evidence (GRADE) | Comparator | Intervention vs. comparator mean difference (95% CI) |
---|---|---|---|---|
Schizophrenia: supportive employment vs. other vocational approaches | 843 participants (five studies) 12–24 mo (mean = 19 mo) of follow-up | ⊕⊕⊕○ Moderate because of risk of bias (suspicion of selective reporting bias) | 32.3 d of competitive employment | 45.9 d (95% CI: 34.7, 57.1) longer in competitive employment |
Children with acute diarrhea Zinc vs. placebo | 4,242 participants (13 studies) | ⊕⊕⊕○ Moderate because of inconsistency | The mean diarrhea duration (hr) ranged across control groups from 59 to 170 hr | 9.60 (95% CI: 18.25, 0.96) fewer hours of diarrhea |
Common cold NSAIDs vs. no NSAIDs | 214 participants (two studies) | ⊕⊕○○ Low because of imprecision and heterogeneity | 7.33 d | 0.23 (95% CI: 1.75 fewer to 1.29 more) fewer days of cold symptoms |
Surgery Supplemental perioperative oxygen vs. routine oxygen administration | 2,963 participants (four studies) | ⊕⊕⊕○ Moderate because of imprecision | The mean hospital stay (d) across control groups ranged from 6.4 to 11.9 d | 0.86 d (95% CI: −0.29, 2.00) longer hospital stay |
3. Options when investigators have all used the same or very similar measures that are less familiar to the target audiences
Outcomes | Estimated risk without stockings | Corresponding risk with stockings (95% CI) | Relative effect, OR (95% CI) | Number of participants (studies) | Quality of the evidence (GRADE) | Comments |
---|---|---|---|---|---|---|
Symptomatic deep vein thrombosis: inferred from surrogate, symptomless deep vein thrombosis | Low-risk population: 5 per 10,000. High-risk population: 18 per 10,000 | Low-risk population: 0.5 per 10,000 (0–1.25). High-risk population: 1.8 per 10,000 (1–8) | 0.10 (0.04, 0.25) | 2,637 (nine studies) | ⊕⊕⊕○ Moderate because of indirectness [4] | |
Edema: post-flight values measured on a scale from 0 (no edema) to 10 (maximum edema) | The mean edema score ranged across control groups from 6.4 to 8.9 | The mean edema score in the intervention groups was on average 4.72 lower (95% CI: 4.91 lower to 4.52 lower) | | 1,246 (six studies) | ⊕⊕○○ Low [4] because of risk of bias (unblinded, unvalidated measure) | |
Adverse effects | See comment | See comment | Not estimable | 1,182 (four studies) | See comment | The tolerability of the stockings was described as very good with no complaints of side effects in four studies [5] |
4. Options when investigators have used different measures
Approach | Advantages | Disadvantages | Recommendation |
---|---|---|---|
SD units (standardized mean difference; effect size) | Widely used | Interpretation challenging; can be misleading depending on whether population is very homogeneous or heterogeneous | Do not use as the only approach |
Present as natural units | May be viewed as closer to primary data | Few instruments sufficiently used in clinical practice to make units easily interpretable | Approaches to conversion to natural units include those based on SD units and rescaling approaches; we suggest the latter. In rare situations when the instrument is very familiar to frontline clinicians, seriously consider this presentation |
Relative and absolute effects | Very familiar to clinical audiences and thus facilitate understanding; can apply GRADE guidance for large and very large effects | Involve assumptions that may be questionable (particularly methods based on SD units) | If the MID is known, use this strategy in preference to relying on SD units; always seriously consider this option |
Ratio of means | May be easily interpretable to clinical audiences; involves fewer questionable assumptions than some other approaches; can apply GRADE guidance for large and very large effects | Cannot be applied when the measure is change and negative values are therefore possible; interpretation requires knowledge and interpretation of the control group mean | Consider as complementing other approaches, particularly the presentation of relative and absolute effects |
MID units | May be easily interpretable to audiences; not vulnerable to population heterogeneity | Only applicable when MID is known; to the extent that MID is uncertain, this approach will be less attractive | Consider as complementing other approaches, particularly the presentation of relative and absolute effects |
Outcomes | Estimated risk or estimated score/value with placebo | Absolute reduction in risk or reduction in score/value with dexamethasone | Relative effect (95% CI) | Number of participants (studies) | Confidence in effect estimate | Comments |
---|---|---|---|---|---|---|
(A) Postoperative pain, SD units: investigators measured pain using different instruments. Lower scores mean less pain | | The pain score in the dexamethasone groups was on average 0.79 SDs (1.41–0.17) lower than in the placebo groups | — | 539 (5) | ⊕⊕○○ Low | As a rule of thumb, 0.2 SD represents a small difference, 0.5 a moderate, and 0.8 a large |
(B) Postoperative pain, natural units: measured on a scale from 0 (no pain) to 100 (worst pain imaginable) | The mean postoperative pain scores with placebo ranged from 43 to 54 | The mean pain score in the intervention groups was on average 8.1 (1.8–14.5) lower | — | 539 (5) | ⊕⊕○○ Low | Scores estimated based on an SMD of −0.79 (95% CI: −1.41, −0.17). The minimally important difference on the 0–100 pain scale is approximately 10 |
(C) Substantial postoperative pain: investigators measured pain using different instruments | 20 per 100 | More patients in dexamethasone group achieved important improvement in pain score 0.15 (95% CI: 0.19, 0.04) | RR=0.25 (95% CI: 0.05, 0.75) | 539 (5) | ⊕⊕○○ Low | Scores estimated based on an SMD of −0.79 (95% CI: −1.41, −0.17). Method assumes that distributions in intervention and control groups are normally distributed and variances are similar |
(D) Postoperative pain: investigators measured pain using different instruments. Lower scores mean less pain | 28.1 | 3.7 lower pain score (6.1 lower to 0.6 lower) | Ratio of means=0.87 (0.78–0.98) | 539 (5) | ⊕⊕○○ Low | Weighted average of the mean pain score in dexamethasone group divided by mean pain score in placebo |
(E) Postoperative pain: investigators measured pain using different instruments | | The pain score in the dexamethasone groups was on average 0.40 (95% CI: 0.74, 0.07) minimally important difference units less than in the control group | — | 539 (5) | ⊕⊕○○ Low | An effect less than half the minimally important difference suggests a small or very small effect |
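Row (D) can be checked with simple arithmetic: multiplying the control group mean by (ratio of means − 1) gives the implied absolute difference. The sketch below is illustrative only. The function name is hypothetical, and the inputs are the rounded values printed in the table, so the interval limits it returns may differ from the tabulated ones in the last digit.

```python
def absolute_difference_from_ratio_of_means(control_mean: float,
                                            ratio_of_means: float) -> float:
    """Absolute difference (intervention minus control) implied by a ratio of means."""
    return control_mean * (ratio_of_means - 1.0)


# Values as printed in row (D): control mean 28.1, ratio of means 0.87 (0.78-0.98).
control_mean = 28.1
for label, rom in [("point estimate", 0.87),
                   ("lower limit", 0.78),
                   ("upper limit", 0.98)]:
    diff = absolute_difference_from_ratio_of_means(control_mean, rom)
    # All three ratios are below 1, so every difference is a reduction in pain score.
    print(f"{label}: ratio {rom:.2f} -> {abs(diff):.1f} points lower")
```

The point estimate reproduces the 3.7-point reduction shown in the table. As the table's Disadvantages column notes, interpreting the ratio requires knowing the control group mean.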
Outcomes | Estimated baseline score/proportion improving in control patients | Absolute increase in proportion improving in patients receiving respiratory rehabilitation | Relative effect (95% CI) | Number of participants (studies) | Confidence in effect estimate | Comments |
---|---|---|---|---|---|---|
(A) HRQL: investigators measured HRQL using different instruments. Higher scores mean better HRQL | | The HRQL score improved on average 0.72 SDs (95% CI: 0.48, 0.96) more in the respiratory rehabilitation patients than in the control patients | — | 818 (16) | ⊕⊕⊕⊕ High | As a rule of thumb, 0.2 SD represents a small difference, 0.5 a moderate, and 0.8 a large |
(B) HRQL measured on a scale of 1–7 | Control group baseline, 4.5; average improvement in control, 0.04 | HRQL improved on average 0.71 (95% CI: 0.48, 0.94) more in the respiratory rehabilitation patients than in the control patients | — | 818 (16) | ⊕⊕⊕⊕ High | Calculated by transforming all scores to the CRQ in which the minimally important difference is 0.5 |
(C) Proportion of patients with important improvement in HRQL | 0.30. This represents the median of the proportion of patients in the control group who achieved an improvement greater than the minimally important difference. That is, in the study at the center of the distribution of change, 30% of the control group achieved an improvement of more than 0.5 (CRQ) or 4 (St. George's). | Differences in proportion achieving important improvement 0.31 (95% CI: 0.22, 0.40) in favor of rehabilitation | OR = 3.36 (95% CI: 2.31, 4.86) | 818 (16) | ⊕⊕⊕⊕ High | Calculation uses established minimally important difference of 0.5 units on the CRQ and 4 units on the St. George's Respiratory Questionnaire |
(D) The currently recommended approach to ratio of means relies on posttest only and is therefore not applicable to change scores, which are the focus of results from these trials | ||||||
(E) HRQL measured in minimally important difference units | | HRQL improved on average 1.75 (95% CI: 1.37, 2.13) minimally important difference units more in the respiratory rehabilitation group than in the control group | — | 818 (16) | ⊕⊕⊕⊕ High | An effect of close to two times the minimally important difference suggests a moderate to large effect |
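Row (E) expresses effects in MID units. A minimal sketch of the idea follows, assuming each trial's mean difference and standard error are divided by the MID of the instrument that trial used, and the results are then pooled with fixed-effect inverse-variance weights. The trial values are hypothetical and do not come from the review. Only the MIDs (0.5 for the CRQ and 4 for the St. George's Respiratory Questionnaire) are taken from the table, and the pooling model is a simplification that ignores uncertainty in the MID itself.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class StudyResult:
    label: str
    mean_difference: float  # improvement with intervention vs. control, instrument units
    standard_error: float   # standard error of that difference, instrument units
    mid: float              # minimally important difference of the instrument


def to_mid_units(study: StudyResult) -> tuple[float, float]:
    """Express a study's effect and its standard error as multiples of the MID."""
    return study.mean_difference / study.mid, study.standard_error / study.mid


def pool_fixed_effect(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance fixed-effect pooling of (estimate, standard error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, sqrt(1.0 / sum(weights))


# Hypothetical trials; improvement is coded as positive for both instruments.
studies = [
    StudyResult("hypothetical CRQ trial", mean_difference=0.8, standard_error=0.2, mid=0.5),
    StudyResult("hypothetical St. George's trial", mean_difference=6.0, standard_error=2.0, mid=4.0),
]

pooled, pooled_se = pool_fixed_effect([to_mid_units(s) for s in studies])
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect: {pooled:.2f} MID units (95% CI: {low:.2f}, {high:.2f})")
```

Because every study is divided by a fixed MID, not by its own sample SD, the result does not depend on how homogeneous each trial's population was, which is the advantage listed for MID units in the table of approaches.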
4.1 Standard deviation units: standardized mean difference
4.2 Conversion into units of the most commonly used instrument
4.3 Conversion to relative and absolute effects
SMD (rows) by control group response rate (columns) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
---|---|---|---|---|---|---|---|---|---|
For situations in which the event is undesirable, reduction (or increase if intervention harmful) in adverse events with the intervention | |||||||||
−0.2 | −0.03 | −0.05 | −0.07 | −0.08 | −0.08 | −0.08 | −0.07 | −0.06 | −0.04 |
−0.5 | −0.06 | −0.11 | −0.15 | −0.17 | −0.19 | −0.20 | −0.20 | −0.17 | −0.12 |
−0.8 | −0.08 | −0.15 | −0.21 | −0.25 | −0.29 | −0.31 | −0.31 | −0.28 | −0.22 |
−1.0 | −0.09 | −0.17 | −0.24 | −0.23 | −0.34 | −0.37 | −0.38 | −0.36 | −0.29 |
For situations in which the event is desirable, increase (or decrease if intervention harmful) in positive responses to the intervention | |||||||||
0.2 | 0.04 | 0.06 | 0.07 | 0.08 | 0.08 | 0.08 | 0.07 | 0.05 | 0.03 |
0.5 | 0.12 | 0.17 | 0.19 | 0.20 | 0.19 | 0.17 | 0.15 | 0.11 | 0.06 |
0.8 | 0.22 | 0.28 | 0.31 | 0.31 | 0.29 | 0.25 | 0.21 | 0.15 | 0.08 |
1.0 | 0.29 | 0.36 | 0.38 | 0.38 | 0.34 | 0.30 | 0.24 | 0.17 | 0.09 |
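The tabulated risk differences appear consistent with a conversion that assumes normally distributed scores with equal variance in both groups. The control group response rate is mapped to a standard normal quantile, shifted by the SMD, and mapped back to a proportion. The Python sketch below illustrates that conversion under this assumption; it is not the authors' code, and the function name is hypothetical.

```python
from statistics import NormalDist


def smd_to_risk_difference(control_rate: float, smd: float) -> float:
    """Risk difference implied by an SMD, assuming normal scores with equal variances.

    control_rate: proportion of control patients with the event (between 0 and 1).
    smd: standardized mean difference; negative values reduce the event rate.
    """
    std_normal = NormalDist()                      # mean 0, SD 1
    z_control = std_normal.inv_cdf(control_rate)   # quantile matching the control rate
    intervention_rate = std_normal.cdf(z_control + smd)
    return intervention_rate - control_rate


# Print one row of risk differences for a desirable event and an SMD of 0.5.
rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(" | ".join(f"{smd_to_risk_difference(p, 0.5):.2f}" for p in rates))
```

For a given SMD, the absolute effect is largest when the control group response rate is near the middle of the range and shrinks toward either extreme, which is why the control rate has to be reported alongside the converted effect.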
4.4 Ratio of means
4.5 MID units
5. Reflections on the interpretation of the five methods
6. Recommendations for enhancing interpretability in meta-analyses in which primary studies use different instruments to measure the same underlying construct
- 1. Using more than one presentation is likely to be both informative and, if the clinical message is similar, reassuring. It can also reduce the risk of biased selection of which presentation to use when the messages are different. If the messages are different, and it is not clear which to believe, review authors could consider rating down their confidence for inconsistency. Tables 4 and 5 present a model for presenting more than one approach within a single SoF table.
- 2. When one instrument is in use in regular clinical practice and is familiar to most consumers of a systematic review or guideline, a presentation in natural units of that instrument should be one of the options chosen.
- 3. Comments should be geared to helping with interpretation (e.g., rules of thumb for interpreting SMD and stating the MID if established).
- 4. If possible, choose methods that do not rely on SD units. If SD units are chosen, provide some guide for interpretation. In approach (B), the rescaling option would be preferable to multiplying the effect in SD units by the SD of the most popular instrument. In approach (C), generating relative and absolute effects using the MID is, if it is available, preferable to using any of the approaches that rely on SD units.
- 5. In most instances, one should seriously consider expressing the magnitude of effect as both a relative effect (an OR or relative risk) and a risk difference. The advantages include familiarity for clinicians and ability to apply GRADE guidance for large and very large effects (for relative effect) and usefulness for clinical decision making (for absolute effects) (Table 3). Because presentation of relative effects alone may be misleading, in particular when relative effects are large but absolute effects small, the summary should ensure communication of the magnitude of absolute effect.
- 6. Reviewers should aim at transparency, citing the source of MIDs and SDs used, and the underlying assumptions.
7. Conclusion
References
- GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011; 64: 383-394
- GRADE guidelines: 2. Framing the question and deciding on important outcomes. J Clin Epidemiol. 2011; 64: 395-400
- GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011; 64: 401-406
- GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol. 2011; 64: 407-415
- GRADE guidelines: 5. Rating the quality of evidence—publication bias. J Clin Epidemiol. 2011; 64: 1277-1282
- GRADE guidelines: 6. Rating the quality of evidence—imprecision. J Clin Epidemiol. 2011; 64: 1283-1293
- GRADE guidelines: 7. Rating the quality of evidence–inconsistency. J Clin Epidemiol. 2011; 64: 1294-1302
- GRADE guidelines: 8. Rating the quality of evidence—indirectness. J Clin Epidemiol. 2011; 64: 1303-1310
- GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol. 2011; 64: 1311-1316
- GRADE guidelines: 10. Considering resource use and rating the quality of economic evidence. J Clin Epidemiol. 2013; 66: 140-150
- GRADE guidelines: 11. Making an overall rating of evidence for a single outcome and for all outcomes. J Clin Epidemiol. 2013; 66: 151-157
- GRADE guidelines: 12. Preparing summary of findings tables (SOF) - binary outcomes. J Clin Epidemiol. 2013; 66: 158-172
- Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002; 77: 371-383
- How can quality of life researchers make their work more useful to health workers and their patients? Qual Life Res. 2007; 16: 1097-1105
- Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989; 10: 407-415
- Compression stockings for preventing deep vein thrombosis in airline passengers. Cochrane Database Syst Rev. 2009; 3
- Thrombolysis for acute ischaemic stroke. Cochrane Database Syst Rev. 2003; 3: CD000213. doi:10.1002/14651858.CD000213
- Meta-analysis of flavonoids for the treatment of haemorrhoids. Br J Surg. 2006; 93: 909-920
- Furukawa TA, Akechi T, Wagenpfeil S, Leucht S. Relative indices of treatment effect may be constant across different definitions of response in schizophrenia trials. Schizophr Res; 126(1–3): 212–219.
- Binary methods for continuous outcomes: a parametric alternative. J Clin Epidemiol. 1991; 44: 241-248
- The cost of dichotomising continuous variables. BMJ. 2006; 332: 1080
- BDI-II: Beck Depression Inventory Manual. 2nd ed. The Psychological Corporation, San Antonio, TX; 1996
- Development of a rating scale for primary depressive illness. Br J Soc Clin Psychol. 1967; 6: 278-296
- Pooling health-related quality of life outcomes in meta-analysis—a tutorial and review of methods for enhancing interpretability. Res Synth Methods. 2011; 2: 188-203
- The impact of prophylactic dexamethasone on nausea and vomiting after laparoscopic cholecystectomy: a systematic review and meta-analysis. Ann Surg. 2008; 248: 751-762
- Pulmonary rehabilitation for chronic obstructive pulmonary disease. Cochrane Database Syst Rev. 2006; 18
- Analyzing data and undertaking meta-analyses. In: Higgins J, Green S. Cochrane handbook for systematic reviews of interventions version 5.1.0. Wiley, Chichester, UK; 2011
- Effect-size estimates: issues and problems in their interpretation. J Consum Res. 1996; 23: 89-105
- Statistical power analysis in the behavioral sciences. Erlbaum, Hillsdale, NJ; 1988
- Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations. J Pain. 2008; 9: 105-121
- Interpreting treatment effects in randomised trials. BMJ. 1998; 316: 690-693
- From effect size into number needed to treat. Lancet. 1999; 353: 1680
- Cox D, Snell E. Analysis of binary data. Chapman and Hall, London, UK; 1989
- Meta-analysis of screening and diagnostic tests. Psychol Bull. 1995; 117: 167-178
- The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Med Res Methodol. 2008; 8: 32
- Improving the interpretation of health-related quality of life evidence in meta-analysis: the application of minimal important difference units. Health Qual Life Outcomes. 2010; 11: 116
Footnotes
The GRADE system has been developed by the GRADE Working Group. The named authors drafted and revised this article. A complete list of contributors to this series can be found on the Journal of Clinical Epidemiology Web site.
Linked Article
- Erratum to “GRADE guidelines: 13. Preparing Summary of Findings tables and evidence profiles—continuous outcomes” [J Clin Epidemiol 2013;66(2):173-183]. Journal of Clinical Epidemiology, Vol. 68, Issue 4