Abstract
Summary of Findings (SoF) tables present, for each of the seven (or fewer) most important outcomes, the following: the number of studies and number of participants; the confidence in effect estimates (quality of evidence); and the best estimates of relative and absolute effects. Potentially challenging choices in preparing SoF tables include using direct evidence (which may have very few events) or indirect evidence (from a surrogate) as the best evidence for a treatment effect. If a surrogate is chosen, it must be labeled as substituting for the corresponding patient-important outcome.
Another such choice is presenting evidence from low-quality randomized trials or high-quality observational studies. When in doubt, a reasonable approach is to present both sets of evidence; if the two bodies of evidence have similar quality but discrepant results, one would rate down further for inconsistency.
For binary outcomes, relative risks (RRs) are the preferred measure of relative effect and, in most instances, are applied to the baseline or control group risks to generate absolute risks. Ideally, the baseline risks come from observational studies including representative patients and identifying easily measured prognostic factors that define groups at differing risk. In the absence of such studies, relevant randomized trials provide estimates of baseline risk.
When confidence intervals (CIs) around the relative effect include no difference, one may simply state in the absolute risk column that results fail to show a difference, omit the point estimate and report only the CIs, or add a comment emphasizing the uncertainty associated with the point estimate.
1. Introduction
What is new?
Key points
Summary of Findings (SoF) tables provide succinct, easily digestible presentations of confidence in effect estimates (quality of evidence) and magnitude of effects.
SoF tables should present the seven (or fewer) most important outcomes. These outcomes must always be patient-important outcomes and never surrogates, although surrogates can be used to estimate effects on patient-important outcomes.
SoF tables should present the highest-quality evidence. When the quality of two bodies of evidence (e.g., randomized trials and observational studies) is similar, SoF tables may include summaries from both.
SoF tables should include both relative and absolute effect measures, and separate estimates of absolute effect for identifiable patient groups with substantially different baseline or control group risks.
The first 11 articles in this series introduced the GRADE approach to systematic reviews and guideline development [1], discussed the framing of the question [2], and presented GRADE's concept of confidence in effect estimates [3] and how to apply it [4–9]. In this 12th article, we describe the final product of a systematic review using the GRADE process: Summary of Findings (SoF) tables, which present, for each relevant comparison of alternative management strategies, the quality rating for each outcome, the best estimate of the magnitude of effect in relative terms, and the absolute effect that one might see across subgroups of patients with varying baseline or control group risks. The focus of this article is on binary outcomes.
Box 1 presents the seven elements recommended for SoF tables. Tables 1, 2, and 3, examples of SoF tables, highlight some of the issues in constructing such a table. Readers will find additional details in Chapter 11 of the Cochrane Handbook [10].
Box 1. Seven elements of a Summary of Findings table
1. A list of all important outcomes, both desirable and undesirable;
2. A measure of the typical burden of these outcomes (e.g., control group, estimated risk);
3. A measure of the risk in the intervention group or, alternatively or in addition, a measure of the difference between the risks with and without intervention;
4. The relative magnitude of effect;
5. Numbers of participants and studies addressing these outcomes;
6. A rating of the overall confidence in effect estimates for each outcome (which may vary by outcome); and, possibly,
7. Comments.
Table 1. Summary of Findings table: Compression stockings compared with no compression stockings for people taking long flights
Abbreviations: DVT, deep vein thrombosis; CI, confidence interval; RR, risk ratio; GRADE, GRADE Working Group grades of evidence (see explanations).
Table 2. Summary of Findings table: Should LMWH rather than VKAs be used for long-term treatment of VTE?
a Limited to LMWH regimens that used 50% or more of the acute treatment dose during the extended phase of treatment.
∗ The basis for the baseline risk (e.g., the median control group risk across studies) is provided in footnotes. The anticipated absolute effect is expressed as risk difference (and its 95% CI) and is based on the baseline risk in the comparison group and the relative effect of the intervention (and its 95% CI).
Abbreviations: CI, confidence interval; RR, risk ratio; PTS, post-thrombotic syndrome; RCT, randomized controlled trial; LMWH, low-molecular-weight heparin; VKA, vitamin K antagonist; VTE, venous thromboembolism.
Table 3. Summary of Findings table: RCTs of low-intensity pulsed ultrasound (LIPUS) for more rapid return to function (measured directly and by a surrogate, radiographic fracture healing)
Abbreviations: RCT, randomized controlled trial; CI, confidence interval.
2. The seven elements of a SoF table
SoF tables include seven elements (Box 1). Uniformity of presentation is likely to increase readers' familiarity and comfort with SoF tables and is therefore desirable; the GRADEpro software [11] facilitates it. Initial user testing with consumers of guidelines (clinicians and researchers) guided the format of Table 1 [12,13]. In Table 1, the principle of putting what is most important first determined the order of the columns, and the presentation of absolute risks reflected the finding that some respondents found presentation of risk differences confusing.
In addition, experimental evidence from a randomized trial of alternative formats suggests that some readers may prefer differing formats of SoF tables, such as those presented in Tables 2 and 3 (Vandvik et al., unpublished data). In Table 2, the relative risk (RR) appears before the absolute risk because one uses the RR to calculate the absolute risk; in both Tables 2 and 3, a column presents the absolute difference between groups. GRADEpro has been programmed to respond to these issues and has become increasingly flexible in accommodating alternative formats.
Uncertainty also exists regarding optimal terminology. Table 1 uses the term “illustrative comparative risks” and the designation “assumed risk” because the calculations ignore uncertainty in the estimate of baseline risk. Some GRADE members believe that “illustrative comparative risks” might confuse readers, and other tables substitute “absolute risk.” The other tables also use alternative designations for the control group and intervention group risks. Further study may clarify the optimal wording.
Table 4 presents the full evidence profile associated with Table 1, addressing the desirable and undesirable consequences of wearing compression stockings on long plane rides. The table is atypical in that some shaded cells include two sets of judgments based on the same evidence, one in regular type and the other in italics. The first is the judgment of Cochrane review authors [14]; the second (italicized) is the judgment of thrombosis experts in a guideline sponsored by the American College of Chest Physicians [15].
Table 4. Evidence profile: Compression stockings vs. no compression stockings for people taking long flights
a The footnotes from Table 1 apply here as well, but we have not repeated them.
Abbreviations: DVT, deep vein thrombosis; RCT, randomized controlled trial; RR, risk ratio.
This example demonstrates that the great merit of GRADE is not that it eliminates judgments, and thus disagreements, but rather that it makes the judgments transparent. Because evaluating evidence requires many close-call judgments, disagreement between reasonable individuals will be common. GRADE allows readers to readily discern the nature of the disagreement, and decision makers are then in a position to make their own judgments about the relevant issues. The SoF table (Table 1) uses the judgments of the Cochrane reviewers.
3. Choosing which outcomes to present
SoF tables should ideally present results of all patient-important outcomes—possibly noting which ones are critical—without, however, overwhelming the reader. GRADE suggests inclusion of no more than seven outcomes, including both benefits and harms. If there are more than seven outcomes that are judged important, reviewers should choose the seven most important. This number is based on our intuition about the amount of information users can grasp, and an informal survey of attendees at a Cochrane Colloquium, and is therefore largely arbitrary. Limiting to seven may require combining related but different outcomes of approximately equal importance (e.g., calculating and presenting the number of patients who experienced either vomiting or diarrhea, considering these two as relatively equal minor gastrointestinal effects of temporary duration).
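Combining two related outcomes means counting patients who had at least one of the events, not adding the two event counts, because a patient with both vomiting and diarrhea would otherwise be counted twice. The following Python sketch illustrates this with assumed, hypothetical patient-level data; the records and the helper `patients_with_any` are illustrative and do not come from any review.

```python
from typing import Iterable, Mapping


def patients_with_any(records: Iterable[Mapping[str, bool]], outcomes: tuple[str, ...]) -> int:
    """Count patients who experienced at least one of the listed outcomes."""
    return sum(1 for record in records if any(record.get(o, False) for o in outcomes))


# Hypothetical patient-level data for one trial arm (illustrative only).
arm = [
    {"vomiting": True, "diarrhea": False},
    {"vomiting": True, "diarrhea": True},   # has both events but is counted once
    {"vomiting": False, "diarrhea": True},
    {"vomiting": False, "diarrhea": False},
]

combined = patients_with_any(arm, ("vomiting", "diarrhea"))
naive_sum = sum(r["vomiting"] for r in arm) + sum(r["diarrhea"] for r in arm)
print(combined, naive_sum)
```

Here `combined` is 3, whereas simply adding the two event counts gives 4.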
4. Presentation of direct vs. indirect evidence
Sometimes, direct measures of the patient-important outcomes are unavailable or, as in Table 1 for symptomatic venous thrombosis and pulmonary embolism, no events have occurred. In such instances, reviewers should present their inferences regarding treatment effects on patient-important outcomes on the basis of the results of surrogate measures. They should clearly label these inferences as coming from surrogates, and the reliance on surrogates will almost certainly result in rating down the confidence in effect estimates for indirectness.
What are the mechanics of making inferences regarding patient-important outcomes from surrogates? The simplest approach is to find a best estimate of the baseline risk for the patient-important outcome and apply the relative effect from the surrogate (see Box 2 for an example of the arithmetic of applying an RR estimate to a baseline risk). For instance, in Table 1, to estimate the absolute reduction in risk with stockings, we used an estimate of baseline risk from a meta-analysis and applied the RR from the surrogate, asymptomatic thrombosis.
Box 2. Calculations in Summary of Findings tables and evidence profiles
The RR of symptomatic deep vein thrombosis from nine RCTs is 0.10 (95% CI = 0.04–0.26).
The risk in the control group (estimated or assumed risk) from observational studies is 5 per 10,000.
$$\text{Risk with intervention (corresponding risk)} = \text{Risk with control} \times \text{RR}$$
$$\text{Risk difference} = \text{Risk with control} - \text{Risk with intervention}$$
One uses exactly the same process to calculate the CIs around the risk difference, substituting the extremes of the CI (in this case, 0.04 and 0.26) for the point estimate (in this case, 0.10). For instance, for the upper boundary of the CI:
$$\text{Risk with intervention} = 5 \times 0.26 = 1.3 \text{ per } 10{,}000$$
$$\text{Risk with control} - \text{Risk with intervention} = 5 - 1.3 = 3.7 \text{ per } 10{,}000$$
$$\text{Risk difference} = \frac{(1 - 0.26) \times 5}{10{,}000} = \frac{0.74 \times 5}{10{,}000} = \frac{3.7}{10{,}000}$$
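The Box 2 arithmetic can be reproduced in a few lines. This Python sketch assumes only the Box 2 inputs (control risk 5 per 10,000; RR 0.10, 95% CI 0.04–0.26); the function name `absolute_effects` is hypothetical. It applies each RR value to the same control risk, so the CI limits of the RR map onto limits for the intervention risk and the risk difference.

```python
def absolute_effects(control_risk: float, rr: float) -> tuple[float, float]:
    """Return (risk with intervention, risk difference), in the units of control_risk.

    As in Box 2, risk difference = risk with control - risk with intervention.
    """
    intervention_risk = control_risk * rr
    return intervention_risk, control_risk - intervention_risk


control_per_10k = 5.0  # assumed baseline risk from observational studies (Box 2)
for label, rr in [("point estimate", 0.10), ("lower CI limit", 0.04), ("upper CI limit", 0.26)]:
    risk, diff = absolute_effects(control_per_10k, rr)
    print(f"{label} (RR {rr:.2f}): {risk:.2f} per 10,000 with intervention, "
          f"{diff:.2f} fewer per 10,000")
```

Note that the lower RR limit (0.04) yields the largest risk difference (4.8 per 10,000), so the CI for the risk difference runs from 3.7 to 4.8 per 10,000.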
Whenever the direct measure of a patient-important outcome is suboptimal (e.g., supported only by low-quality evidence) and a surrogate measure exists, reviewers have the option of focusing on whichever measure (direct or surrogate) they feel yields higher-quality evidence or, as in Table 1, presenting both. As in Table 1, however, if they choose to focus partly or completely on surrogate results, reviewers must label the surrogate (in this case, asymptomatic venous thrombosis) for what it is and include in its presentation the patient-important outcome for which it substitutes (symptomatic thrombosis).
Another reason to present both direct and indirect measures is that the target audience for the review or guideline will want to see both. Table 3 presents an example of such a situation. Here, the review authors address the effect of low-intensity pulsed ultrasound on fracture healing [16]. Although one could argue that the single trial directly addressing function provides the higher-quality evidence, the relevant clinical community is likely to be (perhaps misguidedly) more interested in radiographic fracture healing, the surrogate outcome for function. Thus, for both nonoperatively and operatively managed fractures, the investigators chose to present both direct evidence of functional improvement from one trial and indirect evidence from radiographic healing, even though the direct evidence was of higher quality because it did not suffer from indirectness (Table 3).
5. Presentation of randomized controlled trials or observational studies
Randomized controlled trials (RCTs) usually provide higher-quality evidence than observational studies, and if RCTs are available, SoF tables should generally restrict themselves to reporting RCT results. On occasion, however, limitations of RCTs or particular strengths of observational studies may lead reviewers to conclude that confidence in effect estimates from the two designs is similar, or that observational studies provide higher-quality evidence.
For instance, consider the use of octreotide to prevent recurrent hypoglycaemia in patients with sulfonylurea overdose. Neither observational studies nor RCTs have addressed issues of mortality or long-term sequelae; thus, decisions must be based on the frequency of repeated hypoglycaemic episodes in the face of intravenous glucose administration.
The only RCT that addressed this issue administered a single dose of octreotide (the drug is ordinarily given as a continuous drip) [17]. Of those randomized to octreotide, 10 (45%) of 22 suffered recurrent hypoglycaemic episodes, as did 6 (33%) of 18 control patients (RR 1.36, 95% confidence interval [CI] = 0.61–3.0). Three control patients, but no octreotide-treated patients, suffered more than one recurrent hypoglycaemic episode. One would rate down confidence in estimates from this study for imprecision and for indirectness of the intervention, suggesting an overall rating of low confidence in estimates.
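The RR and its CI can be recomputed from the counts reported above. The Python sketch below assumes the common large-sample method that builds the CI on the log scale; the source does not state which method the trial authors used, although this one agrees with the reported 0.61–3.0 to two significant figures. The function name `risk_ratio_ci` is hypothetical.

```python
import math


def risk_ratio_ci(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int, z: float = 1.96):
    """Risk ratio (treatment vs. control) with a large-sample CI on the log scale."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Standard error of ln(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Octreotide trial: 10 of 22 vs. 6 of 18 with recurrent hypoglycaemia.
rr, lo, hi = risk_ratio_ci(10, 22, 6, 18)
print(f"RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The log-scale approximation becomes unreliable with very few events; exact or other methods may be preferable in that case.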
At least 27 case reports have documented a marked decrease in hypoglycaemic episodes following octreotide administration [18,19]. Without untreated controls, these reports would be classified as very low-quality evidence but for the apparently large and rapid effects (repeated hypoglycaemic episodes that markedly decreased or ceased after the administration of octreotide). Considering the magnitude and rapidity of effect, one might classify these case reports, in aggregate, as providing low-quality evidence.
Given evidence of similar quality, it would be inappropriate to rely exclusively on either the RCT or the case reports in constructing a SoF table regarding the administration of octreotide for hypoglycaemia associated with sulfonylurea overdose. Because the results of the case reports and the RCT appear inconsistent, the overall confidence in effect estimates could be classified as low or very low.
There may be instances in which the confidence in estimates from the observational studies is clearly superior to that from RCTs; under these circumstances, one would restrict the SoF table to observational studies. When randomized trials clearly provide greater confidence in estimates, one would restrict the SoF table to randomized trials. In general, when both sets of studies provide important evidence with more or less equal confidence in estimates, we encourage review and guideline authors to summarize both types of studies in separate rows in their SoF tables, as in Table 5.
Table 5. Summary of Findings table: Use of octreotide in patients with sulfonylurea overdose
Abbreviations: RCT, randomized controlled trial; CI, confidence interval; RR, risk ratio.
6. Dealing with analytic approaches that yield different results
Systematic reviews, in exploring sources of heterogeneity, may find that alternative analyses (“sensitivity analyses”) yield appreciably different results. For example, a systematic review of glucosamine for treating osteoarthritis found differences in pain reduction when including only trials with concealed allocation vs. all trials [20]. Presenting two rows, one summarizing each analytic approach, would have left readers, who are inevitably less well equipped than the review authors, to decide which analysis is more credible. Instead, the authors focused on the analysis in which they had more confidence (in this case, restricted to trials with concealed allocation).
The authors did, however, note the alternative result in the “comments” column of the row in which they presented the pain results, signaling that they themselves had some uncertainty regarding which analysis was most credible and wanted to alert readers to the alternative. Judgments of the credibility of alternative analyses require considerations similar to those for subgroup analyses, a topic we dealt with in a previous article in this series [6].
7. Measures of relative effect
Options for expressing relative measures of effect include the RR (synonym: risk ratio), odds ratio (OR), rate ratio, and hazard ratio [21–23]. ORs have advantageous statistical properties [24]. RRs, however, are more intuitively understandable and easier to use for estimating absolute measures of effect in individual patients [21]. We find these advantages of RRs compelling (for more details, see Box 3). Meta-analysis can generate RRs or ORs from 2 × 2 tables using appropriate statistical techniques [22,23].
Box 3. Should review authors use RRs or ORs?
RRs and ORs tend (in contrast to risk differences) to be similar across risk groups. ORs have statistical properties superior to those of RRs, which become particularly apparent when one uses these relative measures to generate absolute effects (risk differences; see Box 2). First, the OR leads to the same risk difference whether one counts events in a negative or positive way, whereas the RR does not. For example, RRs will yield different risk differences depending on whether one considers mortality (e.g., 20% die) or survival (e.g., 80% survive). Second, use of the RR can generate impossible values of risk (i.e., outside the range of 0–1.0). For instance, if one applies an RR of 1.2 from a meta-analysis to a baseline risk of 90%, the result is an impossible intervention group risk of 1.08. ORs always generate risks between 0 and 1.0.
Conversely, as the baseline risk of undesirable outcomes increases above 50%, the risk difference calculated with the RR increases (as it intuitively should), whereas the risk difference calculated with the OR decreases (counterintuitively). This is the price we pay for having the same risk difference whether one frames the issue using the desirable (e.g., survival) or undesirable (e.g., death) outcome.
Choosing either the OR or the RR is easily defensible. The authors of this article prefer the RR because of its ease of interpretation and ease of use for generating risk differences (see Box 2). RRs may, however, be problematic when RRs are greater than 1 and high baseline risks may occur (e.g., a baseline risk of 67% or more with an RR > 1.5), resulting in intervention group probabilities greater than 1.0. RRs may also be problematic when either positive or negative framing may be considered reasonable (e.g., death or survival when mortality is over 50%; symptoms as improved or unimproved). Under these circumstances, ORs may be preferable.
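The two properties described in Box 3 can be checked numerically. The Python sketch below uses a hypothetical trial (20% vs. 15% mortality) and a hypothetical new population with 50% baseline mortality; none of these numbers come from the source, and the helper names are illustrative. It shows that RR-based risk differences change with framing (deaths averted vs. extra survivors), OR-based ones do not, and that an RR of 1.2 applied to a 90% baseline risk exceeds 1 while the OR does not.

```python
def risk_from_rr(control_risk: float, rr: float) -> float:
    """Intervention-group risk implied by a risk ratio (can exceed 1)."""
    return control_risk * rr


def risk_from_or(control_risk: float, odds_ratio: float) -> float:
    """Intervention-group risk implied by an odds ratio (always between 0 and 1)."""
    odds = control_risk / (1 - control_risk) * odds_ratio
    return odds / (1 + odds)


# Hypothetical trial: 20% mortality with control, 15% with the intervention.
ctl_death, tx_death = 0.20, 0.15
rr_death = tx_death / ctl_death                   # about 0.75
rr_survival = (1 - tx_death) / (1 - ctl_death)    # about 1.06
or_death = (tx_death / (1 - tx_death)) / (ctl_death / (1 - ctl_death))
or_survival = 1 / or_death                        # reframing simply inverts the OR

# Apply each framing to a hypothetical population with 50% baseline mortality.
base = 0.50
deaths_averted_rr = base - risk_from_rr(base, rr_death)
extra_survivors_rr = risk_from_rr(1 - base, rr_survival) - (1 - base)
deaths_averted_or = base - risk_from_or(base, or_death)
extra_survivors_or = risk_from_or(1 - base, or_survival) - (1 - base)

print(f"RR: {deaths_averted_rr:.3f} deaths averted vs {extra_survivors_rr:.3f} extra survivors")
print(f"OR: {deaths_averted_or:.3f} deaths averted vs {extra_survivors_or:.3f} extra survivors")

# Box 3 example: RR 1.2 applied to a 90% baseline risk.
print(risk_from_rr(0.90, 1.2), risk_from_or(0.90, 1.2))
```

With these inputs, the RR framing gives a risk difference of 0.125 for deaths averted but only about 0.031 for extra survivors, whereas the OR framing gives about 0.086 either way.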
Using hazard ratios requires time-to-event data and relatively complex analytic approaches [25,26]. Time-to-event data will, at least outside of cancer studies, seldom be available for an entire group of studies that inform a particular clinical question. Moreover, hazard ratios are less familiar to clinicians (again, with the exception of clinicians focused on cancer) and are always farther from 1.0 than are RRs. Thus, clinicians familiar with RRs for a wide variety of interventions may overestimate the magnitude of effect when presented with a hazard ratio for a particular intervention.
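Why RRs sit closer to 1.0 than hazard ratios can be illustrated under an explicit assumption of proportional hazards, in which survival with the intervention equals control-group survival raised to the power of the hazard ratio. The Python sketch below (function name hypothetical, inputs illustrative) converts a hazard ratio into the cumulative RR at a given control-group risk.

```python
def rr_from_hr(hr: float, control_risk: float) -> float:
    """Cumulative risk ratio implied by a hazard ratio at a given control-group risk.

    ASSUMES proportional hazards over follow-up, so that
    survival_intervention = survival_control ** hr.
    """
    intervention_risk = 1 - (1 - control_risk) ** hr
    return intervention_risk / control_risk


for hr in (0.5, 2.0):
    for risk in (0.05, 0.20, 0.50):
        print(f"HR {hr:.1f}, control risk {risk:.2f}: RR {rr_from_hr(hr, risk):.2f}")
```

In each case the RR is closer to 1.0 than the hazard ratio, and the gap widens as the control-group risk rises; when events are rare, the two measures are nearly equal.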
Counts of events per patient (e.g., the number of disease exacerbations or of new polyps per patient in one group compared with another) are a special case of data that, in theory, can be considered continuous. When events are rare, analyses often focus on rates, which relate the counts to the amount of time during which the events could have happened. For example, investigators might count 20 exacerbations of chronic obstructive pulmonary disease in 100 patients during 300 person-years of follow-up in one arm of a clinical trial. The associated rate would be 0.067 per person-year, or 6.7 per 100 person-years. To summarize such findings in meta-analyses, investigators use the rate ratio, dividing the rate of events in one group by the rate in the other. Table 6 provides an example of such a situation. When events become more frequent, investigators may treat the data as a continuous outcome.
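The rate calculation in the example above can be sketched as follows. The first arm reproduces the numbers in the text; the second arm is hypothetical, and the CI uses a common large-sample approximation on the log scale (standard error √(1/a + 1/b) for event counts a and b), which is an assumption rather than a method prescribed by GRADE. The function name `rate_ratio_ci` is illustrative.

```python
import math


def rate_ratio_ci(events_a: int, time_a: float, events_b: int, time_b: float, z: float = 1.96):
    """Rate ratio (group A vs. group B) with an approximate log-scale CI.

    ASSUMES Poisson-distributed counts; SE of ln(rate ratio) = sqrt(1/a + 1/b).
    """
    ratio = (events_a / time_a) / (events_b / time_b)
    se = math.sqrt(1 / events_a + 1 / events_b)
    return ratio, math.exp(math.log(ratio) - z * se), math.exp(math.log(ratio) + z * se)


# Arm from the text: 20 COPD exacerbations over 300 person-years of follow-up.
rate_a = 20 / 300
print(f"{rate_a:.3f} per person-year, i.e., {rate_a * 100:.1f} per 100 person-years")

# Hypothetical comparison arm (not from the source): 30 events over 310 person-years.
ratio, lo, hi = rate_ratio_ci(20, 300, 30, 310)
print(f"rate ratio {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```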
Table 6. Summary of Findings table: Presenting less common outcome measures (rate ratios and quality-of-life data)
Abbreviations: CI, confidence interval; OR, odds ratio.
8. Measures of absolute effect
As we have pointed out, relative measures tend to be consistent across risk groups, whereas absolute measures do not [22,27–29]. Management decisions, however, involve trading off absolute effects on patient-important outcomes, so SoF tables must present both relative and absolute measures.
The unrepresentativeness of patients in randomized trials, and the inconsistency of absolute measures across risk groups and across individual trials, argue against calculating pooled risk differences directly from the data in randomized trials. The alternative process begins with selection of a baseline (control group) risk, which ideally would come from well-designed observational studies. For instance, the baseline risks for symptomatic deep venous thrombosis and for pulmonary embolism in Tables 1 and 4 come from a systematic review summarizing the results of observational studies [30].
Box 2 shows the calculations involved in generating absolute differences from baseline risks and RRs, using the outcome of venous thrombosis from Table 1.
Using ORs provides an alternative with advantages and disadvantages (Box 3). As a guideline developer, one may have only the OR available, or, as a systematic review author, one may choose to use the OR. In either circumstance, using the OR to generate an estimate of risk difference involves converting baseline risk to odds, multiplying by the OR, and converting the resulting odds back to risks. Alternatively, one can use the following formula (where RC is the risk in the control group):
$$\text{Risk difference} = \mathrm{RC} - \frac{\mathrm{OR} \times \mathrm{RC}}{1 - \mathrm{RC} + \mathrm{OR} \times \mathrm{RC}}$$
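The odds conversion described here can also be written as a single closed-form expression, and the two agree algebraically. This Python sketch (hypothetical function names and illustrative inputs) computes the risk difference both ways, using the same sign convention as Box 2 (control risk minus intervention risk).

```python
def rd_via_odds(rc: float, odds_ratio: float) -> float:
    """Control risk minus intervention risk, converting RC to odds and back."""
    odds_tx = rc / (1 - rc) * odds_ratio   # control odds multiplied by the OR
    risk_tx = odds_tx / (1 + odds_tx)      # odds converted back to a risk
    return rc - risk_tx


def rd_closed_form(rc: float, odds_ratio: float) -> float:
    """Algebraically equivalent single expression: RC - OR*RC / (1 - RC + OR*RC)."""
    return rc - (odds_ratio * rc) / (1 - rc + odds_ratio * rc)


# Illustrative inputs (hypothetical): OR 0.70 at several control-group risks.
for rc in (0.01, 0.10, 0.40):
    a, b = rd_via_odds(rc, 0.70), rd_closed_form(rc, 0.70)
    print(f"RC {rc:.2f}: {a * 1000:.1f} vs {b * 1000:.1f} fewer per 1,000")
```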
Unfortunately, high-quality observational studies are often unavailable; typical limitations include suboptimal surveillance for outcomes and potentially biased ascertainment of outcomes. If high-quality observational studies are not available, we suggest using the median risk (rather than the weighted average) among the control groups in the included studies or, if available, the control group risk from a single trial with a far larger sample size than other available trials. If there is important variation in control group risks, authors should consider presenting a range of baseline risks within the range observed in the included studies. One then applies the RR to two or more baseline risks to generate possible intervention group risks.
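Choosing and applying a set of baseline risks might look like this in Python. All numbers are hypothetical; the sketch takes the median of the control-group risks as the central baseline and the lowest and highest observed control-group risks as the ends of the range, which is one simple way of presenting a range within that observed.

```python
import statistics

# Hypothetical control-group results from five included trials (illustrative only).
control_events = [12, 9, 7, 6, 10]
control_sizes = [100, 150, 100, 50, 250]
rr = 0.75  # hypothetical pooled RR

risks = [e / n for e, n in zip(control_events, control_sizes)]
baselines = {
    "lowest observed": min(risks),
    "median": statistics.median(risks),   # suggested central baseline
    "highest observed": max(risks),
}

for label, baseline in baselines.items():
    with_tx = baseline * rr
    print(f"{label}: {baseline * 1000:.0f} per 1,000 without vs. "
          f"{with_tx * 1000:.0f} per 1,000 with the intervention")
```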
Absolute effects are likely to differ across patient groups. Data from observational studies (and occasionally from randomized trials) may allow reliable identification of subgroups at substantially different risk of adverse outcomes. If such data allow clinicians to readily identify these subgroups by their presenting clinical features, review authors should present absolute risks for intervention and control groups (and/or differences in risk between intervention and control groups) for each of these prognostic subgroups. That is, if authors find moderate- or high-quality evidence regarding clinical features that reliably distinguish between patients at substantially different risk of the outcomes of interest, they should use the baseline risk in each of these patient groups, along with the RR, to generate expected risks with the intervention. Box 4 describes considerations that arise when risks differ across patient groups.
Box 4. Differing risks across different patient groups
In Table 1, reviewers identified risk factors for asymptomatic DVT (previous episodes of DVT, coagulation disorders, severe obesity, limited mobility because of bone or joint problems, cancer, and large varicose veins) that, when considered together, more than tripled the risk of thrombosis. Applying the RR of 0.10 allowed calculation of expected event rates with prophylactic stockings for the high- and low-risk populations. In the low-risk population, applying the RR of 0.10 to the risk without the intervention of 5 per 10,000 generates a risk of 0.5 per 10,000 with the intervention. In the higher-risk population, the corresponding numbers are 18 and 1.8 per 10,000.
Table 2 presents another such example for the outcomes of venous thrombosis (three risk strata) and bleeding (two risk strata).
Reference
[1] Philbrick JT, Shumate R, Siadaty MS, Becker DM. Air travel and venous thromboembolism: a systematic review. J Gen Intern Med. 2007;22:107–114.
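The Box 4 strata can be computed in the same way as Box 2, once per stratum. This Python sketch uses the Box 4 values (RR 0.10; 5 and 18 per 10,000 without stockings); the helper `stratified_risks` is hypothetical.

```python
def stratified_risks(control_by_stratum: dict[str, float], rr: float) -> dict[str, float]:
    """Apply one RR to each stratum's baseline risk (same units as the input)."""
    return {stratum: risk * rr for stratum, risk in control_by_stratum.items()}


# Box 4 values: risk without stockings per 10,000 passengers, and RR 0.10.
without_stockings = {"low risk": 5.0, "higher risk": 18.0}
with_stockings = stratified_risks(without_stockings, 0.10)

for stratum, risk in with_stockings.items():
    averted = without_stockings[stratum] - risk
    print(f"{stratum}: {without_stockings[stratum]:g} -> {risk:g} per 10,000 ({averted:g} fewer)")
```

Because the RR is assumed constant across strata, the absolute benefit is larger in the higher-risk group (16.2 vs. 4.5 fewer per 10,000).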
9. Presentation of absolute effects
We suggest presenting the absolute effect (both benefits and harms) as natural frequencies because this facilitates decision making [31–34]. Table 1 uses events per 10,000 patients, although more frequent events can be presented as events per 1,000 or even per 100 patients. When events are sufficiently frequent, percentages may be understood as well as, or marginally better than, natural frequencies [35]. Although many clinicians prefer numbers needed to treat (NNTs), NNTs may be more difficult to interpret when multiple outcomes must be considered. Reporting NNTs may be particularly appropriate in abstracts or in summary tables with only two or three outcomes; in other contexts, natural frequencies or percentages are likely to be more easily interpretable. Review and guideline authors may want to tailor their presentations to their specific audiences, since differing formats may be optimal for differing audiences. Whatever the choice, the presentation should be consistent across all outcomes in a single SoF table. This need for consistency also applies to how absolute effects are presented when relative effects are very imprecise (Box 5).
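To make the alternative formats concrete, the sketch below expresses one absolute effect as natural frequencies, a percentage, and an NNT (or number needed to harm, NNH). It is illustrative only: the function name and dictionary keys are hypothetical, and it follows the common convention of rounding NNTs up to the next whole patient.

```python
import math


def absolute_effect_formats(control_risk: float, intervention_risk: float) -> dict:
    """Express one absolute effect in several formats.

    Risks are proportions between 0 and 1; a positive difference means
    fewer events with the intervention.
    """
    rd = control_risk - intervention_risk
    if rd > 0:
        # Rounded up by convention; round() first guards against
        # floating-point noise such as 99.99999999999999.
        number_needed = ("NNT", math.ceil(round(1 / rd, 9)))
    elif rd < 0:
        number_needed = ("NNH", math.ceil(round(1 / -rd, 9)))
    else:
        number_needed = ("NNT", math.inf)  # no difference: undefined/infinite
    return {
        "per_10000": rd * 10_000,
        "per_1000": rd * 1_000,
        "percent": rd * 100,
        "number_needed": number_needed,
    }


# High-risk stockings stratum: 18 vs 1.8 events per 10,000.
print(absolute_effect_formats(0.0018, 0.00018))
```

In this example the same effect reads as about 16 fewer events per 10,000, 0.16%, or an NNT of 618, which illustrates why a single, consistent format across outcomes helps readers compare rows.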
Box 5. Presenting absolute effects when estimates of relative effect are imprecise
When CIs around the relative risks are wide (including both benefit and large harm), providing a point estimate for the intervention that differs from that of the comparator, or a CI around a risk difference, may give the impression of an effect that does not exist. Reviewers or guideline developers who share this concern may choose, in the absolute risk difference column (or the intervention group risk column, depending on the format chosen), to state only that the result failed to show a difference between intervention and control; to omit the point estimate and report only the CIs; or to add a comment emphasizing the uncertainty associated with the point estimate (or some combination of the three strategies). Note that in Table 1, for superficial vein thrombosis, we present estimates of absolute effect and include a comment noting that the CI includes both benefit and harm. In Table 4, which uses the same data, we do not provide absolute estimates but merely note that the result fails to show a difference.
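The three strategies in Box 5 can be expressed as a simple formatting rule for the intervention-group risk cell. The sketch below is hypothetical: the strategy labels and cell wording are illustrative rather than a GRADE or RevMan standard, and the baseline risk is treated as fixed.

```python
from typing import Literal

# Hypothetical labels for the three strategies described in Box 5.
Strategy = Literal["no_difference", "ci_only", "point_with_comment"]


def absolute_effect_cell(baseline_per_1000: float, rr: float, rr_low: float,
                         rr_high: float,
                         strategy: Strategy = "point_with_comment") -> str:
    """Text for an intervention-group risk cell (events per 1,000).

    The CI comes from the RR CI only; baseline risk is treated as fixed.
    """
    point = baseline_per_1000 * rr
    low, high = baseline_per_1000 * rr_low, baseline_per_1000 * rr_high
    if not (rr_low < 1 < rr_high):
        # CI excludes no difference: report point estimate and CI as usual.
        return f"{point:.1f} per 1,000 ({low:.1f} to {high:.1f})"
    if strategy == "no_difference":
        return "Results fail to show a difference"
    if strategy == "ci_only":
        return f"CI: {low:.1f} to {high:.1f} per 1,000"
    if strategy == "point_with_comment":
        return (f"{point:.1f} per 1,000 ({low:.1f} to {high:.1f}); "
                "CI includes both benefit and harm")
    raise ValueError(f"unknown strategy: {strategy}")
```

Whichever strategy is chosen, passing the same `strategy` value for every row keeps the table consistent, as the text recommends.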
10. Absolute effects—confidence intervals
We further suggest reporting the CIs around the absolute risk in the intervention group (as in Tables 1 and 6) or around the difference between intervention and control groups (as in Tables 2–5). Just as one calculates the absolute risk in the intervention group from the absolute risk in the comparison group and the point estimate of the RR, the CIs around the absolute risk in the intervention group are calculated from the absolute risk in the comparison group and the CIs around the RR. When the baseline risk is very low, however, CIs calculated on the basis of RRs may be misleading. Under these circumstances, direct calculations based on absolute risks are preferable [36].
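The RR-based calculation just described can be sketched as follows. This is a minimal illustration with a hypothetical function name; it treats the baseline risk as known without error and does not cover the very-low-baseline-risk case, where the text recommends direct calculation from absolute risks instead.

```python
from typing import Tuple

Triple = Tuple[float, float, float]  # (point, lower, upper)


def absolute_from_relative(baseline: float, rr: float, rr_low: float,
                           rr_high: float) -> Tuple[Triple, Triple]:
    """Intervention-group risk and risk difference, each with a CI,
    derived from the control-group (baseline) risk and the RR and its CI.

    The baseline risk is treated as known without error.
    """
    risk = (baseline * rr, baseline * rr_low, baseline * rr_high)
    # Risk difference = control minus intervention. A larger RR means a
    # smaller reduction, so the upper RR limit gives the lower RD limit.
    diff = (baseline * (1 - rr), baseline * (1 - rr_high),
            baseline * (1 - rr_low))
    return risk, diff


# Hypothetical inputs: baseline risk 5%, RR 0.80 (95% CI 0.65 to 0.98).
risk, diff = absolute_from_relative(0.05, 0.80, 0.65, 0.98)
print(risk, diff)
```

With these made-up inputs, the intervention-group risk is 4.0% (3.25% to 4.9%) and the risk difference is 1.0 percentage point fewer events (0.1 to 1.75).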
RevMan provides options for calculating RRs or ORs (from which one can estimate risk differences; see Box 2 and, for ORs, the text in “Measures of absolute effect”) or, when baseline risk is very low, for calculating risk differences directly.
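For ORs, the standard conversion to an intervention-group risk works on the odds scale: convert the baseline risk to odds, multiply by the OR, and convert back. The sketch below shows that conversion under the usual assumption that the OR applies unchanged at the chosen baseline risk; it is not necessarily the exact formulation used in Box 2 or in RevMan, and the function name is hypothetical.

```python
def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Intervention-group risk implied by applying an OR to a baseline risk.

    Equivalent to OR * p0 / (1 - p0 + OR * p0), where p0 is the baseline risk.
    """
    baseline_odds = baseline_risk / (1 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# With a common outcome the OR and RR diverge; with a rare one they agree.
print(risk_from_odds_ratio(0.20, 0.5))   # about 0.111, not 0.10
print(risk_from_odds_ratio(0.001, 0.5))  # about 0.0005
```

The two calls illustrate why applying an OR as if it were an RR overstates the effect when the baseline risk is high, and why the distinction matters little when events are rare.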
11. Absolute effects—choice of time frame
In Table 1, the time frame for measurement of outcome is both obvious and short: symptomatic thrombosis, if it occurs, will do so within days of a long flight. For conditions such as primary and secondary prevention of cardiovascular events, or cancer recurrence, there are several options for the duration of follow-up. Reviewers should therefore always indicate the length of follow-up to which the estimates of absolute effect refer. Note that this length of follow-up may differ from the length of follow-up in the RCTs that generated the estimates of relative effect, or in the observational studies or RCTs that provided estimates of baseline risk. Rather, it will be a time frame judged appropriate for balancing the desirable and undesirable consequences of alternative management strategies.
Longer follow-up periods are associated with higher absolute risks and larger risk differences between intervention and control. This can lead to potentially important differences in readers’ perceptions of the apparent magnitude of effect (Box 6). Extending the time frame often involves the assumption that event rates will remain constant over time.
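The constant-rate assumption can be made explicit. The sketch below, with hypothetical function names, extends an annual risk to a longer horizon by assuming a constant hazard, which is one common way of formalizing "event rates stay constant"; real event rates may rise or fall with age and time since treatment.

```python
import math


def cumulative_risk(annual_risk: float, years: float) -> float:
    """Risk of at least one event over `years`, assuming the annual event
    rate stays constant (a constant hazard)."""
    hazard = -math.log1p(-annual_risk)  # per-year hazard implied by annual risk
    return 1 - math.exp(-hazard * years)


def cumulative_difference(annual_control: float, annual_treated: float,
                          years: float) -> float:
    """Absolute risk difference (control minus treated) over `years`."""
    return (cumulative_risk(annual_control, years)
            - cumulative_risk(annual_treated, years))


# Annual MI risk of 6 per 1,000 (as in Box 6) extended to 10 years.
print(round(cumulative_risk(0.006, 10) * 1000, 1))  # events per 1,000
```

Under this assumption, an annual risk of 6 per 1,000 corresponds to about 58 per 1,000 over a decade, slightly less than the naive 10 × 6 = 60 because people who have already had an event cannot have a first event again.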
Box 6. The impact of choice of time frame on readers’ perceptions of effect
Consider primary prophylaxis with aspirin for the prevention of myocardial infarction (MI) in asymptomatic individuals with risk factors for development of coronary disease (so-called high risk). Despite the high-risk label, the estimated risk of MI in such individuals is very low, approximately 6 per 1,000 per year [1]. The benefits of regular aspirin use are correspondingly small: between one and two MIs prevented per 1,000 patients taking aspirin over the course of a year [1]. Given that aspirin is associated with an increased risk of gastrointestinal bleeding, few would be enthusiastic about this magnitude of benefit. If one considers a time frame of a decade, however, aspirin use will prevent approximately 14 MIs per 1,000 patients (an absolute benefit of 1.4%). This latter framing potentially makes the intervention appear more attractive.
Reference
[1] Baigent C, Blackwell L, Collins R, Emberson J, Godwin J, Peto R, et al. Aspirin in the primary and secondary prevention of vascular disease: collaborative meta-analysis of individual participant data from randomized trials. Lancet. 2009;373(9678):1849–1860.
12. Dealing with no events in either group
When no participant in any trial has suffered the outcome of interest, the trials provide no information about relative effects (and one can thus argue that there is no point in rating the quality of the evidence). However, particularly if there are large numbers of patients, the data may provide high-quality evidence that the absolute difference between alternative management strategies is small or very small. If reviewers believe this is the appropriate inference for an important or crucial outcome, they can rate the confidence in effect estimates and base the estimate of precision on the CI around the absolute effect (as in Tables 1 and 4). A program that makes the calculation based on the available statistical methods [37] is available from the corresponding author.
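As an illustration of such a calculation, the sketch below implements one of the methods compared in [37]: Newcombe's hybrid score interval, which combines Wilson score limits for each group. This is an assumption about a reasonable method, not a description of the corresponding author's program; the function names are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(events: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a single proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def newcombe_difference_ci(e1: int, n1: int, e2: int, n2: int,
                           z: float = 1.959964) -> Tuple[float, float]:
    """CI for the risk difference p1 - p2 (hybrid score method, no
    continuity correction). Works even when both groups have zero events."""
    p1, p2 = e1 / n1, e2 / n2
    l1, u1 = wilson_interval(e1, n1, z)
    l2, u2 = wilson_interval(e2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


# Hypothetical trials: no events among 1,000 patients in each arm.
print(newcombe_difference_ci(0, 1000, 0, 1000))
```

With zero events in both arms of 1,000 patients, the interval is roughly 38 events per 10,000 in either direction; larger trials shrink it further, which is how such data can support a conclusion that any absolute difference is small.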
13. Uncertainty around estimates of baseline risk
Note that Table 1 provides estimates of risk in the intervention group based on the CIs around the RR. We do not, however, provide estimates of uncertainty regarding the baseline risks in the high- and low-risk control groups. Omitting such estimates reflects the high priority we place on simple presentations that clinicians and patients will find easily digestible.
Potentially, all the issues that raise uncertainty about estimates of absolute effects could also raise uncertainty about estimates of baseline risk: risk of bias, indirectness if surrogate measures are used, imprecision, inconsistency, and publication bias. Thus far, GRADE has largely ignored uncertainty in estimates of baseline risk in its criteria for rating confidence in effect estimates. This is a pragmatic decision that avoids overwhelming complexity and keeps the systematic review manageable.
Nevertheless, guideline developers should be aware of this neglected source of uncertainty, and in certain circumstances may wish to include it in considerations about confidence in effect estimates for individual outcomes. When such considerations arise, we suggest classifying them under “indirectness.” Presenting a plausible range of baseline risks may, to some extent, ameliorate the problem.
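One way to present a plausible range of baseline risks is to report the span of risk differences it implies together with the RR CI. The sketch below is a hypothetical helper under two stated assumptions: the baseline range and RR CI are taken as given, and they are combined by simply taking the extremes, not by any formal propagation of uncertainty.

```python
from itertools import product
from typing import Tuple


def risk_difference_bounds(baseline_range: Tuple[float, float],
                           rr_ci: Tuple[float, float]) -> Tuple[float, float]:
    """Smallest and largest risk difference (control minus intervention)
    over a plausible baseline-risk range and the RR CI.

    b * (1 - rr) is bilinear in (b, rr), so its extremes over the
    rectangle of inputs lie at the four corners.
    """
    corners = [b * (1 - rr) for b, rr in product(baseline_range, rr_ci)]
    return min(corners), max(corners)


# Hypothetical inputs: baseline risk 1% to 3%, RR CI 0.6 to 0.9.
print(risk_difference_bounds((0.01, 0.03), (0.6, 0.9)))
```

Here the plausible benefit spans 1 to 12 fewer events per 1,000, a range that would be hidden if only a single baseline risk were shown.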
14. What to do when there is no published evidence regarding an important outcome
We encourage systematic review authors and guideline developers to specify all important outcomes before commencing their reviews. If they do so, they may find no published evidence regarding one or more outcomes (quality of life and rare side effects are two outcomes that may be subject to this problem). We suggest that if such an outcome is sufficiently important, it warrants a row in the SoF table, with the confidence in effect estimates rating (and all other cells except the comments) either left blank or classified as very low-quality evidence.
15. Conclusion
The SoF table provides all the key information necessary for making decisions between competing health care management strategies [38]. Therefore, although not an absolute requirement for use of the GRADE approach, the SoF table is an invaluable tool for providing a succinct, accessible, transparent evidence summary for patients, health care providers, and policy makers.
Article info
Publication history
Published online: May 21, 2012
Accepted: January 30, 2012
Footnotes
The GRADE system has been developed by the GRADE Working Group. The named authors drafted and revised this article. A complete list of contributors to this series can be found on the journal’s Web site at www.elsevier.com.
Copyright
© 2013 Elsevier Inc. Published by Elsevier Inc. All rights reserved.