Advertisement

GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias)

      Abstract

      In the GRADE approach, randomized trials start as high-quality evidence and observational studies as low-quality evidence, but both can be rated down if most of the relevant evidence comes from studies that suffer from a high risk of bias. Well-established limitations of randomized trials include failure to conceal allocation, failure to blind, loss to follow-up, and failure to appropriately consider the intention-to-treat principle. More recently recognized limitations include stopping early for apparent benefit and selective reporting of outcomes according to the results. Key limitations of observational studies include use of inappropriate controls and failure to adequately adjust for prognostic imbalance. Risk of bias may vary across outcomes (e.g., loss to follow-up may be far less for all-cause mortality than for quality of life), a consideration that many systematic reviews ignore. In deciding whether to rate down for risk of bias—whether for randomized trials or observational studies—authors should not take an approach that averages across studies. Rather, for any individual outcome, when there are some studies with a high risk, and some with a low risk of bias, they should consider including only the studies with a lower risk of bias.

      Keywords

      1. Introduction

      Key points
      • In the GRADE approach, both randomized trials (which start as high quality evidence) and observational studies (which start as low quality evidence) can be rated down if relevant evidence comes from studies that suffer from a high risk of bias.
      • Risk of bias can differ across outcomes when, for instance, each outcome is informed by a different subset of studies (e.g. mortality from some trials, quality of life from others).
      • Current systematic reviews are often limited in their usefulness for guidelines because they rate risk of bias by studies across outcomes rather than by outcome across studies.
      In three previous articles in our series describing the GRADE system of rating the quality of evidence and grading the strength of recommendations, we have described the process of framing the question and introduced GRADE’s approach to rating the quality of evidence. This fourth article deals with one of the five categories of reasons for rating down the quality of evidence, study limitations (risk of bias).

      2. Rating down quality for risk of bias

      Both randomized controlled trials (RCTs) and observational studies may incur additional risk of misleading results if they are flawed in their design or conduct—what other publications refer to as problems with “validity” or “internal validity” and we label “study limitations” or “risk of bias.”

      3. Study limitations in randomized trials

      Readers can refer to many authoritative discussions of the study limitations that often afflict RCTs (Table 1). Two of these discussions are particularly consistent with GRADE’s conceptualization, which include a focus on outcome specificity (i.e., the focus of risk of bias is not the individual study but rather the individual outcome, and quality can differ across outcomes in individual trials, or a series of trials [
      The users’ guides to the medical literature: a manual for evidence-based clinical practice.
      ,
      • Higgins J.P.
      • Altman D.
      Assessing the risk of bias in included studies.
      ]). We shall highlight three of the criteria in Table 1. The importance of the first of these, stopping early for benefit, has only recently been recognized. Recent evidence has also emerged regarding the second, selective outcome reporting [
      • Vedula S.S.
      • Bero L.
      • Scherer R.W.
      • Dickersin K.
      Outcome reporting in industry-sponsored trials of gabapentin for off-label use.
      ,
      • Mathieu S.
      • Boutron I.
      • Moher D.
      • Altman D.G.
      • Ravaud P.
      Comparison of registered and published primary outcomes in randomized controlled trials.
      ]. Furthermore, the positioning of selective outcome reporting in taxonomies of bias can be confusing. Some may intuitively think it should be categorized with publication bias, rather than as an issue of risk of bias within individual studies. Finally, we highlight loss to follow-up because it is often misunderstood.
      Table 1Study limitations in randomized trials
      1. Lack of allocation concealment
       Those enrolling patients are aware of the group (or period in a crossover trial) to which the next enrolled patient will be allocated (major problem in “pseudo” or “quasi” randomized trials with allocation by day of week, birth date, chart number, etc)
      2. Lack of blinding
       Patient, care givers, those recording outcomes, those adjudicating outcomes, or data analysts are aware of the arm to which patients are allocated (or the medication currently being received in a crossover trial)
      3. Incomplete accounting of patients and outcome events
       Loss to follow-up and failure to adhere to the intention-to-treat principle in superiority trials; or in noninferiority trials, loss to follow-up, and failure to conduct both analyses considering only those who adhered to treatment, and all patients for whom outcome data are available
      4. Selective outcome reporting bias
       Incomplete or absent reporting of some outcomes and not others on the basis of the results
      5. Other limitations
       Stopping early for benefit
       Use of unvalidated outcome measures (e.g., patient-reported outcomes)
       Carryover effects in crossover trial
       Recruitment bias in cluster-randomized trials
      Before we do so, however, we note one additional issue. Recent evidence suggests that bias associated with lack of blinding and lack of concealment may be greater in trials with subjective outcomes [
      • Wood L.
      • Egger M.
      • Gluud L.L.
      • Schulz K.F.
      • Jüni P.
      • Altman D.G.
      • et al.
      Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study.
      ]. Systematic review authors and guideline developers should consider this evidence when making decisions about rating down quality for risk of bias.

      4. Stopping early for benefit

      Theoretical consideration [
      • Pocock S.J.
      When (not) to stop a clinical trial for benefit.
      ], simulations [
      • Pocock S.J.
      • Hughes M.D.
      Practical problems in interim analyses, with particular regard to estimation.
      ], and empirical evidence [
      • Montori V.M.
      • Devereaux P.J.
      • Adhikari N.K.
      • Burns K.E.
      • Eggert C.H.
      • Briel M.
      • et al.
      Randomized trials stopped early for benefit: a systematic review.
      ] all suggest that trials stopped early for benefit overestimate treatment effects. The most recent empirical work suggests that in the real world, formal stopping rules do not reduce this bias, that it is evident in stopped early trials with less than 500 events and that on average the ratio of relative risks in trials stopped early vs. the best estimate of the truth (trials not stopped early) is 0.71 [
      • Bassler D.
      • Briel M.
      • Montori V.M.
      • Lane M.
      • Glasziou P.
      • Zhou Q.
      • et al.
      Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis.
      ].
      Because in most cases the major contributor to the overestimation of treatment effects in trials stopped early for benefit is chance, including stopping early as a source of bias is questionable. Nevertheless, the presence of stopped early trials, particularly when they contribute substantial weight in a meta-analysis, should alert systematic review authors and guideline developers to the possibility of a substantial overestimate of treatment effect. Systematic reviews should provide sensitivity analyses of results including and excluding studies that stopped early for benefit; if estimates differ appreciably, those restricted to the trials that did not stop early should be considered the more credible. When evidence comes primarily or exclusively from trials stopped early for benefit, authors should infer that substantial overestimates are likely in trials with fewer than 500 events and that large overestimates are likely in trials with fewer than 200 events [
      • Bassler D.
      • Briel M.
      • Montori V.M.
      • Lane M.
      • Glasziou P.
      • Zhou Q.
      • et al.
      Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis.
      ].

      5. Selective outcome reporting

      When authors or study sponsors selectively report positive outcomes and analyses within a trial, critics have used the label “selective outcome reporting.” Recent evidence suggests that selective outcome reporting, which tends to produce overestimates of the intervention effects, may be widespread [
      • Mathieu S.
      • Boutron I.
      • Moher D.
      • Altman D.G.
      • Ravaud P.
      Comparison of registered and published primary outcomes in randomized controlled trials.
      ,
      • Furukawa T.A.
      • Watanabe N.
      • Omori I.M.
      • Montori V.M.
      • Guyatt G.H.
      Association between unreported outcomes and effect size estimates in Cochrane meta-analyses.
      ,
      • Chan A.W.
      • Altman D.G.
      Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors.
      ,
      • Chan A.W.
      • Hróbjartsson A.
      • Haahr M.T.
      • Gøtzsche P.C.
      • Altman D.G.
      Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles.
      ,
      • Chan A.W.
      • Krleza-Jerić K.
      • Schmid I.
      • Altman D.G.
      Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research.
      ].
      For example, a systematic review of the effects of testosterone on erection satisfaction in men with low testosterone identified four eligible trials [
      • Bolona E.R.
      • Uraga M.V.
      • Haddad R.M.
      • Tracz M.J.
      • Sideras K.
      • Kennedy C.C.
      • et al.
      Testosterone use in men with sexual dysfunction: a systematic review and meta-analysis of randomized placebo-controlled trials.
      ]. The largest trial’s results were reported only as “not significant” and could not, therefore, contribute to the meta-analysis. Data from the three smaller trials suggested a large treatment effect (1.3 standard deviations, 95% confidence interval 0.2, 2.3). The review authors ultimately obtained the complete data from the larger trial: after including the less impressive results of the large trial, the magnitude of the effect was smaller and no longer statistically significant (0.8 standard deviations, 95% confidence interval −0.05, 1.63) [
      • Sinha M.
      • Montori V.M.
      Reporting bias and other biases affecting systematic reviews and meta-analyses: a methodological commentary.
      ].
      The Cochrane handbook suggests that definitive evidence that selective reporting has not occurred requires access to a protocol developed before the study was undertaken [
      • Higgins J.P.
      • Altman D.
      Assessing the risk of bias in included studies.
      ]. Selective reporting is present if authors acknowledge prespecified outcomes that they fail to report or report outcomes incompletely such that they cannot be included in a meta-analysis. One should suspect reporting bias if the study report fails to include results for a key outcome that one would expect to see in such a study or if composite outcomes are presented without the individual component outcomes.
      Note that within the GRADE framework, which rates the quality of a body of evidence, suspicion of selective reporting bias in a number of included studies may lead to rating down of quality of the body of evidence. For instance, in the testosterone example above, had the authors not obtained the missing data, they would have considered rating down the body of evidence for the selective reporting bias suspected in the largest study.

      6. Loss to follow-up

      Historically, methodologists have sometimes suggested arbitrary thresholds for acceptable loss to follow-up (e.g., less than 20%). The significance of particular rates of loss to follow-up, however, varies widely and is dependent on the relation between loss to follow-up and number of events. For instance, loss to follow-up of 5% in both intervention and control groups would entail little threat of bias if event rates were 20% and 40% in intervention and control groups, respectively. If event rates were 2% and 4%, however, concern with 5% loss to follow-up is much greater.
      To state this as a general rule, the higher the proportion lost to follow-up in relation to intervention and control event rates, and differences between intervention and control groups, the greater the threat of bias. Even with relatively high rates of loss to follow-up, however, bias will result only if the number lost is imbalanced between groups or the relationship between loss to follow-up and the likelihood of events differs between intervention and control groups. Unfortunately, we never know if the relationship between loss to follow-up and the likelihood of events does or does not differ in intervention and control groups; large loss to follow-up in relation to the number of events always, therefore, raises the issue of a serious threat of bias.
      The issue is conceptually identical with continuous outcomes: Was the loss to follow-up such that reasonable assumptions about differences in outcomes among those lost to follow-up in intervention and control groups could change the overall results in an important way? One can test a variety of assumptions about rates of events in those lost to follow-up when the outcome is a binary variable. One can also conduct such sensitivity analyses when the data are continuous, although the statistical modeling is more challenging.

      7. Study limitations in observational studies

      Systematic reviews of tools to assess the methodological quality of nonrandomized studies have identified more than 200 checklists and instruments [
      • Deeks J.J.
      • Dinnes J.
      • D’Amico R.
      • Sowden A.J.
      • Sakarovitch C.
      • Song F.
      • et al.
      Evaluating non-randomised intervention studies.
      ,
      • West S.
      • King V.
      • Carey T.S.
      • Lohr K.N.
      • Mckoy N.
      • Sutton S.F.
      • et al.
      Systems to rate the strength of scientific evidence [evidence report/technology assessment no 47]. AHRQ Publication No 02–E016.
      ,
      Proposed Evaluation Tools for COMPUS: assessment, November 29, 2005.
      ,
      • Sanderson S.
      • Tatt I.D.
      • Higgins J.P.
      Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography.
      ]. Table 2 summarizes key criteria for observational studies that reflect the contents of these checklists. Judgments associated with assessing study limitations in observational studies are often complex; here, we address two key issues that arise in assessing risk of bias.
      Table 2Study limitations in observational studies
      1. Failure to develop and apply appropriate eligibility criteria (inclusion of control population)
       Under- or overmatching in case–control studies
       Selection of exposed and unexposed in cohort studies from different populations
      2. Flawed measurement of both exposure and outcome
       Differences in measurement of exposure (e.g., recall bias in case–control studies)
       Differential surveillance for outcome in exposed and unexposed in cohort studies
      3. Failure to adequately control confounding
       Failure of accurate measurement of all known prognostic factors
       Failure to match for prognostic factors and/or lack of adjustment in statistical analysis
      4. Incomplete follow-up

      7.1 Case series: the problem of missing internal controls

      Ideally, observational studies will choose contemporaneous comparison groups that, as far as possible, differ from intervention groups only in the decision (typically by patient or clinician) not to use the intervention. Researchers will enroll and observe intervention and comparison group patients in identical ways. This is the prototypical design using what might be called “internal controls”—internal, that is, to the study under conduct.
      An alternative approach is to study only patients exposed to the intervention—a design we refer to as a case series (others may use “single group cohort”). To make inferences regarding intervention effects, case series must still refer to results in a comparison group. In many case series, however, the source of comparison group results is implicit or unclear. Such vagueness raises serious questions about the prognostic similarity of intervention and comparison groups and will usually warrant rating down from low- to very low-quality evidence. For instance, in considering the relative impact of low–molecular weight heparin vs. unfractionated heparin in pregnant women, we find systematic reviews of the incidence of bleeding in women receiving the former agent [
      • Greer I.A.
      • Nelson-Piercy C.
      Low-molecular-weight heparins for thromboprophylaxis and treatment of venous thromboembolism in pregnancy: a systematic review of safety and efficacy.
      ,
      • Sanson B.J.
      • Lensing A.W.
      • Prins M.H.
      • Ginsberg J.S.
      • Barkagan Z.S.
      • Lavenne-Pardonge E.
      • et al.
      Safety of low-molecular-weight heparin in pregnancy: a systematic review.
      ] but no direct comparisons with the latter.
      Thus, case series typically yield very low-quality evidence. There are, however, exceptions. Consider the question of the impact of routine colonoscopy vs. no screening for colon cancer on the rate of perforation associated with colonoscopy. Here, a large series of representative patients undergoing colonoscopy will provide high-quality evidence. When control rates are near zero, case series of representative patients (one might call these cohort studies) can provide high-quality evidence of adverse effects associated with an intervention. One should not confuse these with isolated case reports of associations between exposures and rare adverse outcomes (as have, for instance, been reported with vaccine exposure).

      7.2 Dealing with prognostic imbalance

      Observational studies are at risk of bias because of differences in prognosis in exposed and unexposed populations; to the extent that the two groups come from the same time, place, and population, this risk of bias is diminished. Nevertheless, prognostic imbalance threatens the validity of all observational studies. If the available studies have failed to measure known important prognostic factors, have measured them badly, or have failed to take these factors into account in their analysis (by matching or statistical adjustment), review authors and guideline developers should consider rating down the quality of the evidence from low to very low.
      For example, a cohort study using a large administrative database demonstrated an increased risk of cancer-related mortality in diabetic patients using sulfonylureas or insulin relative to metformin [
      • Bowker S.L.
      • Majumdar S.R.
      • Veugelers P.
      • Johnson J.A.
      Increased cancer-related mortality for patients with type 2 diabetes who use sulfonylureas or insulin.
      ]. The investigators did not have data available and could, therefore, not adjust for key prognostic variables, including smoking, family history of cancer, occupational exposure, dietary history, and exposure to pollutants. Thus, the study—and others like it that fail to adjust for key prognostic variables—provides only very low-quality evidence of a causal relation between the hypoglycemic agent and cancer deaths.

      8. Limitations of GRADE’s approach to assessing risk of bias in individual studies

      GRADE’s approach to assessing risk of bias shares two fundamental limitations with the very large number of alternative approaches. First, empirical evidence supporting the criteria is limited—attempts to show systematic difference between studies that meet and do not meet specific criteria have shown inconsistent results. Second, the relative weight one should put on the criteria remains uncertain.
      The GRADE approach is less comprehensive than many systems, emphasizing simplicity and parsimony over completeness. GRADE’s approach does not provide a quantitative rating of risk of bias. Although such a rating has advantages, we share with the Cochrane Collaboration methodologists a reluctance to provide a risk of bias score that, by its nature, must make questionable assumptions about the relative extent of bias associated with individual items and fails to consider the context of the individual items.

      9. Summarizing study limitations must be outcome specific

      Sources of bias may vary in importance across outcomes. Thus, within a single study, one may have higher quality evidence for one outcome than for another. For instance, RCTs of steroids for acute spinal cord injury measured both all-cause mortality and, based on a detailed physical examination, motor function [
      • Bracken M.B.
      • Shepard M.J.
      • Collins Jr., W.F.
      • Holford T.R.
      • Baskin D.S.
      • Eisenberg H.M.
      • et al.
      Methylprednisolone or naloxone treatment after acute spinal cord injury: 1-year follow-up data. Results of the second National Acute Spinal Cord Injury Study.
      ,
      • Otani K.
      • Abe H.
      • Kadoya S.
      Beneficial effect of methylprednisolone sodium succinate in the treatment of acute spinal cord injury.
      ,
      • Pointillart V.
      • Petitjean M.E.
      • Wiart L.
      • Vital J.M.
      • Lassié P.
      • Thicoipé M.
      • et al.
      Pharmacological therapy of spinal cord injury during the acute phase.
      ]. Blinding of outcome assessors is irrelevant for mortality but crucial for motor function. Thus, as in this example, if the outcome assessors in the primary studies on which a guideline panel relies were not blinded, the panel might categorize evidence for all-cause mortality as having no serious study limitations and rate down the evidence for motor function by one level on the basis of serious study limitations.

      10. Summarizing risk of bias requires consideration of all relevant evidence

      Every study addressing a particular outcome will differ, to some degree, in risk of bias. Review authors and guideline developers must make an overall judgment, considering all the evidence, whether quality of evidence for an outcome warrants rating down on the basis of study limitations.
      Table 3 presents the structure of GRADE’s approach to study limitations in RCTs. The second column in Table 3 presents the approach as applied to individual studies; the remaining columns refer to the entire body of evidence. Individual trials achieve a low risk of bias when most or all key criteria are met and any violations are not crucial. Studies that suffer from one crucial violation—a violation of crucial importance with regard to a point estimate (in the context of a systematic review) or decision (in the context of a guideline)—provide limited-quality evidence. When one or more crucial limitations substantially lower confidence in a point estimate, a body of evidence provides only very limited support for inferences regarding the magnitude of a treatment effect.
      Table 3Summarizing study limitations for randomized trials
      Extent of risk of biasRisk of bias within a studyRisk of bias across studiesInterpretation across studies
      This interpretation assumes no problems that necessitate rating down because of imprecision, inconsistency, indirectness, and publication bias.
      Example of summary across studies
      No serious limitations, do not downgradeLow risk of bias for all key criteria (Table 1)Most information is from studies at low risk of biasHigh-quality evidence: the true effect lies close to that of the estimate of the effectBeta-blockers reduce mortality in patients with heart failure
      • Brophy J.M.
      • Joseph L.
      • Rouleau J.L.
      Beta-blockers in congestive heart failure. A Bayesian meta-analysis.
      Serious limitations, rate down one level (i.e., from high to moderate quality)Crucial limitation for one criterion or some limitations for multiple criteria sufficient to lower ones confidence in the estimate of effectMost information is from studies at moderate risk of biasQuality of evidence reduced from high- to moderate-quality evidence: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially differentAmodiaquine and SP together likely reduce treatment failures compared with SP alone in patients with malaria
      • McIntosh H.
      • Jones K.
      Chloroquine or amodiaquine combined with sulfadoxine-pyrimethamine for treating uncomplicated malaria.
      Very serious limitations rate down two levels (i.e., from high to low quality or moderate to very low)Crucial limitation for one or more criteria sufficient to substantially lower ones confidence in the estimate of effectMost information is from studies at high risk of biasQuality of evidence reduced from high- to low-quality evidence: the true effect may be substantially different from the estimate of the effectOpen discectomy may reduce symptoms after 1 yr compared with conservative treatment of lumbar disc prolapse
      • Gibson J.
      • Waddell G.
      Surgical interventions for lumbar disc prolapse.
      Abbreviation: SP, sulfadoxine-pyrimethamine.
      a This interpretation assumes no problems that necessitate rating down because of imprecision, inconsistency, indirectness, and publication bias.
      Table 3 illustrates that high-quality evidence is available when most studies from a body of evidence meet bias-minimizing criteria. For example, of the 22 trials addressing the impact of beta-blockers on mortality in patients with heart failure, most, probably or certainly, used concealed allocation, all blinded at least some key groups, and follow up of randomized patients was almost complete [
      • Brophy J.M.
      • Joseph L.
      • Rouleau J.L.
      Beta-blockers in congestive heart failure. A Bayesian meta-analysis.
      ].
      GRADE considers a body of evidence of moderate quality when the best evidence comes from individual studies of moderate quality. For instance, we cannot be confident that, in patients with falciparum malaria, amodiaquine and sulfadoxine-pyrimethamine together reduce treatment failures compared with sulfadoxine-pyrimethamine alone because the apparent advantage of sulfadoxine-pyrimethamine was sensitive to assumptions regarding the event rate in those lost to follow-up in two of three studies [
      • McIntosh H.
      • Jones K.
      Chloroquine or amodiaquine combined with sulfadoxine-pyrimethamine for treating uncomplicated malaria.
      ].
      Surgery vs. conservative treatment in the management of patients with lumbar disc prolapse provides an example of rating down two levels because of risk of bias in RCTs [
      • Gibson J.
      • Waddell G.
      Surgical interventions for lumbar disc prolapse.
      ]. We are uncertain of the benefit of open disectomy in reducing symptoms after 1 year or longer because of very serious limitations in one trial of open disectomy compared with conservative treatment without a large number of early crossovers in both comparison groups. That trial suffered from inadequate concealment of allocation and unblinded assessment of outcome by potentially biased raters (surgeons) using unvalidated rating instruments (Table 4).
      Table 4Quality assessment for open discectomy vs. conservative treatment (Gibson and Waddell
      • Gibson J.
      • Waddell G.
      Surgical interventions for lumbar disc prolapse.
      )
      Quality assessment
      No of patients (studies)DesignLimitationsInconsistencyIndirectnessImprecisionPublication bias
      Outcome: poor/bad result at 1 yr—surgeon rated
      126 (1)RCTVery serious limitations
      Inadequate concealment of allocation and unblinded unvalidated assessment by the surgeon.
      Not relevantNo serious indirectnessSerious imprecision
      Wide confidence intervals and few events (16 or fewer).
      Unlikely
      Outcome: poor/bad result at 4 yr—surgeon rated
      126 (1)RCTVery serious limitations
      Inadequate concealment of allocation and unblinded unvalidated assessment by the surgeon.
      Not relevantNo serious indirectnessSerious imprecision
      Wide confidence intervals and few events (16 or fewer).
      Unlikely
      Outcome: poor/bad result at 10 yr—surgeon rated
      126 (1)RCTVery serious limitations
      Inadequate concealment of allocation and unblinded unvalidated assessment by the surgeon.
      Not relevantNo serious indirectnessSerious imprecision
      Wide confidence intervals and few events (16 or fewer).
      Unlikely
      Abbreviation: RCT, randomized controlled trial.
      a Inadequate concealment of allocation and unblinded unvalidated assessment by the surgeon.
      b Wide confidence intervals and few events (16 or fewer).

      11. Existing systematic reviews are often limited in summarizing study limitations across studies

      To rate overall quality of evidence with respect to an outcome, review authors and guideline developers must consider and summarize study limitations considering all the evidence from multiple studies. For a guideline developer, using an existing systematic review would be the most efficient way to address this issue.
      Unfortunately, systematic reviews usually do not address all important outcomes, typically focusing on benefit and neglecting harm. For instance, one is required to go to separate reviews to assess the impact of beta-blockers on mortality [
      • Brophy J.M.
      • Joseph L.
      • Rouleau J.L.
      Beta-blockers in congestive heart failure. A Bayesian meta-analysis.
      ] and on quality of life [
      • Dobre D.
      • van Jaarsveld C.H.
      • deJongste M.J.
      • Haaijer Ruskamp F.M.
      • Ranchor A.V.
      The effect of beta-blocker therapy on quality of life in heart failure patients: a systematic review and meta-analysis.
      ]. No systematic review has addressed beta-blocker toxicity in heart failure patients.
      Review authors’ usual practice of rating the quality of studies across outcomes, rather than separately for each outcome, further limits the usefulness of existing systematic reviews for guideline developers. This approach becomes even more problematic when review authors use summary measures that aggregate across quality criteria (e.g., allocation concealment, blinding, loss to follow-up) to provide a single score. These measures are often limited in that they focus on quality of reporting rather than on the design and conduct of the study [
      • Moher D.
      • Jadad A.R.
      • Nichol G.
      • Penman M.
      • Tugwell P.
      • Walsh S.
      Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists.
      ]. Furthermore, they tend to be unreliable and less closely correlated with outcome than individual quality components [
      • Schulz K.F.
      • Chalmers I.
      • Hayes R.J.
      • Altman D.G.
      Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials.
      ,
      • Emerson J.D.
      • Burdick E.
      • Hoaglin D.C.
      • Mosteller F.
      • Chalmers T.C.
      An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials.
      ,
      • Juni P.
      • Witschi A.
      • Bloch R.
      • Egger M.
      The hazards of scoring the quality of clinical trials for meta-analysis.
      ]. These problems arise, at least in part, because calculating a summary score inevitably involves assigning arbitrary weights to different criteria.
      Finally, systematic reviews that address individual components of study limitations are often not comprehensive and fail to make transparent the judgments needed to evaluate study limitations. These judgments are often challenging, at least in part, because of inadequate reporting: just because a safeguard against bias is not reported does not mean it was neglected [
      • Devereaux P.J.
      • Choi P.T.
      • El-Dika S.
      • Bhandari M.
      • Montori V.M.
      • Schünemann H.J.
      • et al.
      An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods.
      ,
      • Soares H.P.
      • Daniels S.
      • Kumar A.
      • Clarke M.
      • Scott C.
      • Swann S.
      • Djulbegovic B.
      • et al.
      Bad reporting does not mean bad methods for randomised trials: observational study of randomised controlled trials performed by the Radiation Therapy Oncology Group.
      ].
      Thus, although systematic reviews are often extremely useful in identifying the relevant primary studies, members of guideline panels or their delegates must often review individual studies if they wish to ensure accurate ratings of study limitations for all relevant outcomes. As review authors increasingly adopt the GRADE approach (and in particular as Cochrane review authors do so in combination with using the Cochrane risk of bias tool), the situation will improve.

      12. What to do when there is only one RCT

      Many people are uncomfortable designating a single RCT as high-quality evidence. Given the many instances in which the first positive report has not held up under subsequent investigation, this discomfort is warranted. On the other hand, automatically rating down quality when there is a single study is not appropriate. A single, very large, rigorously planned and conducted multicentre RCT may provide high-quality evidence. GRADE suggests especially careful scrutiny of all relevant issues (risk of bias, precision, directness, and publication bias) when only a single RCT addresses a particular question.

      13. Moving from Cochrane risk of bias tables in individual studies to rating quality of evidence across studies

      Moving from 6 risk of bias criteria for each individual study to a judgment about rating down for quality of evidence for risk of bias across a group of studies addressing a particular outcome presents challenges. We suggest the following principles.
      First, in deciding on the overall quality of evidence, one does not average across studies (for instance if some studies have no serious limitations, some serious limitations, and some very serious limitations, one does not automatically rate quality down by one level because of an average rating of serious limitations). Rather, judicious consideration of the contribution of each study, with a general guide to focus on the high-quality studies (as we will illustrate), is warranted.
      Second, this judicious consideration requires evaluating the extent to which each trial contributes toward the estimate of magnitude of effect. This contribution will usually reflect study sample size and number of outcome events—larger trials with many events will contribute more, much larger trials with many more events will contribute much more.
      Third, one should be conservative in the judgment of rating down. That is, one should be confident that there is substantial risk of bias across most of the body of available evidence before one rates down for risk of bias.
      Fourth, the risk of bias should be considered in the context of other limitations. If, for instance, reviewers find themselves in a close-call situation with respect to two quality issues (risk of bias and, say, precision), we suggest rating down for at least one of the two.
      Fifth, notwithstanding the first four principles, reviewers will face close-call situations. They should both acknowledge that they are in such a situation, make it explicit why they think this is the case, and make the reasons for their ultimate judgment apparent.

      14. Application of principles

      A systematic review of flavonoids to treat pain and bleeding associated with hemorrhoids [
      • Alonso-Coello P.
      • Zhou Q.
      • Martinez-Zapata M.J.
      • Mills E.
      • Heels-Ansdell D.
      • Johanson J.F.
      • et al.
      Meta-analysis of flavonoids for the treatment of haemorrhoids.
      ], with respect to the primary outcome of persisting symptoms, most trials did not provide sufficient information to determine whether randomization was concealed, the majority violated the intention-to-treat principle and did not provide the data allowing the appropriate analysis (Table 5), and none used a validated symptom measure. On the other hand, most authors described their trials as double blind, and although concealment and blinding are different concepts, blinded trials of drugs are very likely to be concealed [
      • Devereaux P.J.
      • Choi P.T.
      • El-Dika S.
      • Bhandari M.
      • Montori V.M.
      • Schünemann H.J.
      • et al.
      An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods.
      ] (Table 5). Because the questionnaires appeared simple and transparent, and because of the blinding of the studies, we would be hesitant to consider lack of validation introducing a serious risk of bias.
      Table 5Risk of bias for measurement of symptoms in studies of flavonoids in patients with hemorrhoids
      Study
      For full citation of the references cited in this table, see Alonso-Coello et al.[36]
      RandomizationAllocation concealmentBlindingLoss to follow-up
      No important differences in rate of loss to follow-up between flavonoid and control groups in any study.
      /IT principle observed or per protocol analysis
      Other
      Dimitroulopoulos D, 2005
      • Adequate
        Data provided by authors.
      • Computer-generated random numbers
        Data provided by authors.
      • Sealed opaque envelopes
        Data provided by authors.
      • Described as single blind
      • Care givers, patients, and data collectors blinded
        Data provided by authors.
      3%/protocolUnvalidated symptom measure
      Misra MC, 2000
      • Adequate
      • Computer-generated random numbers
        Data provided by authors.
      • Adequate
      • Sealed opaque envelopes
        Data provided by authors.
      • Patients and physicians
        Data provided by authors.
      • Described as double blind
      • Placebo identical appearance
      2%/protocolUnvalidated symptom measure
      Godeberge P, 1994
      • Adequate
        Data provided by authors.
      • Adequate
      • Sealed opaque envelopes
        Data provided by authors.
      • Patients, physician-investigator, data manager, statistician, and authors blinded
      6%/protocol
      Cospite M, 1994
      • Unclear
      • Unclear
      • Unclear
      • Described as double blind
      12%/ITUnvalidated symptom measure
      Chauvenet-M, 1994
      • Unclear
      • Unclear
      • Unclear
      11%/protocolUnvalidated symptom measure
      Ho Y-H, 2000
      • Adequate
      • Drawing of sealed opaque envelopes
        Data provided by authors.
      • Adequate
      • Sealed opaque envelopes
      • All parties blinded
        Data provided by authors.
      0%/ITUnvalidated symptom measure
      Thanapongsathorn W, 1992
      • Unclear
      • Unclear
      • Unclear
      • Described as double blind
      I2%/protocolUnvalidated symptom measure
      Titapant V, 2001
      • Unclear
      • Unclear
      • Unclear
      • Described as double blind
      • Placebo identical appearance
      12%/protocolUnvalidated symptom measure
      Wijayanegara H, 1992
      • Unclear
      • Unclear
      • Unclear
      • Described as double blind
      3%/protocolUnvalidated symptom measure
      Annoni F, 1986
      • Unclear
      • Unclear
      • Unclear
      • Described as double blind
      • Placebo identical appearance
      Uncertain/unclearUnvalidated symptom measure
      Thorp RH, 1970
      • Unclear
      • Unclear
      • Physicians and patients blinded
      • Described as double blind
      • Placebo identical appearance
      20%/protocolUnvalidated symptom measure
      Clyne MB, 1967
      • Bottles numbered consecutively in accordance to random tables
      • Unclear
      • Physicians and patients blinded
      • Described as double blind
      • Placebo identical appearance
      Uncertain/protocolUnvalidated symptom measure
      Sinnatamby CS, 1973
      • Unclear
      • Unclear
      • Physicians and patients blinded
      • Described as double blind
      53%/protocolUnvalidated symptom measure
      Trochet JP, 1992
      • Randomized by blocks of three (method unclear)
      • Unclear
      • Physicians blinded
      • Placebo identical appearance
      Uncertain/ITUnvalidated symptom measure
      Abbreviation: IT, intention-to-treat principle observed.
      a No important differences in rate of loss to follow-up between flavonoid and control groups in any study.
      b Data provided by authors.
      c For full citation of the references cited in this table, see Alonso-Coello et al.
      • Alonso-Coello P.
      • Zhou Q.
      • Martinez-Zapata M.J.
      • Mills E.
      • Heels-Ansdell D.
      • Johanson J.F.
      • et al.
      Meta-analysis of flavonoids for the treatment of haemorrhoids.
      Nevertheless, in light of these study limitations, one might consider focusing on the highest quality trials. Substantial precision would, however, be lost (requiring rating down for imprecision), and the quality of the trials did not explain variability in results (i.e., the magnitude of effect was similar in the methodologically stronger and weaker studies). Both considerations argue for basing an estimate on the results of all RCTs.
      In our view, this represents a borderline situation in which it would be reasonable either to rate down for risk of bias or not to do so. This illustrates that the great merit of GRADE is not that it ensures consistency of conclusions but that it requires explicit and transparent judgments. Considering these issues in isolation, and following the principles articulated above, however, we would be inclined not to rate down for quality for risk of bias.
      The possibility of discrepant judgments between intelligent and well-informed review authors is more than theoretical. A number of RCTs have evaluated the extent to which graduated pressure stockings can prevent deep venous thrombosis (DVT) in airline passengers taking long flights. Cochrane review authors concluded that the studies provided high-quality evidence for DVT prevention [
      • Clarke M.
      • Hopewell S.
      • Juszczak E.
      • Eisinga A.
      • Kjeldstrøm M.
      Compression stockings for preventing deep vein thrombosis in airline passengers.
      ]. In contrast, a group of thrombosis experts involved in producing a guideline concluded that because of use of an unreliable method of diagnosing DVT, and lack of blinding, the evidence was of low quality [
      • Geerts W.H.
      • Bergqvist D.
      • Pineo G.F.
      • Heit J.A.
      • Samama C.M.
      • Lassen M.R.
      • et al.
      Prevention of venous thromboembolism: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th Edition).
      ]. Even after direct contact and discussion, each group adhered to its own position—and it remains possible that either group is correct.
      Three RCTs addressing the impact of 24-hour administration of high-dose corticosteroids on motor function in patients with acute spinal cord injury illustrate another principle of aggregation [
      • Bracken M.B.
      • Shepard M.J.
      • Collins Jr., W.F.
      • Holford T.R.
      • Baskin D.S.
      • Eisenberg H.M.
      • et al.
      Methylprednisolone or naloxone treatment after acute spinal cord injury: 1-year follow-up data. Results of the second National Acute Spinal Cord Injury Study.
      ,
      • Otani K.
      • Abe H.
      • Kadoya S.
      Beneficial effect of methylprednisolone sodium succinate in the treatment of acute spinal cord injury.
      ,
      • Pointillart V.
      • Petitjean M.E.
      • Wiart L.
      • Vital J.M.
      • Lassié P.
      • Thicoipé M.
      • et al.
      Pharmacological therapy of spinal cord injury during the acute phase.
      ]. Although the degree of limitations is in fact a continuum (as Fig. 1 illustrates), GRADE simplifies the process by categorizing these studies—or any other study—as having “no serious limitations,” “serious limitations,” or “very serious limitations” (as in Table 3).
      Figure thumbnail gr1
      Fig. 1Validity of three randomized controlled trials addressing the effect of steroids on motor function in acute spinal cord injury.
      The first of the three trials (Bracken in Fig. 1), which included 127 patients treated within 8 hours of injury, ensured allocation concealment through central randomization, almost certainly blinded patients, clinicians, and those measuring motor function, and lost 5% of patients to follow-up at 1 year [
      • Bracken M.B.
      • Shepard M.J.
      • Collins Jr., W.F.
      • Holford T.R.
      • Baskin D.S.
      • Eisenberg H.M.
      • et al.
      Methylprednisolone or naloxone treatment after acute spinal cord injury: 1-year follow-up data. Results of the second National Acute Spinal Cord Injury Study.
      ]. The flaws in this RCT are sufficiently minor to allow classification as “no serious limitations.”
      The second trial (Pointillart et al. [
      • Pointillart V.
      • Petitjean M.E.
      • Wiart L.
      • Vital J.M.
      • Lassié P.
      • Thicoipé M.
      • et al.
      Pharmacological therapy of spinal cord injury during the acute phase.
      ] in Fig. 1) was unlikely to have concealed allocation, did blind those assessing outcome (but not patients or clinicians), and lost only one of 106 patients to follow-up. Here, quality falls in an intermediate range, and classification as either “no serious limitations” or “serious limitations” may be appropriate. The third trial (Otani et al. [
      • Otani K.
      • Abe H.
      • Kadoya S.
      Beneficial effect of methylprednisolone sodium succinate in the treatment of acute spinal cord injury.
      ] in Fig. 1), which included 158 patients, almost certainly failed to conceal allocation, used no blinding, and lost 26% of patients to follow-up, many more in the steroid group than the control group. This third trial is probably best classified as having “very serious limitations.”
      Considering these three RCTs, should one rate down for design and implementation with respect to the motor function outcome? If we considered only the first two trials, the answer would be no. Therefore, the review authors must decide either to exclude the third trial (thereby only including trials with few limitations) or include it based on a judgment that overall there is a low risk of bias (because most of the evidence comes from trials with few limitations) despite the contribution of the trial with very serious limitations to the overall estimate of effect. This example illustrates that averaging across studies will not be the right approach.

      15. Recording judgments about study limitations

      One great merit of GRADE is its lucid categorization of factors that decrease quality of evidence and the resultant transparency of judgments. This transparency, however, requires careful documentation of judgments. Including a risk of bias table that summarizes key criteria used to assess study limitations for each outcome for each study helps ensure transparency.
      Table 5 presents an example of such a table. Note that the table focuses on only one outcome, symptoms. Each study will need only one line on such a table if, as in this case, there is only one important outcome or if each quality criterion is the same for every important outcome. Each outcome for which quality criteria differ in important ways will need a separate line. Outcomes may, for instance, differ for blinding (e.g., in surgical trials patients completing questionnaires measuring health-related quality of life may be unblinded, but adjudicators of cause-specific mortality may be blinded) or loss to follow-up (e.g., greater loss to follow-up for quality of life than for all-cause mortality).
      Review authors and guideline developers can then summarize their assessments across studies in a “quality assessment” table to fully ensure the transparency of their judgments (Table 4). A footnote provides the reasoning behind the decision to rate down the quality of the evidence from high to low quality on the basis of study limitations (alternatively, one can very briefly summarize the key information in a cell in the table). In this example, there was an additional concern about imprecision, which further decreases the quality of evidence from low to very low. We will describe guidelines for making judgments about imprecision (the risk of random error), in the sixth article in this series.

      References

      1. The users’ guides to the medical literature: a manual for evidence-based clinical practice.
        in: Guyatt G. Rennie D. Meade M. Cook D. 2nd ed. McGraw-Hill, New York, NY2008
        • Higgins J.P.
        • Altman D.
        Assessing the risk of bias in included studies.
        in: Higgins J. Green S. Cochrane handbook for systematic reviews of interventions 5.0.1. John Wiley & Sons, Chichester, UK2008
        • Vedula S.S.
        • Bero L.
        • Scherer R.W.
        • Dickersin K.
        Outcome reporting in industry-sponsored trials of gabapentin for off-label use.
        N Engl J Med. 2009; 361: 1963-1971
        • Mathieu S.
        • Boutron I.
        • Moher D.
        • Altman D.G.
        • Ravaud P.
        Comparison of registered and published primary outcomes in randomized controlled trials.
        JAMA. 2009; 302: 977-984
        • Wood L.
        • Egger M.
        • Gluud L.L.
        • Schulz K.F.
        • Jüni P.
        • Altman D.G.
        • et al.
        Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study.
        BMJ. 2008; 336: 601-605
        • Pocock S.J.
        When (not) to stop a clinical trial for benefit.
        JAMA. 2005; 294: 2228-2230
        • Pocock S.J.
        • Hughes M.D.
        Practical problems in interim analyses, with particular regard to estimation.
        Control Clin Trials. 1989; 10: 209S-221S
        • Montori V.M.
        • Devereaux P.J.
        • Adhikari N.K.
        • Burns K.E.
        • Eggert C.H.
        • Briel M.
        • et al.
        Randomized trials stopped early for benefit: a systematic review.
        JAMA. 2005; 294: 2203-2209
        • Bassler D.
        • Briel M.
        • Montori V.M.
        • Lane M.
        • Glasziou P.
        • Zhou Q.
        • et al.
        Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis.
        JAMA. 2010; 303: 1180-1187
        • Furukawa T.A.
        • Watanabe N.
        • Omori I.M.
        • Montori V.M.
        • Guyatt G.H.
        Association between unreported outcomes and effect size estimates in Cochrane meta-analyses.
        JAMA. 2007; 297: 468-470
        • Chan A.W.
        • Altman D.G.
        Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors.
        BMJ. 2005; 330: 753
        • Chan A.W.
        • Hróbjartsson A.
        • Haahr M.T.
        • Gøtzsche P.C.
        • Altman D.G.
        Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles.
        JAMA. 2004; 291: 2457-2465
        • Chan A.W.
        • Krleza-Jerić K.
        • Schmid I.
        • Altman D.G.
        Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research.
        CMAJ. 2004; 171: 735-740
        • Bolona E.R.
        • Uraga M.V.
        • Haddad R.M.
        • Tracz M.J.
        • Sideras K.
        • Kennedy C.C.
        • et al.
        Testosterone use in men with sexual dysfunction: a systematic review and meta-analysis of randomized placebo-controlled trials.
        Mayo Clin Proc. 2007; 82: 20-28
        • Sinha M.
        • Montori V.M.
        Reporting bias and other biases affecting systematic reviews and meta-analyses: a methodological commentary.
        Expert Rev Pharmacoecon Outcomes Res. 2006; 6: 603-611
        • Deeks J.J.
        • Dinnes J.
        • D’Amico R.
        • Sowden A.J.
        • Sakarovitch C.
        • Song F.
        • et al.
        Evaluating non-randomised intervention studies.
        Health Technol Assess. 2003; 7 (1–173): iii-x
        • West S.
        • King V.
        • Carey T.S.
        • Lohr K.N.
        • Mckoy N.
        • Sutton S.F.
        • et al.
        Systems to rate the strength of scientific evidence [evidence report/technology assessment no 47]. AHRQ Publication No 02–E016.
        Agency for Healthcare Research and Quality, Rockville, MD2002
      2. Proposed Evaluation Tools for COMPUS: assessment, November 29, 2005.
        Canadian Coordinating Office for Health Technology, Ottawa, Canada2005
        • Sanderson S.
        • Tatt I.D.
        • Higgins J.P.
        Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography.
        Int J Epidemiol. 2007; 36: 666-676
        • Greer I.A.
        • Nelson-Piercy C.
        Low-molecular-weight heparins for thromboprophylaxis and treatment of venous thromboembolism in pregnancy: a systematic review of safety and efficacy.
        Blood. 2005; 106: 401-407
        • Sanson B.J.
        • Lensing A.W.
        • Prins M.H.
        • Ginsberg J.S.
        • Barkagan Z.S.
        • Lavenne-Pardonge E.
        • et al.
        Safety of low-molecular-weight heparin in pregnancy: a systematic review.
        Thromb Haemost. 1999; 81: 668-672
        • Bowker S.L.
        • Majumdar S.R.
        • Veugelers P.
        • Johnson J.A.
        Increased cancer-related mortality for patients with type 2 diabetes who use sulfonylureas or insulin.
        Diabetes Care. 2006; 29: 254-258
        • Bracken M.B.
        • Shepard M.J.
        • Collins Jr., W.F.
        • Holford T.R.
        • Baskin D.S.
        • Eisenberg H.M.
        • et al.
        Methylprednisolone or naloxone treatment after acute spinal cord injury: 1-year follow-up data. Results of the second National Acute Spinal Cord Injury Study.
        J Neurosurg. 1992; 76: 23-31
        • Otani K.
        • Abe H.
        • Kadoya S.
        Beneficial effect of methylprednisolone sodium succinate in the treatment of acute spinal cord injury.
        Sekitsui Sekizui. 1994; 7: 633-647
        • Pointillart V.
        • Petitjean M.E.
        • Wiart L.
        • Vital J.M.
        • Lassié P.
        • Thicoipé M.
        • et al.
        Pharmacological therapy of spinal cord injury during the acute phase.
        Spinal Cord. 2000; 38: 71-76
        • Brophy J.M.
        • Joseph L.
        • Rouleau J.L.
        Beta-blockers in congestive heart failure. A Bayesian meta-analysis.
        Ann Intern Med. 2001; 134: 550-560
        • McIntosh H.
        • Jones K.
        Chloroquine or amodiaquine combined with sulfadoxine-pyrimethamine for treating uncomplicated malaria.
        Cochrane Database Syst Rev. 2005; (CD000386)
        • Gibson J.
        • Waddell G.
        Surgical interventions for lumbar disc prolapse.
        Cochrane Database Syst Rev. 2007; (CD001350)https://doi.org/10.1002/14651858
        • Dobre D.
        • van Jaarsveld C.H.
        • deJongste M.J.
        • Haaijer Ruskamp F.M.
        • Ranchor A.V.
        The effect of beta-blocker therapy on quality of life in heart failure patients: a systematic review and meta-analysis.
        Pharmacoepidemiol Drug Saf. 2007; 16: 152-159
        • Moher D.
        • Jadad A.R.
        • Nichol G.
        • Penman M.
        • Tugwell P.
        • Walsh S.
        Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists.
        Control Clin Trials. 1995; 16: 62-73
        • Schulz K.F.
        • Chalmers I.
        • Hayes R.J.
        • Altman D.G.
        Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials.
        JAMA. 1995; 273: 408-412
        • Emerson J.D.
        • Burdick E.
        • Hoaglin D.C.
        • Mosteller F.
        • Chalmers T.C.
        An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials.
        Control Clin Trials. 1990; 11: 339-352
        • Juni P.
        • Witschi A.
        • Bloch R.
        • Egger M.
        The hazards of scoring the quality of clinical trials for meta-analysis.
        JAMA. 1999; 282: 1054-1060
        • Devereaux P.J.
        • Choi P.T.
        • El-Dika S.
        • Bhandari M.
        • Montori V.M.
        • Schünemann H.J.
        • et al.
        An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods.
        J Clin Epidemiol. 2004; 57: 1232-1236
        • Soares H.P.
        • Daniels S.
        • Kumar A.
        • Clarke M.
        • Scott C.
        • Swann S.
        • Djulbegovic B.
        • et al.
        Bad reporting does not mean bad methods for randomised trials: observational study of randomised controlled trials performed by the Radiation Therapy Oncology Group.
        BMJ. 2004; 328: 22-24
        • Alonso-Coello P.
        • Zhou Q.
        • Martinez-Zapata M.J.
        • Mills E.
        • Heels-Ansdell D.
        • Johanson J.F.
        • et al.
        Meta-analysis of flavonoids for the treatment of haemorrhoids.
        Br J Surg. 2006; 93: 909-920
        • Clarke M.
        • Hopewell S.
        • Juszczak E.
        • Eisinga A.
        • Kjeldstrøm M.
        Compression stockings for preventing deep vein thrombosis in airline passengers.
        Cochrane Database Syst Rev. 2007; (CD004002)
        • Geerts W.H.
        • Bergqvist D.
        • Pineo G.F.
        • Heit J.A.
        • Samama C.M.
        • Lassen M.R.
        • et al.
        Prevention of venous thromboembolism: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th Edition).
        Chest. 2008; 133: 381S-453S