GRADE guidelines: 14. Going from evidence to recommendations: the significance and presentation of recommendations

      Abstract

      This article describes the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to classifying the direction and strength of recommendations. The strength of a recommendation, separated into strong and weak, is defined as the extent to which one can be confident that the desirable effects of an intervention outweigh its undesirable effects. Alternative terms for a weak recommendation include conditional, discretionary, or qualified. The strength of a recommendation has specific implications for patients, the public, clinicians, and policy makers. Occasionally, guideline developers may choose to make “only-in-research” recommendations. Although panels may choose not to make recommendations, this choice leaves those looking for answers from guidelines without the guidance they are seeking. GRADE therefore encourages panels to, wherever possible, offer recommendations.

      1. Introduction

      What is new?

        Key points

      • The strength of a recommendation is defined as the extent to which one can be confident that the desirable consequences of an intervention outweigh its undesirable consequences.
      • The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach classifies recommendations as strong or weak (also known as conditional, discretionary, or qualified), for or against a management approach, which yields a simple four-category classification.
      • The strength of a recommendation has specific implications for patients, the public, clinicians, and policy makers.
      In prior papers in this series devoted to the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to systematic reviews and practice guidelines, we have dealt with the process before developing recommendations: framing the question [
      • Guyatt G.H.
      • Oxman A.D.
      • Kunz R.
      • Atkins D.
      • Brozek J.
      • Vist G.
      • et al.
      GRADE guidelines: 2. Framing the question and deciding on important outcomes.
      ], choosing critical and important outcomes [
      • Guyatt G.
      • Oxman A.D.
      • Akl E.A.
      • Kunz R.
      • Vist G.
      • Brozek J.
      • et al.
      GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables.
      ], rating the confidence in effect estimates for each outcome [
      • Guyatt G.H.
      • Oxman A.D.
      • Vist G.
      • Kunz R.
      • Brozek J.
      • Alonso-Coello P.
      • et al.
      GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias).
      ,
      • Balshem H.
      • Helfand M.
      • Schunemann H.
      • Oxman A.D.
      • Kunz R.
      • Brozek J.
      • et al.
      GRADE guidelines: 3. Rating the quality of evidence—introduction.
      ,
      • Guyatt G.
      • Oxman A.D.
      • Kunz R.
      • Brozek J.
      • Alonso-Coello P.
      • Rind D.
      • et al.
      GRADE guidelines: 6. Rating the quality of evidence: imprecision.
      ,
      • Guyatt G.H.
      • Oxman A.D.
      • Montori V.
      • Vist G.
      • Kunz R.
      • Brozek J.
      • et al.
      GRADE guidelines: 5. Rating the quality of evidence—publication bias.
      ,
      • Guyatt G.H.
      • Oxman A.D.
      • Kunz R.
      • Woodcock J.
      • Brozek J.
      • Helfand M.
      • et al.
      GRADE guidelines: 7. Rating the quality of evidence—inconsistency.
      ,
      • Guyatt G.H.
      • Oxman A.D.
      • Kunz R.
      • Woodcock J.
      • Brozek J.
      • Helfand M.
      • et al.
      GRADE guidelines: 8. Rating the quality of evidence—indirectness.
      ,
      • Guyatt G.H.
      • Oxman A.D.
      • Sultan S.
      • Glasziou P.
      • Akl E.A.
      • Alonso-Coello P.
      • et al.
      GRADE guidelines: 9. Rating up the quality of evidence.
      ], rating the confidence in effect estimates across outcomes [
      • Guyatt G.H.
      • Oxman A.D.
      • Sultan S.
      • Brozek J.
      • Glasziou P.
      • Alonso-Coello P.
      • et al.
      GRADE guidelines: 11. Making an overall rating of the quality of evidence for a single outcome and for all outcomes.
      ], dealing with resource use [
      • Brunetti M.
      • Shemilt I.
      • Pregno S.
      • Vale L.
      • Oxman A.D.
      • Lord J.
      • et al.
      GRADE guidelines: 10. Special challenges: confidence in estimates for resource use.
      ], creating an evidence profile and a Summary of Findings (SoF) table [
      • Guyatt G.H.
      • Oxman A.D.
      • Santesso N.
      • Helfand M.
      • Vist G.
      • Kunz R.
      • et al.
      GRADE guidelines: 12. Preparing summary of findings tables: binary outcomes.
      ,
      • Guyatt G.H.
      • Thorlund K.
      • Oxman A.D.
      • Walter S.
      • Patrick D.
      • Furukawa T.A.
      • et al.
      GRADE guidelines: 13. Preparing summary of findings tables: continuous outcomes.
      ], and GRADE's approach to diagnostic test recommendations. This article addresses GRADE's approach to categorizing, labeling, and wording health care recommendations. As we did in the initial article in this series, we will define strong or weak recommendations for or against a particular management approach, and discuss the interpretation and presentation of these recommendations. In the next article in the series, we will focus on the process of going from the evidence to the recommendations. Throughout this article, we will refer to guideline developers as “the panel.”

      2. Presenting direction and strength of recommendations

      2.1 Direction of recommendations

      Panels make recommendations either for (when the desirable consequences outweigh the undesirable consequences) or against (when the opposite is true) a particular strategy, in relation to a comparator. With the GRADE approach, the desirable and undesirable consequences are the outcomes classified as “critical” and “important but not critical.” These outcomes are selected at the outset, confirmed when the results are reviewed, and presented in the evidence profile and SoF table.
      In almost all situations, there are trade-offs between management strategies that have some desirable and some undesirable outcomes. Table 1 presents typical categories of desirable and undesirable consequences of a management strategy. Inevitably, evaluating the balance between desirable and undesirable consequences involves judging the relative importance of those consequences, an issue we will address in the next article.
      Table 1. Categories of typical desirable and undesirable outcomes of an experimental vs. a control intervention
      Desirable outcomes
      • Increased longevity
      • Reduction in the morbid events the intervention is designed to prevent
      • Resolution of symptoms
      • Improved quality of life
      • Decreased resource use
      Undesirable outcomes
      • Decreased longevity
      • Immediate serious complications (typically for surgical therapies)
      • Short-term relatively minor side effects
      • Long-term rare serious adverse events
      • Impaired quality of life
      • Inconvenience/hassle
      • Increased resource use

      2.2 Strength of recommendations

      Like confidence in effect estimates (quality of evidence), the strength of a recommendation can be conceptualized as an underlying continuum (Fig. 1). Nevertheless, GRADE has chosen a simple four-category classification of recommendations. If the panel is highly confident of the balance between desirable and undesirable consequences, they make a strong recommendation for (desirable outweighs undesirable) or against (undesirable outweighs desirable) an intervention. If the panel is less confident of the balance between desirable and undesirable consequences, they offer a weak recommendation. Some panels have been concerned about the use of “weak” to characterize recommendations because a weak recommendation can be confused with weak evidence, because guideline users may feel they can ignore weak recommendations, or because users may interpret weak as denoting that the panel was uncertain regarding the right recommendation. GRADE therefore offers alternative labels: conditional, discretionary, and qualified (Box 1) [
      • Chong L.
      • Nasser M.
      • Glasziou P.
      What should we call weak recommendations?.
      ]. As we will demonstrate, the four-category approach to grading recommendations has the merit not only of simplicity, but also of direct links to action on the part of health care providers, health care recipients, and policy makers.
      Fig. 1. Strength of recommendation: a continuum divided into categories.
      Box 1. Terminology: weak recommendations
      We have referred to recommendations as strong and weak. However, some guideline panels perceive an unintended negative connotation in the word “weak” and worry about conflation with “weak evidence.” We suggest three alternative terms that panels may choose to use: conditional, discretionary, or qualified. Recommendations may be conditional upon patient values and preferences, the resources available, or the setting in which the intervention will be implemented. Recommendations may be at the discretion of the patient and clinician, or qualified with an explanation of the issues that would lead decisions to vary.

      2.3 Presentation of recommendations

      Recommendations in the passive voice may lack clarity. We therefore suggest that guideline developers present recommendations in the active voice. For example, a number of organizations use “we recommend…” and “we suggest…” for strong and weak recommendations, respectively. Alternatives for a strong recommendation are “Clinicians should…” or “Clinicians should not…” or “Do…” or “Don't….” Alternatives for a weak recommendation include “Clinicians might…” or “We conditionally recommend…” or “We make a qualified recommendation that…” (Box 1).
      There is, however, limited systematically collected evidence addressing the wording of the strength of recommendations. In a randomized trial, we compared three wording approaches that expressed two grades of recommendation (“we recommend”/“we suggest”; “clinicians should”/“clinicians might”; “we recommend”/“we conditionally recommend”) [
      • Akl E.
      • Guyatt G.H.
      • Levine M.
      • Feldstein D.
      • Irani J.
      • Shaneyfelt T.
      • et al.
      “Might” or “suggest”? No wording approach was clearly superior in conveying the strength of recommendation.
      ]. None of the approaches was clearly superior to the others in conveying the strength of recommendations. Lomotan et al. [
      • Lomotan E.A.
      • Michel G.
      • Lin Z.
      • Shiffman R.N.
      How “should” we write guideline recommendations? Interpretation of deontic terminology in clinical practice guidelines: survey of the health services community.
      ] compared the “level of obligation” assigned to various terms commonly used in health care guidelines. They found that participants assigned different levels of obligation to “must,” “should,” and “may.”
      Recommendations should always specify the population and, unless it is obvious, the comparator. Consider, for instance, the following: “In patients with acute renal failure, we recommend hourly urine volume measurement for at least 24 hours.” The strength of this recommendation may differ depending on whether the alternative is measurement every 2 hours or once a day. Thus, the additional specification “when compared with daily urine volume measurement” is required.
      Sometimes, the recommendation statement will include reference to the setting, particularly when our confidence in estimates of effect would vary according to the setting. For instance, a recommendation regarding carotid endarterectomy might vary depending on the extent of delay between a patient's presentation with symptoms suggesting carotid stenosis and the performance of surgery [
      • Rothwell P.M.
      External validity of randomised controlled trials: “to whom do the results of this trial apply?”.
      ]. Another instance when setting may be important is an expensive intervention in high- vs. low-income countries.
      In general, it is preferable to present recommendations in favor of a particular management approach rather than against an approach. For instance, in considering the addition of aspirin to clopidogrel in patients who have had a stroke, it would be preferable to state: “In patients who have had a stroke, we suggest clopidogrel alone vs. adding aspirin to clopidogrel” rather than “In patients who have had a stroke and are using clopidogrel, we suggest not adding aspirin.”
      Nevertheless, when a useless or harmful therapy is in wide use, recommendations against a management approach are appropriate. For instance, “In patients undergoing cardiac surgery who were not previously receiving beta blockers, we suggest not initiating perioperative beta blocker therapy.”
      Unfortunately, misinterpretation is possible however strength of recommendations is expressed. We suggest guideline developers consider using both symbols (which may be less confusing than numbers or letters [
      • Akl E.A.
      • Maroun N.
      • Guyatt G.
      • Oxman A.D.
      • Alonso-Coello P.
      • Vist G.E.
      • et al.
      Symbols were superior to numbers for presenting strength of recommendations to health care consumers: a randomized trial.
      ]) and words to express strength of recommendations. We suggest ↑↑ as a symbol for strong recommendations and ↑? for weak recommendations. For guideline developers preferring numbers or letters, we suggest “1” for strong recommendations and “2” for weak. For those who prefer a pictorial representation, balance scales are depicted in Fig. 2. Whatever terms guideline developers elect to use (e.g., weak, conditional, discretionary, or qualified), we suggest that they use these consistently across different guidelines. Explanations of the meaning and implications of strong and weak recommendations should be readily accessible, for example, using hyperlinks in electronic publications, to facilitate correct interpretation.
      Fig. 2. Balance scales to depict strong vs. weak recommendations.
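      The presentation conventions in this section can be sketched as a small data structure. The following is a minimal illustration only, not part of the GRADE approach itself: the class and field names, the “against” phrasing, and the “grade” label are assumptions made for the example, while the verbs, symbols, and numbers are those suggested in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strength(Enum):
    STRONG = "strong"
    WEAK = "weak"


class Direction(Enum):
    FOR = "for"
    AGAINST = "against"


# Labels taken from the text, keyed by strength of recommendation.
VERB = {Strength.STRONG: "we recommend", Strength.WEAK: "we suggest"}
SYMBOL = {Strength.STRONG: "↑↑", Strength.WEAK: "↑?"}
NUMBER = {Strength.STRONG: "1", Strength.WEAK: "2"}


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical container for one recommendation statement."""

    population: str
    intervention: str
    strength: Strength
    direction: Direction
    comparator: Optional[str] = None  # may be omitted only when obvious

    def render(self) -> str:
        verb = VERB[self.strength]
        if self.direction is Direction.AGAINST:
            verb += " against"  # assumed phrasing for the negative direction
        text = f"In {self.population}, {verb} {self.intervention}"
        if self.comparator:
            text += f" rather than {self.comparator}"
        return f"{text} ({SYMBOL[self.strength]}, grade {NUMBER[self.strength]})."


# The four categories are the 2 x 2 combinations of strength and direction.
CATEGORIES = [(s, d) for s in Strength for d in Direction]

example = Recommendation(
    population="patients with acute renal failure",
    intervention="hourly urine volume measurement for at least 24 hours",
    strength=Strength.STRONG,
    direction=Direction.FOR,
    comparator="daily urine volume measurement",
)
print(example.render())
```

      Keeping population and comparator as explicit fields mirrors the advice above that a recommendation should always name the population and, unless obvious, the comparator.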

      3. Meaning of recommendations in GRADE

      3.1 What GRADE means by strong and weak recommendations—for clinicians and patients

      Using the GRADE approach, guideline authors make a strong recommendation when they believe that all or almost all informed people would make the recommended choice for or against an intervention. Consider, for example, the recommendation to take supplemental folate before and during pregnancy. High-quality evidence suggests folate prevents neural tube defects, a catastrophic outcome of pregnancy [
      Folic acid for the prevention of neural tube defects: U.S. Preventive Services Task Force recommendation statement.
      ,
      • Wolff T.
      • Witkop C.T.
      • Miller T.
      • Syed S.B.
      Folic acid supplementation for the prevention of neural tube defects: an update of the evidence for the U.S. Preventive Services Task Force.
      ]. Folate is inexpensive and has no proven adverse effects. Because the desirable consequences so greatly outweigh the negative, the deduction that all informed women would choose to take supplemental folate is secure, thus warranting a strong recommendation.
      In contrast, guideline panels using GRADE make a weak recommendation when they believe that most informed people would choose the recommended course of action, but a substantial number would not. Consider the recommendation in favor of adjuvant chemotherapy for women with early stage breast cancer. Most women would choose the recommended course of action, but an appreciable number would choose not to take chemotherapy, because they feel that the small possible benefits in survival do not justify the suffering resulting from the serious side effects of chemotherapy [
      • Whelan T.
      • Sawka C.
      • Levine M.
      • Gafni A.
      • Reyno L.
      • Willan A.
      • et al.
      Helping patients make informed choices: a randomized trial of a decision aid for adjuvant chemotherapy in lymph node-negative breast cancer.
      ].
      Given that a strong recommendation implies uniformity of choice and a weak recommendation implies variability, strong and weak recommendations have direct implications for the patient–provider dyad at the point of decision making. Although it is always valuable for providers to discuss decisions with patients, the allocation of time will differ with the strength of a recommendation. When a recommendation is weak, clinicians and other health care providers need to devote more time to the process of shared decision making by which they ensure that the informed choice reflects individual values and preferences (Box 1). This is likely to involve ensuring patients understand the implications of the choices they are making, possibly using a formal decision aid. When recommendations are strong, clinicians may spend less time on the process of making a decision, and focus efforts on overcoming barriers to implementation or adherence.

      3.2 What GRADE means by strong and weak recommendations—for policy makers

      The implication of a strong recommendation for policy makers is that the recommendation can be adopted as a policy in most situations. A strong recommendation implies that variability in clinical practice between individuals or regions would likely be inappropriate. Thus, for governments, institutions, provider groups, or third-party payers responsible for ensuring high-quality care, strong recommendations also constitute candidates for performance measures (quality of care criteria). For policy makers, the implication of a weak recommendation is that policy making will require substantial debate and involvement of many stakeholders. A weak recommendation implies that variability between individuals or regions may be appropriate, and use as a quality of care criterion is inappropriate unless the criterion is whether patients were properly informed and helped to make a decision consistent with their own values (such as by the use of a decision aid).
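      The implications described in Sections 3.1 and 3.2 can be collected into a simple lookup. This is a sketch for illustration: the dictionary layout, key names, and function name are assumptions, and the entries paraphrase the text rather than quote an official GRADE table.

```python
# Implications of a recommendation, keyed by (strength, audience).
# Entries paraphrase Sections 3.1 and 3.2; the structure is hypothetical.
IMPLICATIONS = {
    ("strong", "patients"): (
        "All or almost all informed people would make the recommended choice."
    ),
    ("weak", "patients"): (
        "Most informed people would choose the recommended course of action, "
        "but a substantial number would not."
    ),
    ("strong", "clinicians"): (
        "Spend less time on the decision itself and focus on overcoming "
        "barriers to implementation or adherence."
    ),
    ("weak", "clinicians"): (
        "Devote more time to shared decision making, possibly using a formal "
        "decision aid."
    ),
    ("strong", "policy makers"): (
        "The recommendation can be adopted as policy in most situations and "
        "is a candidate for a performance measure."
    ),
    ("weak", "policy makers"): (
        "Policy making will require substantial debate and the involvement "
        "of many stakeholders."
    ),
}


def implication(strength: str, audience: str) -> str:
    """Return the implication for a given strength and audience."""
    key = (strength.lower(), audience.lower())
    if key not in IMPLICATIONS:
        raise ValueError(f"No implication recorded for {key!r}")
    return IMPLICATIONS[key]


print(implication("weak", "clinicians"))
```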

      3.3 Strong does not necessarily mean a priority recommendation

      The strength of a recommendation may not be directly correlated with its priority for implementation. The importance or prioritization of a recommendation may differ, depending on the target audience for the recommendation: patients, the public, clinicians, or policy makers. Governments and public health officials considering a public health intervention must consider several factors beyond the strength of a recommendation. These factors, which are of lesser relevance to recommendations directed at clinicians, include the prevalence of the health problem (higher priority for more common conditions), ease of implementation (higher priority for interventions that can be implemented now), considerations of equity (higher priority for interventions that contribute to reducing health inequities), total costs to society (lower priority for interventions with high total costs), and the potential for improvement in quality of care (lower priority for recommendations with current high adherence). Therefore, government and public health officials may place a lower priority on implementing some strong recommendations even though they are important for individual patients. For instance, a National Institute for Health and Clinical Excellence (NICE) guideline concerning hip fractures did not consider implementation of a recommendation to use an intramedullary nail in patients with subtrochanteric fracture a high priority because the practice is already widespread [

      National Institute for Health and Clinical Excellence. Hip fracture: the management of hip fracture in adults. Clinical guideline 124. London, UK; National Institute for Health and Clinical Excellence; June 2011.

      ].
      If guideline panels are addressing funders or health system managers, they should make transparent the manner in which factors related to prevalence, equity, cost, and improving quality of care influence their priorities. Sometimes these same factors can influence recommendations, particularly when guideline panels are making recommendations for clinicians and patients on behalf of funders. When this is the case, panels should be explicit about the additional factors considered, apply them consistently, and make transparent when these factors influenced a recommendation.

      4. Transparent values and preferences

      In this section, we deal with the explicit and transparent presentation of the values and preferences underlying recommendations (Box 2). In the next article in the series, we deal with the sources of the values and preferences and how to use them in the process of making recommendations.
      Box 2. Terminology: values and preferences
      Values and preferences is an overarching term that includes patients' perspectives, beliefs, expectations, and goals for health and life [
      • Montori V.
      • Devereaux P.
      • Straus S.
      • Haynes B.
      • Guyatt G.
      Decision making and the patient.
      ]. More precisely, they refer to the processes that individuals use in considering the potential benefits, harms, costs, limitations, and inconvenience of the management options in relation to one another. For some, the term “values” has the closest connotation to these processes. For others, the connotation of “preferences” best captures the notion of choice. Thus, we use both words together to convey the concept.
      Ideally, guidelines will state foundational assumptions about the values and preferences that underlie their recommendations for the target population. For instance, a guideline addressing issues of thrombosis prevention and treatment in pregnancy noted: “Our recommendations reflect a belief that most women will place a low value on avoiding the pain, cost, and inconvenience of heparin therapy to avoid the small risk of even a minor abnormality in their child” associated with warfarin prophylaxis [
      • Bates S.M.
      • Greer I.A.
      • Pabinger I.
      • Sofaer S.
      • Hirsh J.
      Venous thromboembolism, thrombophilia, antithrombotic therapy, and pregnancy: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th edition).
      ].
      In addition to, or in place of, making such general statements, panels may find it appropriate to make statements associated with specific recommendations that are particularly sensitive to values and preferences. For instance, two panels that were part of a broader guideline effort made apparently contradictory recommendations regarding aspirin vs. clopidogrel in patients with atherosclerotic vascular disease, despite using the same underlying evidence from a trial that enrolled both patients with threatened stroke and those with peripheral vascular disease [
      A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events (CAPRIE). CAPRIE Steering Committee.
      ]. The stroke panel that recommended clopidogrel over aspirin stated: “This recommendation… places a relatively high value on a small absolute risk reduction in stroke rates, and a relatively low value on minimizing drug expenditures [
      • Albers G.W.
      • Amarenco P.
      • Easton J.D.
      • Sacco R.L.
      • Teal P.
      Antithrombotic and thrombolytic therapy for ischemic stroke: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy.
      ].” The peripheral vascular disease panel, which recommended aspirin over clopidogrel, stated: “This recommendation places a relatively high value on avoiding large expenditures to achieve small reductions in vascular events” [
      • Clagett G.P.
      • Sobel M.
      • Jackson M.R.
      • Lip G.Y.
      • Tangelder M.
      • Verhaeghe R.
      Antithrombotic therapy in peripheral arterial occlusive disease: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy.
      ]. The recommendations suggest opposite courses of action. Both are appropriate given the stated values and preferences, which were made explicit in qualifying statements accompanying each recommendation. These conflicting recommendations illustrate the importance of the values and preferences underlying the recommendations, the source of which we will discuss in the next article.
      Another way to frame values and preferences statements that panels may want to consider is in terms of patients who do not share the values and preferences underlying the recommendation. UpToDate uses this approach. For instance, in their topic dealing with the treatment of achalasia they say: “For most healthy patients undergoing an invasive procedure, we suggest minimally invasive surgical myotomy rather than pneumatic dilatation. Patients who prefer to avoid surgery and the high rates of gastroesophageal reflux disease seen after surgery, and who are willing to accept a higher initial failure rate and long-term recurrence rate, can reasonably choose pneumatic dilatation” [

      Spechler SJ. Achalasia. In: UpToDate, Grover S, deputy editor, Basow DS, Editor. Waltham, MA; UpToDate; April 25, 2012.

      ].
      The text describing the rationale for the recommendations should state which outcomes the panel judged critical, which important, and which were not included. For recommendations particularly dependent on values and preferences, and those for which values and preferences are less certain, authors should place statements about underlying values and preferences with the recommendation statement rather than in the accompanying text.
      For instance, a guideline panel made a recommendation for thrombolytic therapy in the context of acute stroke [
      • Albers G.W.
      • Amarenco P.
      • Easton J.D.
      • Sacco R.L.
      • Teal P.
      Antithrombotic and thrombolytic therapy for ischemic stroke: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th edition).
      ]. Thrombolytic therapy improves long-term functional outcome at the cost of an increase in immediate bleeding that is sometimes fatal. Thus, the panel felt compelled to add the following statement immediately following the recommendation: “This recommendation places relatively more weight on overall prospects for long-term functional improvement despite the increased risk of symptomatic intracerebral hemorrhage in the immediate peristroke period.” This prominent positioning of the statements will make it less likely that consumers of the guidelines miss the importance of the values and preference judgments.

      5. Special recommendations in GRADE

      5.1 Recommendations to use interventions only in research may be appropriate

      Panels may face decisions about promising interventions associated with appreciable harms or costs and with insufficient evidence of benefit to support their use. They may be reluctant, on one hand, to recommend against such interventions out of fear that they will stifle further investigation. At the same time, they may worry about encouraging the rapid diffusion of potentially ineffective or harmful interventions, and preventing recruitment to research already under way, by providing premature favorable recommendations for their use.
      The adverse consequences of recommendations to use diethylstilbestrol for the prevention of miscarriage [
      • Apfel R.J.
      • Fisher S.M.
      To do no harm: DES and the dilemmas of modern medicine.
      ,
      • Dutton D.B.
      Worse than the disease: pitfalls of medical progress.
      ] highlight the risk of premature favorable recommendations (risks to the exposed children included clear cell adenocarcinoma of the vagina and cervix, breast cancer, reproductive tract anomalies, infertility, and undescended testicles). When interventions have a large component of fixed costs such as equipment or facilities, an additional problem with premature recommendations in favor of an intervention is the risk of irretrievable allocation of resources that would be better spent elsewhere. Consider, for instance, the impact of prior recommendations to use continuous electronic fetal heart rate monitoring during labor in low-risk pregnancy [
      • Alfirevic Z.
      • Devane D.
      • Gyte G.
      Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour.
      ,
      • Sibanda J.
      • Beard R.W.
      Influence on clinical practice of routine intra-partum fetal monitoring.
      ].
      Recommendations for use of an intervention only in the context of research may ameliorate these problems. Such a recommendation may provide an important stimulus to efforts to answer important research questions, thus resolving uncertainty about optimal patient management [
      • Liston R.
      • Sawchuck D.
      • Young D.
      Society of Obstetricians and Gynaecologists of Canada; British Columbia Perinatal Health Program
      Fetal Health Surveillance: antepartum and intrapartum consensus guideline.
      ]. For instance, a NICE guideline addressing management of patients with hip fracture noted the lack of a clear management pathway for patients admitted from care homes and the lack of randomized trials, and identified this as a research priority [

      National Institute for Health and Clinical Excellence. Hip fracture: the management of hip fracture in adults. Clinical guideline 124. London, UK; National Institute for Health and Clinical Excellence; June 2011.

      ].
      Only-in-research recommendations will be appropriate when three conditions are met: there is insufficient evidence supporting an intervention for a panel to recommend its use; further research has a large potential for reducing uncertainty about the effects of the intervention; and further research is deemed good value for the anticipated costs.
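      The three conditions above must all hold together. As a minimal sketch of that rule (the function and parameter names are hypothetical, and each condition remains a panel judgment that the code simply takes as a yes/no input):

```python
from typing import List


def unmet_conditions(
    insufficient_evidence: bool,
    research_can_reduce_uncertainty: bool,
    research_is_good_value: bool,
) -> List[str]:
    """List which of the three conditions from the text are not satisfied."""
    checks = {
        "insufficient evidence to recommend use": insufficient_evidence,
        "further research has large potential to reduce uncertainty":
            research_can_reduce_uncertainty,
        "further research is good value for the anticipated costs":
            research_is_good_value,
    }
    return [name for name, met in checks.items() if not met]


def only_in_research_appropriate(
    insufficient_evidence: bool,
    research_can_reduce_uncertainty: bool,
    research_is_good_value: bool,
) -> bool:
    """An only-in-research recommendation requires all three conditions."""
    return not unmet_conditions(
        insufficient_evidence,
        research_can_reduce_uncertainty,
        research_is_good_value,
    )


print(only_in_research_appropriate(True, True, False))
print(unmet_conditions(True, True, False))
```

      Returning the unmet conditions, rather than only a yes/no answer, lets a panel document why an only-in-research recommendation was judged inappropriate.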
      The research recommendations should be detailed regarding the specific research questions that investigators should address, particularly which patient-important outcomes they should measure [
      • Brown P.
      • Brunnhuber K.
      • Chalkidou K.
      • Chalmers I.
      • Clarke M.
      • Fenton M.
      • et al.
      How to formulate research recommendations.
      ]. The recommendation for research may be accompanied by an explicit strong recommendation not to use the experimental intervention outside of the research context.

      5.2 Guideline panels may choose to not make recommendations

      Not infrequently, panels may find themselves reluctant to make a recommendation for or against a particular management strategy, and also conclude that an “only-in-research” recommendation is inappropriate. There are three distinct reasons for reluctance to make recommendations. The first is that the confidence in effect estimates is so low that the panel feels a recommendation is too speculative. The U.S. Preventive Services Task Force (USPSTF) has provided a thoughtful discussion of this situation, and some compelling examples (e.g., visual inspection to screen for skin cancer) [
      • Petitti D.B.
      • Teutsch S.M.
      • Barton M.B.
      • Sawaya G.F.
      • Ockene J.K.
      • DeWitt T.
      Update on the methods of the U.S. Preventive Services Task Force: insufficient evidence.
      ].
      The second reason is that although our confidence in effect estimates is moderate or even high, the trade-offs are so closely balanced, and the values and preferences and resource implications not known or too variable, that the panel has great difficulty deciding on the direction of a recommendation.
      The USPSTF has remarked that clinicians “indicate frustration with the lack of guidance” when the task force fails to make recommendations [Petitti et al., Update on the methods of the U.S. Preventive Services Task Force: insufficient evidence]. As the USPSTF states: “Decision makers do not have the luxury of waiting for certain evidence. Even though evidence is insufficient, the clinician must still provide advice, patients must make choices, and policy makers must establish policies” [Petitti et al., Update on the methods of the U.S. Preventive Services Task Force: insufficient evidence].
      Clinicians will rarely explore the evidence as thoroughly as a guideline panel, nor devote as much thought to the trade-offs or to the possible underlying values and preferences in the population. We therefore encourage panels to deal with their discomfort and to make recommendations even when confidence in effect estimates is low and/or desirable and undesirable consequences are closely balanced. Such recommendations will inevitably be weak, and may be accompanied by qualifications.
      In the unusual circumstances in which panels choose not to make recommendations, they should specify whether this is on the basis of very low confidence in effect estimates, or because they feel the balance between desirable and undesirable consequences is so close they cannot make a recommendation.
      A third reason a panel may be reluctant to make a recommendation is that two management options have very different undesirable consequences, and individual patients' reactions to these consequences are likely to be so different that it makes little sense to think about typical values and preferences. Consider, for instance, adult patients with thalassemia major considering hematopoietic cell transplantation vs. continued medical treatment with transfusion and iron chelation. Such patients may face, on one hand, a possibility of cure of their thalassemia with transplant but an early mortality risk of approximately 33%, and on the other the prospect of continued morbidity and an uncertain prognosis. A guideline panel may consider that in such situations, the only sensible recommendation is a discussion between patient and physician to ascertain the patient's preferences. Guideline panels should not, however, fail to make a recommendation simply because individual patients will make differing choices: that patients will make differing choices is a defining feature of a weak recommendation.

      6. Conclusion

      Guideline developers have used widely varying presentations of recommendations, and generally fail to specify the implications of recommendations for patients, clinicians, and policy makers. For instance, Hussain et al. [The Yale Guideline Recommendation Corpus: a representative sample of the knowledge content of guidelines] observed important variation in formulations of recommendations within and across guidelines. GRADE's approach to standardized terminology and presentation, and clear specification of the implications of strong and weak recommendations, addresses these shortcomings.

      References

        • Guyatt G.H.
        • Oxman A.D.
        • Kunz R.
        • Atkins D.
        • Brozek J.
        • Vist G.
        • et al.
        GRADE guidelines: 2. Framing the question and deciding on important outcomes.
        J Clin Epidemiol. 2011; 64: 395-400
        • Guyatt G.
        • Oxman A.D.
        • Akl E.A.
        • Kunz R.
        • Vist G.
        • Brozek J.
        • et al.
        GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables.
        J Clin Epidemiol. 2011; 64: 383-394
        • Guyatt G.H.
        • Oxman A.D.
        • Vist G.
        • Kunz R.
        • Brozek J.
        • Alonso-Coello P.
        • et al.
        GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias).
        J Clin Epidemiol. 2011; 64: 407-415
        • Balshem H.
        • Helfand M.
        • Schunemann H.
        • Oxman A.D.
        • Kunz R.
        • Brozek J.
        • et al.
        GRADE guidelines: 3. Rating the quality of evidence—introduction.
        J Clin Epidemiol. 2011; 64: 401-406
        • Guyatt G.
        • Oxman A.D.
        • Kunz R.
        • Brozek J.
        • Alonso-Coello P.
        • Rind D.
        • et al.
        GRADE guidelines: 6. Rating the quality of evidence: imprecision.
        J Clin Epidemiol. 2011; 64: 1283-1293
        • Guyatt G.H.
        • Oxman A.D.
        • Montori V.
        • Vist G.
        • Kunz R.
        • Brozek J.
        • et al.
        GRADE guidelines: 5. Rating the quality of evidence—publication bias.
        J Clin Epidemiol. 2011; 64: 1277-1282
        • Guyatt G.H.
        • Oxman A.D.
        • Kunz R.
        • Woodcock J.
        • Brozek J.
        • Helfand M.
        • et al.
        GRADE guidelines: 7. Rating the quality of evidence—inconsistency.
        J Clin Epidemiol. 2011; 64: 1294-1302
        • Guyatt G.H.
        • Oxman A.D.
        • Kunz R.
        • Woodcock J.
        • Brozek J.
        • Helfand M.
        • et al.
        GRADE guidelines: 8. Rating the quality of evidence—indirectness.
        J Clin Epidemiol. 2011; 64: 1303-1310
        • Guyatt G.H.
        • Oxman A.D.
        • Sultan S.
        • Glasziou P.
        • Akl E.A.
        • Alonso-Coello P.
        • et al.
        GRADE guidelines: 9. Rating up the quality of evidence.
        J Clin Epidemiol. 2011; 64: 1311-1316
        • Guyatt G.H.
        • Oxman A.D.
        • Sultan S.
        • Brozek J.
        • Glasziou P.
        • Alonso-Coello P.
        • et al.
        GRADE guidelines: 11. Making an overall rating of the quality of evidence for a single outcome and for all outcomes.
        J Clin Epidemiol. 2013; 66: 151-157
        • Brunetti M.
        • Shemilt I.
        • Pregno S.
        • Vale L.
        • Oxman A.D.
        • Lord J.
        • et al.
        GRADE guidelines: 10. Special challenges: confidence in estimates for resource use.
        J Clin Epidemiol. 2013; 66: 140-150
        • Guyatt G.H.
        • Oxman A.D.
        • Santesso N.
        • Helfand M.
        • Vist G.
        • Kunz R.
        • et al.
        GRADE guidelines: 12. Preparing summary of findings tables: binary outcomes.
        J Clin Epidemiol. 2013; 66: 158-172
        • Guyatt G.H.
        • Thorlund K.
        • Oxman A.D.
        • Walter S.
        • Patrick D.
        • Furukawa T.A.
        • et al.
        GRADE guidelines: 13. Preparing summary of findings tables: continuous outcomes.
        J Clin Epidemiol. 2013; 66: 173-183
        • Chong L.
        • Nasser M.
        • Glasziou P.
        What should we call weak recommendations?.
        Newsl Int Soc Evid Based Health Care. 2011; 2: 6
        • Akl E.
        • Guyatt G.H.
        • Levine M.
        • Feldstein D.
        • Irani J.
        • Shaneyfelt T.
        • et al.
        “Might” or “suggest”? No wording approach was clearly superior in conveying the strength of recommendation.
        J Clin Epidemiol. 2012; 65: 268-275
        • Lomotan E.A.
        • Michel G.
        • Lin Z.
        • Shiffman R.N.
        How “should” we write guideline recommendations? Interpretation of deontic terminology in clinical practice guidelines: survey of the health services community.
        Qual Saf Health Care. 2010; 19: 509-513
        • Rothwell P.M.
        External validity of randomised controlled trials: “to whom do the results of this trial apply?”.
        Lancet. 2005; 365: 82-93
        • Akl E.A.
        • Maroun N.
        • Guyatt G.
        • Oxman A.D.
        • Alonso-Coello P.
        • Vist G.E.
        • et al.
        Symbols were superior to numbers for presenting strength of recommendations to health care consumers: a randomized trial.
        J Clin Epidemiol. 2007; 60: 1298-1305
        Folic acid for the prevention of neural tube defects: U.S. Preventive Services Task Force recommendation statement.
        Ann Intern Med. 2009; 150: 626-631
        • Wolff T.
        • Witkop C.T.
        • Miller T.
        • Syed S.B.
        Folic acid supplementation for the prevention of neural tube defects: an update of the evidence for the U.S. Preventive Services Task Force.
        Ann Intern Med. 2009; 150: 632-639
        • Whelan T.
        • Sawka C.
        • Levine M.
        • Gafni A.
        • Reyno L.
        • Willan A.
        • et al.
        Helping patients make informed choices: a randomized trial of a decision aid for adjuvant chemotherapy in lymph node-negative breast cancer.
        J Natl Cancer Inst. 2003; 95: 581-587
        National Institute for Health and Clinical Excellence. Hip fracture: the management of hip fracture in adults. Clinical guideline 124. London, UK; National Institute for Health and Clinical Excellence; June 2011.

        • Bates S.M.
        • Greer I.A.
        • Pabinger I.
        • Sofaer S.
        • Hirsh J.
        Venous thromboembolism, thrombophilia, antithrombotic therapy, and pregnancy: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th edition).
        Chest. 2008; 133: 844S-886S
        A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events (CAPRIE). CAPRIE Steering Committee.
        Lancet. 1996; 348: 1329-1339
        • Albers G.W.
        • Amarenco P.
        • Easton J.D.
        • Sacco R.L.
        • Teal P.
        Antithrombotic and thrombolytic therapy for ischemic stroke: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy.
        Chest. 2004; 126: 483S-512S
        • Clagett G.P.
        • Sobel M.
        • Jackson M.R.
        • Lip G.Y.
        • Tangelder M.
        • Verhaeghe R.
        Antithrombotic therapy in peripheral arterial occlusive disease: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy.
        Chest. 2004; 126: 609S-626S
        Spechler SJ. Achalasia. In: UpToDate, Grover S, deputy editor, Basow DS, editor. Waltham, MA; UpToDate; April 25, 2012.

        • Albers G.W.
        • Amarenco P.
        • Easton J.D.
        • Sacco R.L.
        • Teal P.
        Antithrombotic and thrombolytic therapy for ischemic stroke: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th edition).
        Chest. 2008; 133: 630S-669S
        • Apfel R.J.
        • Fisher S.M.
        To do no harm: DES and the dilemmas of modern medicine.
        Yale University Press, New Haven, CT 1984
        • Dutton D.B.
        Worse than the disease: pitfalls of medical progress.
        Cambridge University Press, Cambridge, UK 1988
        • Alfirevic Z.
        • Devane D.
        • Gyte G.
        Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour.
        Cochrane Database Syst Rev. 2006; 3: CD006066
        • Sibanda J.
        • Beard R.W.
        Influence on clinical practice of routine intra-partum fetal monitoring.
        Br Med J. 1975; 3: 341-343
        • Liston R.
        • Sawchuck D.
        • Young D.
        • Society of Obstetricians and Gynaecologists of Canada
        • British Columbia Perinatal Health Program
        Fetal Health Surveillance: antepartum and intrapartum consensus guideline.
        J Obstet Gynaecol Can. 2007; 29: S3-56 (Erratum in: J Obstet Gynaecol Can 2007;29(11):909)
        • Brown P.
        • Brunnhuber K.
        • Chalkidou K.
        • Chalmers I.
        • Clarke M.
        • Fenton M.
        • et al.
        How to formulate research recommendations.
        Br Med J. 2006; 333: 804-806
        • Petitti D.B.
        • Teutsch S.M.
        • Barton M.B.
        • Sawaya G.F.
        • Ockene J.K.
        • DeWitt T.
        Update on the methods of the U.S. Preventive Services Task Force: insufficient evidence.
        Ann Intern Med. 2009; 150: 199-205
        • Hussain T.
        • Michel G.
        • Shiffman R.N.
        The Yale Guideline Recommendation Corpus: a representative sample of the knowledge content of guidelines.
        Int J Med Inform. 2009; 78: 354-363
        • Montori V.
        • Devereaux P.
        • Straus S.
        • Haynes B.
        • Guyatt G.
        Decision making and the patient.
        in: Guyatt G. Rennie D. Meade M. Cook D. The users' guides to the medical literature: a manual for evidence-based clinical practice. 2nd ed. McGraw-Hill, New York, NY 2008