Systematic Review | Volume 65, Issue 11, P1133-1143, November 2012

The appropriateness method has acceptable reliability and validity for assessing overuse and underuse of surgical procedures

  • Elise H. Lawson
    Correspondence
    Corresponding author. Department of Surgery, UCLA Medical Center, 10833 LeConte Ave., CHS 72-215, Los Angeles, CA 90095, USA.
    Affiliations
    Department of Surgery, David Geffen School of Medicine, University of California, 10833 LeConte Ave., CHS 72-215, Los Angeles, CA 90095, USA

    Division of Research and Optimal Patient Care, American College of Surgeons, 633 N. Saint Clair St., Chicago, IL 60611, USA
  • Melinda Maggard Gibbons
    Affiliations
    Department of Surgery, David Geffen School of Medicine, University of California, 10833 LeConte Ave., CHS 72-215, Los Angeles, CA 90095, USA

    Department of Surgery, Olive View-UCLA Medical Center, 14445 Olive View Drive, Sylmar, CA 91342, USA
  • Clifford Y. Ko
    Affiliations
    Department of Surgery, David Geffen School of Medicine, University of California, 10833 LeConte Ave., CHS 72-215, Los Angeles, CA 90095, USA

    Division of Research and Optimal Patient Care, American College of Surgeons, 633 N. Saint Clair St., Chicago, IL 60611, USA

    Departments of Surgery and Medicine, VA Greater Los Angeles Healthcare System, 11301 Wilshire Blvd., Los Angeles, CA 90073, USA
  • Paul G. Shekelle
    Affiliations
    Departments of Surgery and Medicine, VA Greater Los Angeles Healthcare System, 11301 Wilshire Blvd., Los Angeles, CA 90073, USA

    RAND Corporation, 1776 Main St., Santa Monica, CA 90401, USA

      Abstract

      Objective

      To summarize the findings of methodological studies on the RAND/University of California Los Angeles (RAND/UCLA) appropriateness method, which was developed to assess whether variation in the use of surgical procedures is due to overuse and/or underuse.

      Study Design and Setting

      A MEDLINE literature search was performed. Studies were included if they assessed the reliability or validity of the RAND/UCLA appropriateness method for a surgical procedure or the effect of altering panelist composition or eliminating in-person discussion between rating rounds. Information was abstracted on procedure, study design, and findings.

      Results

      One thousand six hundred one titles were identified, and 37 met the inclusion criteria. The test–retest reliability is good to very good (kappa, 0.64–0.81) for total knee and hip joint replacement, coronary artery bypass grafting (CABG), and carotid endarterectomy (CEA). The interpanel reliability is moderate to very good (kappa, 0.52–0.83) for CABG and hysterectomy. Construct validity has been demonstrated by comparing the appropriateness method with guidelines and/or evidence-based approaches for endoscopy, colonoscopy, CABG, hysterectomy, and CEA. Predictive validity has been studied for cardiac revascularization, in which concordance with appropriateness classification is associated with better clinical outcomes.

      Conclusion

      Our findings support use of the appropriateness method to assess variation in the rates of the procedures studied by identifying overuse and underuse. Further methodological research should be conducted as appropriateness criteria are developed and implemented for a broader range of procedures.

      1. Introduction

      What is new?

        Key findings

      • The RAND/University of California Los Angeles (RAND/UCLA) appropriateness method has moderate to very good reliability for determining overuse and underuse of the surgical procedures studied: total knee and hip joint replacement, coronary artery bypass grafting (CABG), carotid endarterectomy (CEA), and hysterectomy.
      • The construct validity of appropriateness criteria has been demonstrated for upper gastrointestinal endoscopy, colonoscopy, CABG, hysterectomy, and CEA through comparisons with professional society guidelines and/or evidence-based approaches.
      • Concordance with appropriateness criteria classification is associated with better clinical outcomes for cardiac revascularization.

        What this adds to what was known?

      • Systematic review of methodological studies on the RAND/UCLA appropriateness method.

        What is the implication and what should change now?

      • Our study supports the use of the RAND/UCLA appropriateness method to assess variation in the use of surgical procedures by identifying overuse and underuse for the procedures studied and highlights the need for further methodological research as criteria are developed and implemented for a broader range of procedures.
      Prior reports have demonstrated substantial variations in the rates of surgical procedures performed in the United States that are not fully explained by disease incidence or patient preferences [
      • Jha A.K.
      • Fisher E.S.
      • Li Z.
      • Orav E.J.
      • Epstein A.M.
      Racial trends in the use of major procedures among the elderly.
      ,
      • Weinstein J.N.
      • Bronner K.K.
      • Morgan T.S.
      • Wennberg J.E.
      Trends and geographic variations in major surgery for degenerative diseases of the hip, knee, and spine.
      ,
      • Patel M.R.
      • Greiner M.A.
      • DiMartino L.D.
      • Schulman K.A.
      • Duncan P.W.
      • Matchar D.B.
      • et al.
      Geographic variation in carotid revascularization among Medicare beneficiaries, 2003-2006.
      ,
      ]. The multifactorial causes of these variations are not fully known but are likely in part related to differences in clinician opinion regarding treatment options [
      • Chassin M.R.
      Explaining geographic variations. The enthusiasm hypothesis.
      ,
      • Wright J.G.
      • Hawker G.A.
      • Bombardier C.
      • Croxford R.
      • Dittus R.S.
      • Freund D.A.
      • et al.
      Physician enthusiasm as an explanation for area variation in the utilization of knee replacement surgery.
      ]. Although randomized controlled trials are considered the gold standard for determining the safety and effectiveness of a procedure, the generalizability of results to a broad population is often limited by the enrollment of a narrow spectrum of patients, and the time lag between study planning and dissemination of results can be substantial. Furthermore, relying on trials to determine the best treatment for a wide range of clinical scenarios would be impractical. Consider, for example, designing a trial where four clinical variables are judged as important in deciding whether a patient should undergo surgery (e.g., severe disease vs. mild disease, high surgical risk vs. low surgical risk, prior response to medical therapy, and the presence or absence of only a single comorbid condition). These four clinical variables can define 16 potentially distinct patient populations (2^4 = 16 clinical scenarios). If the study requires 200 people in each comparison group to have sufficient power to detect a moderate-sized difference, this would require the enrollment of 200 × 16 = 3,200 people in the trial. Undertaking and completing randomized controlled trials enrolling 3,200 people for each of the 20 most common surgical procedures would require the infrastructure and funding to enroll, collect data on, and follow up 3,200 × 20 = 64,000 patients. This is not likely to be achieved in the near future, even with the resources now available in the United States through the newly established Patient-Centered Outcomes Research Institute.
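      The enrollment arithmetic in this example can be reproduced with a short sketch. The variable names and the two levels assigned to each clinical variable are illustrative assumptions (the text names the variables but does not label every level); only the counts of 200 people per scenario and 20 procedures come from the example itself.

```python
from itertools import product

# Four clinical variables, each assumed to take two levels (labels are illustrative).
clinical_variables = {
    "disease severity": ["severe", "mild"],
    "surgical risk": ["high", "low"],
    "prior response to medical therapy": ["yes", "no"],
    "single comorbid condition": ["present", "absent"],
}

# Every combination of levels is one clinical scenario.
scenarios = list(product(*clinical_variables.values()))
n_scenarios = len(scenarios)                 # 2**4 = 16

people_per_scenario = 200                    # figure used in the example
per_procedure = people_per_scenario * n_scenarios      # 3,200 per procedure

n_procedures = 20                            # the 20 most common procedures
total_patients = per_procedure * n_procedures          # 64,000 overall

print(n_scenarios, per_procedure, total_patients)
```

      Adding a fifth two-level variable would double the number of scenarios to 32, which illustrates how quickly the required enrollment grows.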
      An alternative option is the RAND/University of California Los Angeles (RAND/UCLA) appropriateness method, which supplements evidence from clinical trials with expert opinion to better inform clinicians and patients regarding treatment options. Multidisciplinary panelists weigh the relative risks and benefits of a procedure for an exhaustive list of specific clinical indications and assign ratings based on the evidence in the literature and their own clinical judgment. The rated indications, or appropriateness criteria, can then be applied to patient populations to determine whether variation is the result of overuse (patients undergo a procedure for which the risks outweigh the benefits) and/or underuse (patients do not receive a procedure that is proven effective and would improve their health) [
      • Fitch K.
      • Bernstein S.
      • Aguilar M.
      • Burnand B.
      • LaCalle J.R.
      • Lazaro P.
      • et al.
      The RAND/UCLA appropriateness method user's manual 2001. No. MR-1269-DG-XII/RE:126.
      ].
      Since the development of the RAND/UCLA appropriateness method more than 25 years ago, a considerable amount of research has been conducted on its reliability and validity for assessing the appropriate use of surgical procedures. Much of this research is focused on areas of controversy or concern regarding key elements of the method, such as potential variability in the process because of panelist composition, the role of the chairperson, and the necessity of having two rounds of rating separated by an in-person discussion [
      • Hicks N.R.
      Some observations on attempts to measure appropriateness of care.
      ]. The purpose of our study was to summarize the results of these methodological studies. Our goal was to determine if evidence on the reliability and validity of this method supports the call for a broad and coordinated effort to develop, implement, and maintain appropriateness criteria to address variation in the use of surgical procedures.

      2. Methods

      2.1 RAND/UCLA appropriateness method

      The RAND/UCLA appropriateness method was developed to systematically assess variation in the use of procedures by defining which patients should and should not undergo surgical intervention vs. medical therapy. An appropriate indication for a procedure is one for which “the expected health benefit (e.g., increased life expectancy, relief of pain, reduction in anxiety, and improved functional capacity) exceeds the expected negative consequences (e.g., mortality, morbidity, anxiety, pain, and time lost from work) by a sufficiently wide margin that the procedure is worth doing” [
      • Brook R.H.
      • Chassin M.R.
      • Fink A.
      • Solomon D.H.
      • Kosecoff J.
      • Park R.E.
      A method for the detailed assessment of the appropriateness of medical technologies.
      ]. This method starts with an extensive review of the literature on risks and benefits of the procedure. A comprehensive and mutually exclusive set of clinical scenarios or indications for the procedure is then compiled, complete with specific definitions for any potentially ambiguous terms (e.g., “failed medical therapy” would be explicitly defined). Because of the need to be inclusive, the list typically includes many hundreds of clinical indications.
      Individuals comprising an expert panel rate each indication in two rounds, with the second round occurring after an in-person discussion of the first round results. Indications are classified as “appropriate” (the expected benefits of the procedure outweigh the expected harms), “equivocal” (the expected benefits and harms are roughly equal, or there is disagreement among the panelists), or “inappropriate” (the expected harms outweigh the expected benefits). Appropriate indications are sometimes further classified as “necessary” by the panel, usually in a third round. An indication is considered necessary if it would be improper care to not offer the procedure, there is a reasonable chance the procedure will benefit the patient, and the magnitude of the benefit is not small [
      • Fitch K.
      • Bernstein S.
      • Aguilar M.
      • Burnand B.
      • LaCalle J.R.
      • Lazaro P.
      • et al.
      The RAND/UCLA appropriateness method user's manual 2001. No. MR-1269-DG-XII/RE:126.
      ]. Table 1 lists examples of indications with each of these classifications [
      • Yermilov I.
      • McGory M.L.
      • Shekelle P.W.
      • Ko C.Y.
      • Maggard M.A.
      Appropriateness criteria for bariatric surgery: beyond the NIH guidelines.
      ,
      • Patel M.R.
      • Dehmer G.J.
      • Hirshfeld J.W.
      • Smith P.K.
      • Spertus J.A.
      ACCF/SCAI/STS/AATS/AHA/ASNC 2009 Appropriateness Criteria for Coronary Revascularization: a Report by the American College of Cardiology Foundation Appropriateness Criteria Task Force, Society for Cardiovascular Angiography and Interventions, Society of Thoracic Surgeons, American Association for Thoracic Surgery, American Heart Association, and the American Society of Nuclear Cardiology Endorsed by the American Society of Echocardiography, the Heart Failure Society of America, and the Society of Cardiovascular Computed Tomography.
      ].
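      The three-way classification described above can be sketched in code. This section does not give the numeric rules, so the sketch rests on assumptions: the 1 to 9 rating scale and the median cut points (7 to 9 appropriate, 1 to 3 inappropriate) follow the convention commonly described for the RAND/UCLA method, and the disagreement rule (at least `min_per_extreme` panelists at each end of the scale) is a simplified, hypothetical parameter that in practice depends on panel size. The function name is illustrative.

```python
from statistics import median

def classify_indication(ratings, min_per_extreme=3):
    """Classify one indication from final-round panel ratings.

    Assumes a 1-9 scale; cut points and the disagreement rule are
    assumptions for illustration, not taken from this article.
    """
    if not ratings or any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 9")

    low = sum(1 for r in ratings if r <= 3)    # panelists rating harms > benefits
    high = sum(1 for r in ratings if r >= 7)   # panelists rating benefits > harms
    disagreement = low >= min_per_extreme and high >= min_per_extreme

    if disagreement:
        return "equivocal"          # disagreement among the panelists
    panel_median = median(ratings)
    if panel_median >= 7:
        return "appropriate"
    if panel_median <= 3:
        return "inappropriate"
    return "equivocal"              # benefits and harms roughly equal

print(classify_indication([8, 8, 9, 7, 8, 9, 7, 8, 9]))
print(classify_indication([1, 2, 3, 8, 9, 9, 5, 5, 5]))
```

      Checking for disagreement before looking at the median mirrors the definition in the text, under which an indication is equivocal either because benefits and harms are roughly equal or because the panelists disagree.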
      Table 1. Examples of indications rated appropriate, equivocal, or inappropriate

      Bariatric surgery
      • Yermilov I.
      • McGory M.L.
      • Shekelle P.W.
      • Ko C.Y.
      • Maggard M.A.
      Appropriateness criteria for bariatric surgery: beyond the NIH guidelines.

      Example 1
        Appropriate: Patients 65 y or older with a BMI of ≥40 and diabetes with a HgbA1c 7–9 on maximal medical therapy
        Equivocal: Patients 65 y or older with a BMI of ≥40 and diabetes with a HgbA1c 7–9 and not on maximal medical therapy
        Inappropriate: Patients 65 y or older with a BMI of ≥40 and no comorbidities
      Example 2
        Appropriate: Patients aged 12–18 y with a BMI of ≥40 and diabetes with a HgbA1c 7–9 on maximal medical therapy
        Equivocal: Patients aged 12–18 y with a BMI of ≥40 and diabetes with a HgbA1c 7–9 and not on maximal medical therapy
        Inappropriate: Patients aged 12–18 y with a BMI of ≥40 and no comorbidities

      Cardiac revascularization
      • Patel M.R.
      • Dehmer G.J.
      • Hirshfeld J.W.
      • Smith P.K.
      • Spertus J.A.
      ACCF/SCAI/STS/AATS/AHA/ASNC 2009 Appropriateness Criteria for Coronary Revascularization: a Report by the American College of Cardiology Foundation Appropriateness Criteria Task Force, Society for Cardiovascular Angiography and Interventions, Society of Thoracic Surgeons, American Association for Thoracic Surgery, American Heart Association, and the American Society of Nuclear Cardiology Endorsed by the American Society of Echocardiography, the Heart Failure Society of America, and the Society of Cardiovascular Computed Tomography.

      Example 1
        Appropriate: Patients without prior bypass surgery, one- or two-vessel CAD without the involvement of proximal LAD, low-risk findings on noninvasive testing, receiving a course of maximal anti-ischemic medical therapy, and class III or IV angina
        Equivocal: Patients without prior bypass surgery, one- or two-vessel CAD without the involvement of proximal LAD, low-risk findings on noninvasive testing, receiving a course of maximal anti-ischemic medical therapy, and class I or II angina
        Inappropriate: Patients without prior bypass surgery, one- or two-vessel CAD without the involvement of proximal LAD, low-risk findings on noninvasive testing, receiving a course of maximal anti-ischemic medical therapy, and asymptomatic
      Example 2
        Appropriate: Patients with prior bypass surgery, no ACS, one or more lesions in native coronary arteries without grafts, all grafts patent and without significant disease, intermediate-risk findings on noninvasive testing, receiving no or minimal anti-ischemic medical therapy, and class III or IV angina
        Equivocal: Patients with prior bypass surgery, no ACS, one or more lesions in native coronary arteries without grafts, all grafts patent and without significant disease, intermediate-risk findings on noninvasive testing, receiving no or minimal anti-ischemic medical therapy, and class I or II angina
        Inappropriate: Patients with prior bypass surgery, no ACS, one or more lesions in native coronary arteries without grafts, all grafts patent and without significant disease, intermediate-risk findings on noninvasive testing, receiving no or minimal anti-ischemic medical therapy, and asymptomatic

      Abbreviations: BMI, body mass index; CAD, coronary artery disease; LAD, left anterior descending artery; ACS, acute coronary syndrome.
      Overuse and underuse of a surgical procedure are determined by applying the aforementioned indications to actual patients. Underuse is defined as a patient with a necessary indication who does not receive the procedure. The study sample for underuse is derived from a pool of patients who may or may not undergo the procedure, for example, patients who are diagnosed with coronary artery disease on coronary angiography and then proceed to either coronary revascularization or medical management. Overuse is defined as any patient who undergoes a procedure for an inappropriate indication. The study sample for overuse (and for appropriate use) is derived from patients who received the procedure.
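      The definitions of overuse and underuse above can be expressed as a minimal sketch. The `Patient` record, the function names, and the choice to report each measure as a proportion are illustrative assumptions. In particular, using all patients with a necessary indication as the denominator for underuse is one reasonable reading of the text, not a rule stated in it.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    # Classification of the patient's indication by the panel:
    # "necessary", "appropriate", "equivocal", or "inappropriate".
    classification: str
    received_procedure: bool

def overuse_rate(patients):
    """Share of patients who received the procedure for an inappropriate indication.

    The sample is restricted to patients who received the procedure.
    """
    treated = [p for p in patients if p.received_procedure]
    if not treated:
        return None
    return sum(p.classification == "inappropriate" for p in treated) / len(treated)

def underuse_rate(patients):
    """Share of patients with a necessary indication who did not receive the procedure.

    The sample is drawn from patients who may or may not have been treated
    (assumed denominator: all patients with a necessary indication).
    """
    necessary = [p for p in patients if p.classification == "necessary"]
    if not necessary:
        return None
    return sum(not p.received_procedure for p in necessary) / len(necessary)

# Hypothetical cohort for illustration only; not data from any study.
cohort = [
    Patient("inappropriate", True),
    Patient("appropriate", True),
    Patient("necessary", True),
    Patient("necessary", False),
    Patient("equivocal", False),
    Patient("appropriate", True),
]
print(overuse_rate(cohort), underuse_rate(cohort))
```

      The two measures use different samples, which is why a cohort limited to patients who underwent the procedure can be used to estimate overuse but not underuse.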

      2.2 Literature search, study inclusion, and data abstraction

      We searched MEDLINE in February 2010 and May 2011 for articles on the appropriateness of surgical procedures using the following search strategy: MeSH terms “transplantation” OR “surgery” OR “surgical procedures, operative” AND “appropriateness” in the title or abstract. To be included, articles had to be an original research study addressing the reliability and/or validity of the RAND/UCLA appropriateness method for a surgical procedure. We also included articles that assessed the effect of altering panelist composition or eliminating the in-person discussion between rating rounds. Non-English articles were excluded. Procedures that are not commonly performed by surgeons were excluded (i.e., bronchoscopy and percutaneous coronary intervention), as were those related to pregnancy or childbirth (i.e., cesarean section and circumcision). Two physician reviewers (E.H.L. and M.M.G.) screened each study, and disagreements were resolved by consensus. References were mined to identify additional articles for inclusion.

      2.2.1 Reliability of the RAND/UCLA appropriateness method

      Studies on test–retest reliability were included if the same panel rerated indications after a period of at least 6 months had elapsed. This period was chosen based on evidence that, by 1 year, 15% of systematic reviews may be out of date [
      • Shojania K.G.
      • Sampson M.
      • Ansari M.T.
      • Ji J.
      • Doucette S.
      • Moher D.
      How quickly do systematic reviews go out of date? A survival analysis.
      ]. Studies on interpanel reliability (i.e., the reproducibility of the RAND/UCLA appropriateness method panel results between different panels) were included if they had an experimental design. Observational studies comparing new panel results with prior panel results were not considered to be assessing interpanel reliability because of the potential for other changing variables (panelist nationality or discipline and the year the study was conducted) to confound the results.

      2.2.2 Construct validity of the RAND/UCLA appropriateness method

      Studies on construct validity were included if they compared results produced by the RAND/UCLA appropriateness method with guideline recommendations from a professional society or with published evidence from trials or observational studies. Additionally, we included studies that compared the results of the appropriateness method with a quantitative method for predicting ratios of risks and benefits, such as decision analytic models.

      2.2.3 Predictive validity of the RAND/UCLA appropriateness method

      Studies on the predictive validity of the RAND/UCLA appropriateness method were included if they assessed the association between treatment concordant with appropriateness criteria classification and clinical outcomes for patients who did and did not go on to receive a procedure. Studies that only report clinical outcomes for patients who underwent a procedure were excluded because of the potential for selection bias in these studies. Additionally, we included a study that assessed the concordance of appropriateness ratings for a procedure produced by a panel with subsequent evidence regarding appropriateness produced by randomized controlled trials because we considered this as a novel assessment of predictive validity.

      2.2.4 Effect of varying panelist discipline/nationality or eliminating the in-person discussion

      Studies examining the effect of varying panelist discipline and/or nationality on interpanel results were included. We did not review in detail the large number of studies that only compared intrapanel results by discipline or nationality; their findings are briefly summarized instead. Experimental studies comparing the results produced by panels with and without an in-person discussion between rating rounds (i.e., in-person panels vs. panels conducted entirely by mail) were included if they otherwise followed the RAND/UCLA appropriateness method for both panels, including two rounds of rating with feedback of the first round results. As with interpanel reliability, observational studies comparing new panel results with prior panel results were excluded.
      Data abstracted included dates of study and publication, procedure, study design, and findings. A review protocol was developed before initiating the search; however, it was not registered. This study received no external funding. One author (E.H.L.) was funded by the American College of Surgeons through the Robert Wood Johnson Foundation Clinical Scholars Program (RWJF CSP), and the other three authors participated through their roles as advisors to the RWJF CSP.

      3. Results

      3.1 Description of studies identified by the literature search

      Our search identified 1,601 articles, of which 395 were screened and 37 were included in this review (Fig. 1). Three of the included articles were identified through reference mining. Some articles reported on more than one topic. Of the included articles, 5 assessed reliability, 21 assessed validity, and 17 assessed the effect of altering panel composition or eliminating the in-person discussion between rating rounds. Specifically, the test–retest reliability was reported by four articles [
      • Escobar A.
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Guenaga J.I.
      • Arenaza J.C.
      • et al.
      Development of explicit criteria for total knee replacement.
      ,
      • Hemingway H.
      • Crook A.M.
      • Dawson J.R.
      • Edelman J.
      • Edmondson S.
      • Feder G.
      • et al.
      Rating the appropriateness of coronary angiography, coronary angioplasty and coronary artery bypass grafting: the ACRE study. Appropriateness of Coronary Revascularisation study.
      ,
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Goenaga J.I.
      • Elexpe X.
      • Letona J.
      • et al.
      Evaluation of explicit criteria for total hip joint replacement.
      ,
      • Merrick N.J.
      • Fink A.
      • Park R.E.
      • Brook R.H.
      • Kosecoff J.
      • Chassin M.R.
      • et al.
      Derivation of clinical indications for carotid endarterectomy by an expert panel.
      ], interpanel reliability by one article [
      • Shekelle P.G.
      • Kahan J.P.
      • Bernstein S.J.
      • Leape L.L.
      • Kamberg C.J.
      • Park R.E.
      The reproducibility of a method to identify the overuse and underuse of medical procedures.
      ], comparison with guideline recommendations by nine articles [
      • Kaliszan B.
      • Soule J.C.
      • Vallot T.
      • Mignon M.
      Applicability and efficacy of qualifying criteria for an appropriate use of diagnostic upper gastrointestinal endoscopy.
      ,
      • Bersani G.
      • Rossi A.
      • Suzzi A.
      • Ricci G.
      • De Fabritiis G.
      • Alvisi V.
      Comparison between the two systems to evaluate the appropriateness of endoscopy of the upper digestive tract.
      ,
      • Epstein A.M.
      • Weissman J.S.
      • Schneider E.C.
      • Gatsonis C.
      • Leape L.L.
      • Piana R.N.
      Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
      ,
      • Leape L.L.
      • Weissman J.S.
      • Schneider E.C.
      • Piana R.N.
      • Gatsonis C.
      • Epstein A.M.
      Adherence to practice guidelines: the role of specialty society guidelines.
      ,
      • Broder M.S.
      • Kanouse D.E.
      • Mittman B.S.
      • Bernstein S.J.
      The appropriateness of recommendations for hysterectomy.
      ,
      • Ziskind A.A.
      • Lauer M.A.
      • Bishop G.
      • Vogel R.A.
      Assessing the appropriateness of coronary revascularization: the University of Maryland Revascularization Appropriateness Score (RAS) and its comparison to RAND expert panel ratings and American College of Cardiology/American Heart Association guidelines with regard to assigned appropriateness rating and ability to predict outcome.
      ,
      • Froehlich F.
      • Pache I.
      • Burnand B.
      • Vader J.P.
      • Fried M.
      • Beglinger C.
      • et al.
      Performance of panel-based criteria to evaluate the appropriateness of colonoscopy: a prospective study.
      ,
      • Herrin J.
      • Etchason J.A.
      • Kahan J.P.
      • Brook R.H.
      • Ballard D.J.
      Effect of panel composition on physician ratings of appropriateness of abdominal aortic aneurysm surgery: elucidating differences between multispecialty panel results and specialty society recommendations.
      ,
      • Kahn K.L.
      • Park R.E.
      • Vennes J.
      • Brook R.H.
      Assigning appropriateness ratings for diagnostic upper gastrointestinal endoscopy using two different approaches.
      ], comparison with published evidence by three articles [
      • Merrick N.J.
      • Fink A.
      • Park R.E.
      • Brook R.H.
      • Kosecoff J.
      • Chassin M.R.
      • et al.
      Derivation of clinical indications for carotid endarterectomy by an expert panel.
      ,
      • Bridevaux I.P.
      • Silaghi A.M.
      • Vader J.P.
      • Froehlich F.
      • Gonvers J.J.
      • Burnand B.
      Appropriateness of colorectal cancer screening: appraisal of evidence by experts.
      ,
      • Nicollier-Fahrni A.
      • Vader J.P.
      • Froehlich F.
      • Gonvers J.J.
      • Burnand B.
      Development of appropriateness criteria for colonoscopy: comparison between a standardized expert panel and an evidence-based medicine approach.
      ], comparison with a form of decision analysis by four articles [
      • Silverstein M.D.
      • Ballard D.J.
      Expert panel assessment of appropriateness of abdominal aortic aneurysm surgery: global judgement versus probability estimation.
      ,
      • Bernstein S.J.
      • Hofer T.P.
      • Meijler A.P.
      • Rigter H.
      Setting standards for effectiveness: a comparison of expert panels and decision analysis.
      ,
      • Oddone E.Z.
      • Samsa G.
      • Matchar D.B.
      Global judgments versus decision-model-facilitated judgments: are experts internally consistent?.
      ,
      • McClellan M.
      • Brook R.H.
      Appropriateness of care. A comparison of global and outcome methods to set standards.
      ], and predictive validity by four articles [
      • Epstein A.M.
      • Weissman J.S.
      • Schneider E.C.
      • Gatsonis C.
      • Leape L.L.
      • Piana R.N.
      Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
      ,
      • Ziskind A.A.
      • Lauer M.A.
      • Bishop G.
      • Vogel R.A.
      Assessing the appropriateness of coronary revascularization: the University of Maryland Revascularization Appropriateness Score (RAS) and its comparison to RAND expert panel ratings and American College of Cardiology/American Heart Association guidelines with regard to assigned appropriateness rating and ability to predict outcome.
      ,
      • Hemingway H.
      • Crook A.M.
      • Feder G.
      • Banerjee S.
      • Dawson J.R.
      • Magee P.
      • et al.
      Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization.
      ,
      • Kravitz R.L.
      • Laouri M.
      • Kahan J.P.
      • Guzy P.
      • Sherman T.
      • Hilborne L.
      • et al.
      Validity of criteria used for detecting underuse of coronary revascularization.
      ]. We identified one article that reported the concordance between RAND/UCLA appropriateness method panel results and subsequent randomized controlled trial results [
      • Shekelle P.G.
      • Chassin M.R.
      • Park R.E.
      Assessing the predictive validity of the RAND/UCLA appropriateness method criteria for performing carotid endarterectomy.
      ]. Three and 10 articles described the effect of varying panelist discipline [
      • Scott E.A.
      • Black N.
      Appropriateness of cholecystectomy: the public and private sectors compared.
      ,
      • Leape L.L.
      • Park R.E.
      • Kahan J.P.
      • Brook R.H.
      Group judgments of appropriateness: the effect of panel composition.
      ,
      • Scott E.A.
      • Black N.
      Appropriateness of cholecystectomy in the United Kingdom—a consensus panel approach.
      ] or nationality [
      • Froehlich F.
      • Pache I.
      • Burnand B.
      • Vader J.P.
      • Fried M.
      • Beglinger C.
      • et al.
      Performance of panel-based criteria to evaluate the appropriateness of colonoscopy: a prospective study.
      ,
      • Bernstein S.J.
      • Hofer T.P.
      • Meijler A.P.
      • Rigter H.
      Setting standards for effectiveness: a comparison of expert panels and decision analysis.
      ,
      • Bernstein S.J.
      • Lazaro P.
      • Fitch K.
      • Aguilar M.D.
      • Rigter H.
      • Kahan J.P.
      Appropriateness of coronary revascularization for patients with chronic stable angina or following an acute myocardial infarction: multinational versus Dutch criteria.
      ,
      • Vader J.P.
      • Porchet F.
      • Larequi-Lauber T.
      • Dubois R.W.
      • Burnand B.
      Appropriateness of surgery for sciatica: reliability of guidelines from expert panels.
      ,
      • Burnand B.
      • Vader J.P.
      • Froehlich F.
      • Dupriez K.
      • Larequi-Lauber T.
      • Pache I.
      • et al.
      Reliability of panel-based guidelines for colonoscopy: an international comparison.
      ,
      • Vader J.P.
      • Burnand B.
      • Froehlich F.
      • Dupriez K.
      • Larequi-Lauber T.
      • Pache I.
      • et al.
      Appropriateness of upper gastrointestinal endoscopy: comparison of American and Swiss criteria.
      ,
      • McGlynn E.A.
      • Naylor C.D.
      • Anderson G.M.
      • Leape L.L.
      • Park R.E.
      • Hilborne L.H.
      • et al.
      Comparison of the appropriateness of coronary angiography and coronary artery bypass graft surgery between Canada and New York State.
      ,
      • Fraser G.M.
      • Pilpel D.
      • Kosecoff J.
      • Brook R.H.
      Effect of panel composition on appropriateness ratings.
      ,
      • Bernstein S.J.
      • Kosecoff J.
      • Gray D.
      • Hampton J.R.
      • Brook R.H.
      The appropriateness of the use of cardiovascular procedures. British versus U.S. perspectives.
      ,
      • McDonnell J.
      • Stoevelaar H.J.
      • Bosch J.L.
      • Kahan J.P.
      The appropriateness of treatment of benign prostatic hyperplasia: a comparison of Dutch and multinational criteria.
      ], respectively, and four articles compared traditional panel results with those produced by panels conducted entirely by mail [
      • Escobar A.
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Guenaga J.I.
      • Arenaza J.C.
      • et al.
      Development of explicit criteria for total knee replacement.
      ,
      • Washington D.L.
      • Bernstein S.J.
      • Kahan J.P.
      • Leape L.L.
      • Kamberg C.J.
      • Shekelle P.G.
      Reliability of clinical guideline development using mail-only versus in-person expert panels.
      ,
      • Tobacman J.K.
      • Scott I.U.
      • Cyphert S.T.
      • Zimmerman M.B.
      Comparison of appropriateness ratings for cataract surgery between convened and mail-only multidisciplinary panels.
      ,
      • Tobacman J.K.
      • Scott I.U.
      • Cyphert S.
      • Zimmerman B.
      Reproducibility of measures of overuse of cataract surgery by three physician panels.
      ]. Articles reporting on the development of appropriateness criteria using the RAND/UCLA appropriateness method (16 articles) or on the application of appropriateness criteria to assess for appropriate use, overuse, and underuse (27 articles) have been summarized elsewhere [
      • Lawson E.H.
      • Gibbons M.M.
      • Ingraham A.M.
      • Shekelle P.G.
      • Ko C.Y.
      Appropriateness criteria to assess variations in surgical procedure use in the United States.
      ].
      Fig. 1. Flow diagram outlining the inclusion and exclusion of studies from the literature search. Some studies addressed more than one topic and/or procedure. CABG, coronary artery bypass grafting.

      3.2 Reliability of the RAND/UCLA appropriateness method

      The test–retest reliability of the RAND/UCLA appropriateness method has been studied for total knee replacement [
      • Escobar A.
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Guenaga J.I.
      • Arenaza J.C.
      • et al.
      Development of explicit criteria for total knee replacement.
      ], total hip joint replacement [
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Goenaga J.I.
      • Elexpe X.
      • Letona J.
      • et al.
      Evaluation of explicit criteria for total hip joint replacement.
      ], coronary artery bypass grafting (CABG) [
      • Hemingway H.
      • Crook A.M.
      • Dawson J.R.
      • Edelman J.
      • Edmondson S.
      • Feder G.
      • et al.
      Rating the appropriateness of coronary angiography, coronary angioplasty and coronary artery bypass grafting: the ACRE study. Appropriateness of Coronary Revascularisation study.
      ], and carotid endarterectomy (CEA) [
      • Merrick N.J.
      • Fink A.
      • Park R.E.
      • Brook R.H.
      • Kosecoff J.
      • Chassin M.R.
      • et al.
      Derivation of clinical indications for carotid endarterectomy by an expert panel.
      ] (Table 2). In each study, the same panelists rerated a portion of the originally rated indications after an interval of 6 months to 1 year. Indications chosen for rerating were either among those most frequently found in clinical practice or randomly selected, and the proportion of the original indications that were rerated ranged from 2% to 25%. Three studies reported a weighted kappa between 0.64 and 0.81, indicating good agreement between the original and subsequent ratings by the panel [
      • Landis J.R.
      • Koch G.G.
      The measurement of observer agreement for categorical data.
      ]. The fourth study reported correlation coefficients ranging from 0.75 to 0.96 between the original and subsequent ratings. No study reported the occurrence of complete discordance, in which an indication is first rated appropriate, then subsequently rated inappropriate, or vice versa.
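As an illustration of the agreement statistics discussed above, the following is a minimal sketch of how a weighted kappa and a count of complete discordance could be computed for a set of rerated indications. It assumes a three-category scale (inappropriate, uncertain, appropriate) and linear disagreement weights; the weighting schemes actually used by the cited studies are not stated here, and the function names and sample ratings are hypothetical.

```python
from collections import Counter

# Ordered rating categories (assumed three-level scale, for illustration only).
CATEGORIES = ["inappropriate", "uncertain", "appropriate"]


def weighted_kappa(first, second, categories=CATEGORIES):
    """Cohen's weighted kappa with linear disagreement weights."""
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    # Observed joint counts and the marginal counts of each rating round.
    observed = Counter((index[a], index[b]) for a, b in zip(first, second))
    row = Counter(index[a] for a in first)
    col = Counter(index[b] for b in second)

    def weight(i, j):
        # 0 on the diagonal, 1 for the two extreme categories.
        return abs(i - j) / (k - 1)

    obs_dis = sum(weight(i, j) * c for (i, j), c in observed.items()) / n
    exp_dis = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if exp_dis == 0:
        raise ValueError("kappa undefined: all ratings fall in one category")
    return 1.0 - obs_dis / exp_dis


def complete_discordance(first, second):
    """Count indications rated appropriate in one round and inappropriate in the other."""
    extremes = {"appropriate", "inappropriate"}
    return sum(1 for a, b in zip(first, second) if {a, b} == extremes)


if __name__ == "__main__":
    # Hypothetical original and repeat ratings for six indications.
    original = ["appropriate", "appropriate", "uncertain",
                "inappropriate", "inappropriate", "uncertain"]
    repeat = ["appropriate", "uncertain", "uncertain",
              "inappropriate", "appropriate", "uncertain"]
    print(round(weighted_kappa(original, repeat), 2))
    print(complete_discordance(original, repeat))
```

Linear weights penalize a shift between adjacent categories half as much as a shift between the extremes, which is why weighted kappa and complete discordance are reported as separate quantities.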
      Table 2. Studies on the reliability of the RAND/UCLA appropriateness method for surgical procedures
      Authors (year of publication) | Procedure studied | Methods | Indications rated (% rerated) | Weighted kappa (95% confidence interval) | Complete discordance
      Test–retest with the same panelists
      Escobar et al. (2003)
      • Escobar A.
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Guenaga J.I.
      • Arenaza J.C.
      • et al.
      Development of explicit criteria for total knee replacement.
      Total knee replacement | Panel rerated indications most frequently found in the study author's clinical practice 1 y later | 624 (25) | 0.78 (0.70–0.85) | 0
      Quintana et al. (2000)
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Goenaga J.I.
      • Elexpe X.
      • Letona J.
      • et al.
      Evaluation of explicit criteria for total hip joint replacement.
      Total hip joint replacement | Panel rerated the most frequent appropriate, uncertain, and inappropriate indications found in a field study 1 y later | 216 (21) | 0.81 (0.68–0.95) | 0
      Hemingway et al. (1999)
      • Hemingway H.
      • Crook A.M.
      • Dawson J.R.
      • Edelman J.
      • Edmondson S.
      • Feder G.
      • et al.
      Rating the appropriateness of coronary angiography, coronary angioplasty and coronary artery bypass grafting: the ACRE study. Appropriateness of Coronary Revascularisation study.
      CABG | Panel rerated a random selection of the original indications 1 y later | 84 (2) | 0.64 | NR
      Merrick et al. (1987)
      • Merrick N.J.
      • Fink A.
      • Park R.E.
      • Brook R.H.
      • Kosecoff J.
      • Chassin M.R.
      • et al.
      Derivation of clinical indications for carotid endarterectomy by an expert panel.
      CEA | Panel rerated indications 6 mo later: 66 for the clinical presentation “Multiple TIAs, failure of medical treatment,” 33 randomly selected from 50 chosen as most frequently used in practice, and 33 from the remainder | 675 (20) | Kappa NR; original and later repeated ratings had correlation coefficients ranging from 0.75 to 0.96 | NR
      Reproducibility with different panels keeping panelist discipline and nationality constant
      Three-way weighted kappa | Rates of agreement among panels (%)
      Shekelle et al. (1998)
      • Shekelle P.G.
      • Kahan J.P.
      • Bernstein S.J.
      • Leape L.L.
      • Kamberg C.J.
      • Park R.E.
      The reproducibility of a method to identify the overuse and underuse of medical procedures.
      CABG | Parallel three-way replication of the appropriateness panel process | 948 (100) | Overuse indications | 0.52 | 95, 94, 96
      CABG | Underuse indications | 0.83 | 93, 92, 92
      Hysterectomy | Parallel three-way replication of the appropriateness panel process | 1,718 (100) | Overuse indications | 0.51 | 88, 70, 74
      Abbreviations: RAND/UCLA, RAND/University of California Los Angeles; CABG, coronary artery bypass grafting; CEA, carotid endarterectomy; NR, not reported.
      Complete discordance=indication rated appropriate by one panel and inappropriate by the other.
      The reproducibility of RAND/UCLA appropriateness method results between different panels with the same makeup of panelist discipline and nationality was reported by one article focused on CABG and hysterectomy [
      • Shekelle P.G.
      • Kahan J.P.
      • Bernstein S.J.
      • Leape L.L.
      • Kamberg C.J.
      • Park R.E.
      The reproducibility of a method to identify the overuse and underuse of medical procedures.
      ] (Table 2). For each procedure, three parallel panels were assembled in an experimental fashion. Each panel followed the RAND/UCLA appropriateness method and rated the same indications (948 for CABG and 1,718 for hysterectomy). The three-way weighted kappa was 0.52 for CABG overuse indications, 0.83 for CABG underuse indications, and 0.51 for hysterectomy overuse indications.

      3.3 Construct validity of the RAND/UCLA appropriateness method

      We identified eight studies that classified the appropriateness of a surgical procedure for actual patients using both appropriateness criteria developed with the RAND/UCLA appropriateness method and published guidelines developed using other methods, such as consensus conferences (Table 3). The European studies focused on upper gastrointestinal (GI) endoscopy [
      • Kaliszan B.
      • Soule J.C.
      • Vallot T.
      • Mignon M.
      Applicability and efficacy of qualifying criteria for an appropriate use of diagnostic upper gastrointestinal endoscopy.
      ,
      • Bersani G.
      • Rossi A.
      • Suzzi A.
      • Ricci G.
      • De Fabritiis G.
      • Alvisi V.
      Comparison between the two systems to evaluate the appropriateness of endoscopy of the upper digestive tract.
      ] or colonoscopy [
      • Froehlich F.
      • Pache I.
      • Burnand B.
      • Vader J.P.
      • Fried M.
      • Beglinger C.
      • et al.
      Performance of panel-based criteria to evaluate the appropriateness of colonoscopy: a prospective study.
      ], whereas the US studies looked at CABG [
      • Epstein A.M.
      • Weissman J.S.
      • Schneider E.C.
      • Gatsonis C.
      • Leape L.L.
      • Piana R.N.
      Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
      ,
      • Leape L.L.
      • Weissman J.S.
      • Schneider E.C.
      • Piana R.N.
      • Gatsonis C.
      • Epstein A.M.
      Adherence to practice guidelines: the role of specialty society guidelines.
      ,
      • Ziskind A.A.
      • Lauer M.A.
      • Bishop G.
      • Vogel R.A.
      Assessing the appropriateness of coronary revascularization: the University of Maryland Revascularization Appropriateness Score (RAS) and its comparison to RAND expert panel ratings and American College of Cardiology/American Heart Association guidelines with regard to assigned appropriateness rating and ability to predict outcome.
      ], hysterectomy [
      • Broder M.S.
      • Kanouse D.E.
      • Mittman B.S.
      • Bernstein S.J.
      The appropriateness of recommendations for hysterectomy.
      ], and upper GI endoscopy [
      • Kahn K.L.
      • Park R.E.
      • Vennes J.
      • Brook R.H.
      Assigning appropriateness ratings for diagnostic upper gastrointestinal endoscopy using two different approaches.
      ]. The number of patients studied varied from 153 to 5,026, whereas the number of patients able to be classified by both systems varied from 71 to 2,000. Of patients classified by both systems, the rates of necessary, appropriate, equivocal, and inappropriate indications were similar. For example, Leape et al. classified 676 patients who underwent CABG and reported rates of appropriate and inappropriate use of 76% and 9% by appropriateness criteria and 84% and 1.5% by American College of Cardiology/American Heart Association guidelines. Kahn et al. classified 1,115 patients who underwent upper GI endoscopy and reported rates of appropriate and inappropriate use of 90.1% and 6.7% by appropriateness criteria and 93.5% and 3.7% by guidelines. Appropriateness criteria and guidelines thus appear to be measuring the same construct. Four of these studies found that appropriateness criteria were able to classify more patients than guidelines, whereas two European studies on upper GI endoscopy found that guidelines classified more patients, and two US studies on CABG found that all patients were classifiable by both methods.
      Table 3. Construct validity: studies comparing appropriateness criteria with guideline recommendations for surgical procedures
      Authors (year of publication) | Procedure studied | Methods | Construct comparison | Percent of patients classifiable | Classification of patients (%) (footnote a): necessary, appropriate, equivocal, inappropriate | Percent agreement (footnote a) (number of patients rated)
      Kaliszan et al. (2006)
      • Kaliszan B.
      • Soule J.C.
      • Vallot T.
      • Mignon M.
      Applicability and efficacy of qualifying criteria for an appropriate use of diagnostic upper gastrointestinal endoscopy.
      Upper GI endoscopy | 522 patients prospectively classified using both methods | EPAGE appropriateness criteria | 70.7 | 63.0 | 10.7 | 26.3 | 90.4 (346)
      ANAES guidelines | 80.7 | 78.6 | 21.4
      Bersani et al. (2004)
      • Bersani G.
      • Rossi A.
      • Suzzi A.
      • Ricci G.
      • De Fabritiis G.
      • Alvisi V.
      Comparison between the two systems to evaluate the appropriateness of endoscopy of the upper digestive tract.
      Upper GI endoscopy | 2,300 patients prospectively classified using both methods | EPAGE appropriateness criteria | 87.0 | 70.3 | 10.2 | 19.5 | NR (2,000)
      ASGE guidelines | 100 | 89.8 | 10.2
      Epstein et al. (2003)
      • Epstein A.M.
      • Weissman J.S.
      • Schneider E.C.
      • Gatsonis C.
      • Leape L.L.
      • Piana R.N.
      Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
      CABG/PTCA | 5,026 coronary angiography patients retrospectively classified using both methods (stratified random sample) | RAND appropriateness criteria | 100 | 30.4 | 30.7 | NR
      ACC/AHA guidelines | 85.5 | 40.8 (footnote b) | 28.9 (footnote b)
      Leape et al. (2003)
      • Leape L.L.
      • Weissman J.S.
      • Schneider E.C.
      • Piana R.N.
      • Gatsonis C.
      • Epstein A.M.
      Adherence to practice guidelines: the role of specialty society guidelines.
      CABG | 676 CABG patients retrospectively classified using both methods (stratified random sample) | RAND appropriateness criteria | 100 | 76 | 15 | 9 | NR
      ACC/AHA guidelines | 100 | 84 | 15 | 1.5
      Broder et al. (2000)
      • Broder M.S.
      • Kanouse D.E.
      • Mittman B.S.
      • Bernstein S.J.
      The appropriateness of recommendations for hysterectomy.
      Hysterectomy | 497 patients retrospectively classified by both methods | RAND appropriateness criteria | 100 | 53.5 | NR (n=71)
      ACOG guidelines | 14.3 | 76.1
      Ziskind et al. (1999)
      • Ziskind A.A.
      • Lauer M.A.
      • Bishop G.
      • Vogel R.A.
      Assessing the appropriateness of coronary revascularization: the University of Maryland Revascularization Appropriateness Score (RAS) and its comparison to RAND expert panel ratings and American College of Cardiology/American Heart Association guidelines with regard to assigned appropriateness rating and ability to predict outcome.
      CABG153 patients prospectively classified using both methodsRAND appropriateness criteria100291242
      ACC/AHA guidelines100331770 (n=153)
      University of Maryland RAS1004864682 (n=153)
      Froehlich et al. (1998)
      • Froehlich F.
      • Pache I.
      • Burnand B.
      • Vader J.P.
      • Fried M.
      • Beglinger C.
      • et al.
      Performance of panel-based criteria to evaluate the appropriateness of colonoscopy: a prospective study.
      Colonoscopy | 553 patients prospectively classified using both methods | VHS appropriateness criteria (United States) | 97.6 | 72.4 (appropriate or equivocal) | 27.6 | NR (n=395)
      VHS/RAND appropriateness criteria (Swiss) | 97.8 | 82.0 (appropriate or equivocal) | 18 | NR (n=395)
      ASGE guidelines | 71.6 | 72.2 | 27.8 | NR
      Kahn et al. (1992)
      • Kahn K.L.
      • Park R.E.
      • Vennes J.
      • Brook R.H.
      Assigning appropriateness ratings for diagnostic upper gastrointestinal endoscopy using two different approaches.
      Upper GI endoscopy | 1,585 patients retrospectively classified using both methods (random sample) | RAND appropriateness criteria | 100 | 90.1 | 3.1 | 6.7 | 94.2% agreement; 3.2% complete discordance (P<0.0001); kappa=0.63 (P<0.0001) (n=1,115)
      ASGE guidelines | 70 | 93.5 | 2.8 | 3.7
      Abbreviations: GI, gastrointestinal; EPAGE, European Panel on the Appropriateness of Gastrointestinal Endoscopy; ANAES, Agence Nationale d'Accréditation et d'Évaluation en Santé (French working group); ASGE, American Society for Gastrointestinal Endoscopy; NR, not reported; CABG, coronary artery bypass grafting; PTCA, percutaneous transluminal coronary angioplasty; ACC/AHA, American College of Cardiology/American Heart Association; ACOG, American College of Obstetricians and Gynecologists; RAS, Revascularization Appropriateness Score; VHS, Value Health Sciences.
      a Only among patients able to be classified by both systems, unless specified.
      b A total of 14.5% of patients were not classifiable.
      Construct validity has also been studied by comparing RAND/UCLA appropriateness method ratings with published evidence from trials and observational studies. In a study by Merrick et al. [
      • Merrick N.J.
      • Fink A.
      • Park R.E.
      • Brook R.H.
      • Kosecoff J.
      • Chassin M.R.
      • et al.
      Derivation of clinical indications for carotid endarterectomy by an expert panel.
      ], 675 indications for CEA were classified using the appropriateness method and then placed in rank order. These same indications were also ranked by the percent of recommendations for surgery found in the literature. The study authors reported that the pattern of ratings assigned by the panelists and the rank ordering were nearly identical to endorsement patterns found in the literature. Two European studies compared classification of colonoscopy indications using appropriateness criteria vs. published literature and reported weighted kappas of 0.63 (48 indications) and 0.29 (95 indications), respectively. Complete discordance occurred for 6.3% and 7.4% of indications. A comparison with US appropriateness criteria resulted in a kappa of 0.74 (47 indications). One study attempted to classify appropriateness for 577 actual patients who underwent colonoscopy and reported that only 9% of patients were classifiable by both methods [
      • Bridevaux I.P.
      • Silaghi A.M.
      • Vader J.P.
      • Froehlich F.
      • Gonvers J.J.
      • Burnand B.
      Appropriateness of colorectal cancer screening: appraisal of evidence by experts.
      ,
      • Nicollier-Fahrni A.
      • Vader J.P.
      • Froehlich F.
      • Gonvers J.J.
      • Burnand B.
      Development of appropriateness criteria for colonoscopy: comparison between a standardized expert panel and an evidence-based medicine approach.
      ].
      Finally, the RAND/UCLA appropriateness method has been compared with quantitative methods for classifying indications as appropriate or inappropriate for a surgical procedure based on predictions of the ratio of risks and benefits. The results of these studies are mixed. A study comparing classification using appropriateness criteria vs. probability estimation based on the same panel's assessment of the effect of abdominal aortic aneurysm surgery on the probability of 5-year survival reported a kappa of 0.28 [
      • Silverstein M.D.
      • Ballard D.J.
      Expert panel assessment of appropriateness of abdominal aortic aneurysm surgery: global judgement versus probability estimation.
      ]. Another study used a similar method for CEA and reported that the Spearman rank order correlations were significant and positive for only two of the eight panelists, with correlations ranging from 0.45 to 0.38 [
      • McClellan M.
      • Brook R.H.
      Appropriateness of care. A comparison of global and outcome methods to set standards.
      ]. In contrast, a study that compared the RAND/UCLA appropriateness method with a decision model developed by asking panelists to estimate probabilities and utilities for CEA indications reported a Spearman correlation coefficient of 0.8 [
      • Oddone E.Z.
      • Samsa G.
      • Matchar D.B.
      Global judgments versus decision-model-facilitated judgments: are experts internally consistent?.
      ]. Finally, one study classified the appropriateness of CABG for 617 US patients and 1,053 Dutch patients using US appropriateness criteria, Dutch appropriateness criteria, and a decision analytic model built using data from randomized controlled trials and best judgment data. The authors report a kappa of 0.18 for the US appropriateness criteria vs. the model and a kappa of 0.79 for the Dutch appropriateness criteria vs. the model [
      • Bernstein S.J.
      • Hofer T.P.
      • Meijler A.P.
      • Rigter H.
      Setting standards for effectiveness: a comparison of expert panels and decision analysis.
      ].
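Several of the comparisons above rest on Spearman rank order correlation between panel ratings and model-based estimates. The following is a minimal sketch of that statistic, assuming ties are handled with average ranks and the coefficient is computed as the Pearson correlation of the ranks. The function names and the sample values are hypothetical and are not taken from the cited studies.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined when one input is constant")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical panel ratings (1-9 scale) and model-estimated net benefit.
    panel_ratings = [2, 3, 5, 7, 8, 8]
    model_benefit = [-0.04, 0.01, 0.00, 0.06, 0.05, 0.09]
    print(round(spearman(panel_ratings, model_benefit), 2))
```

Because the statistic depends only on rank order, it measures whether the panel and the model order indications similarly, not whether they agree on the threshold separating appropriate from inappropriate.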

      3.4 Predictive validity of the RAND/UCLA appropriateness method

      We identified four studies on the predictive validity of the RAND/UCLA appropriateness method, all of which focused on cardiac revascularization (Table 4). All four studies were observational and examined the association between care that is concordant or discordant with appropriateness criteria classification and outcomes in patients who did and did not receive revascularization after coronary angiography. We did not identify any randomized controlled trials studying predictive validity, likely because of the ethical challenges to randomizing patients to discordant care. Two of the included studies examined mortality rates for patients classified as necessary for revascularization and found significantly lower mortality for patients who underwent revascularization compared with those who were managed medically (9% vs. 19% and 9.1% vs. 23.3%, both P<0.05) [
      • Epstein A.M.
      • Weissman J.S.
      • Schneider E.C.
      • Gatsonis C.
      • Leape L.L.
      • Piana R.N.
      Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
      ,
      • Kravitz R.L.
      • Laouri M.
      • Kahan J.P.
      • Guzy P.
      • Sherman T.
      • Hilborne L.
      • et al.
      Validity of criteria used for detecting underuse of coronary revascularization.
      ]. Additionally, one study reported that mortality was significantly lower for patients who were classified as inappropriate for CABG and did not undergo the procedure than for those who nonetheless underwent CABG (11.9% vs. 20%, P<0.05) [
      • Hemingway H.
      • Crook A.M.
      • Feder G.
      • Banerjee S.
      • Dawson J.R.
      • Magee P.
      • et al.
      Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization.
      ], whereas another study reported a nonsignificant trend (3% vs. 6%) [
      • Epstein A.M.
      • Weissman J.S.
      • Schneider E.C.
      • Gatsonis C.
      • Leape L.L.
      • Piana R.N.
      Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
      ]. A large prospective study reported that patients appropriate for CABG who did not receive the procedure were more likely to have angina at follow-up (odds ratio, 3.03; 95% confidence interval, 2.08–4.42) than those who underwent revascularization and that the risk of death or nonfatal myocardial infarction at 2-year follow-up was 21% compared with 6% (P<0.001). Furthermore, the study authors described a graded relationship between rating and outcome over the entire scale of appropriateness (linear trend P=0.002) [
      • Hemingway H.
      • Crook A.M.
      • Feder G.
      • Banerjee S.
      • Dawson J.R.
      • Magee P.
      • et al.
      Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization.
      ]. We did identify one study that found no association between concordant vs. discordant care and mortality; however, this study had a smaller sample size than the other three (153 patients vs. 2,552–5,026 patients) [
      • Ziskind A.A.
      • Lauer M.A.
      • Bishop G.
      • Vogel R.A.
      Assessing the appropriateness of coronary revascularization: the University of Maryland Revascularization Appropriateness Score (RAS) and its comparison to RAND expert panel ratings and American College of Cardiology/American Heart Association guidelines with regard to assigned appropriateness rating and ability to predict outcome.
      ].
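The mortality comparisons above contrast patients whose treatment was concordant with the appropriateness classification against those whose treatment was discordant. The following is a minimal sketch of how such a comparison could be summarized with a risk difference, a risk ratio, and a pooled two-proportion z-test. The test choice is an assumption, since the statistical methods of the cited studies are not described here, and the death counts in the example are back-calculated from reported group sizes and rounded percentages (110 patients at 9.1% and 30 patients at 23.3%), so they are approximations.

```python
from math import erfc, sqrt


def compare_mortality(deaths_concordant, n_concordant, deaths_discordant, n_discordant):
    """Summarize mortality in discordant vs. concordant care (two-sided z-test)."""
    if n_concordant <= 0 or n_discordant <= 0:
        raise ValueError("group sizes must be positive")
    risk_c = deaths_concordant / n_concordant
    risk_d = deaths_discordant / n_discordant

    # Pooled proportion under the null hypothesis of equal mortality.
    pooled = (deaths_concordant + deaths_discordant) / (n_concordant + n_discordant)
    se = sqrt(pooled * (1 - pooled) * (1 / n_concordant + 1 / n_discordant))
    z = (risk_d - risk_c) / se if se > 0 else 0.0

    return {
        "risk_difference": risk_d - risk_c,
        "risk_ratio": risk_d / risk_c if risk_c > 0 else float("inf"),
        "z": z,
        # Two-sided p-value from the standard normal distribution.
        "p_value": erfc(abs(z) / sqrt(2)),
    }


if __name__ == "__main__":
    # Approximate counts: 10 of 110 (about 9.1%) vs. 7 of 30 (about 23.3%).
    result = compare_mortality(10, 110, 7, 30)
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

With groups as small as 30 patients, the normal approximation is rough, and an exact test may be preferable; the sketch is meant only to show how the reported percentages relate to a test of difference.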
      Table 4. Studies on the predictive validity of appropriateness criteria for cardiac revascularization
      Authors (year of publication) | Patients | Appropriateness criteria classification | Number of patients with treatment concordant with appropriateness criteria classification (% mortality) | Number of patients with treatment discordant with appropriateness criteria classification (% mortality) | P-value
      Epstein et al. (2003)
      • Epstein A.M.
      • Weissman J.S.
      • Schneider E.C.
      • Gatsonis C.
      • Leape L.L.
      • Piana R.N.
      Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
      5,026 coronary angiography patients retrospectively classified (stratified random sample) | Necessary for revascularization | n = 1,057 (9%) | n = 469 (19%) | P < 0.01
      Inappropriate for revascularization | n = 1,425 (3%) | n = 116 (6%) | P = 0.17
      Hemingway et al. (2001)
      • Hemingway H.
      • Crook A.M.
      • Feder G.
      • Banerjee S.
      • Dawson J.R.
      • Magee P.
      • et al.
      Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization.
      2,552 coronary angiography patients prospectively classified | Appropriate for CABG | n = 765 (5.5%) | n = 354 (19.2%)
      Inappropriate for CABG | n = 109 (11.9%) | n = 15 (20%)
      Ziskind et al. (1999)
      • Ziskind A.A.
      • Lauer M.A.
      • Bishop G.
      • Vogel R.A.
      Assessing the appropriateness of coronary revascularization: the University of Maryland Revascularization Appropriateness Score (RAS) and its comparison to RAND expert panel ratings and American College of Cardiology/American Heart Association guidelines with regard to assigned appropriateness rating and ability to predict outcome.
      153 coronary angiography patients prospectively classified | Appropriate or inappropriate for revascularization | n = 84 (10%) | n = 38 (13%) | P > 0.05
      Kravitz et al. (1995)
      • Kravitz R.L.
      • Laouri M.
      • Kahan J.P.
      • Guzy P.
      • Sherman T.
      • Hilborne L.
      • et al.
      Validity of criteria used for detecting underuse of coronary revascularization.
      4,226 coronary angiography patients retrospectively classified (random sample) | Necessary for CABG | n = 248 (9.7%) | n = 108 (16.7%) | P = 0.04
      Necessary for revascularization | n = 110 (9.1%) | n = 30 (23.3%) | P = 0.03
      Abbreviation: CABG, coronary artery bypass grafting.
      Revascularization=CABG or percutaneous transluminal coronary angioplasty; Discordant=procedure necessary or appropriate but patient treated medically or received the procedure classified as inappropriate.
      We identified one study that compared the results of previously developed appropriateness criteria for CEA with the results of subsequently published randomized controlled trials [
      • Shekelle P.G.
      • Chassin M.R.
      • Park R.E.
      Assessing the predictive validity of the RAND/UCLA appropriateness method criteria for performing carotid endarterectomy.
      ]. The results of the randomized controlled trials were concordant with the appropriateness ratings previously produced by a panel for 44 indications, which together covered almost 30% of operations performed in 1981, when the panel was conducted. Furthermore, no indication ratings were refuted by the subsequent randomized controlled trial results.

      3.5 Effect of varying panelist discipline and/or nationality

      We identified two studies that compared the results of multidisciplinary panels (recommended by the RAND/UCLA appropriateness method) with those produced by all-surgeon panels. An experimental study focused on cholecystectomy found that the all-surgeon panel rated more indications appropriate (29% vs. 13%) and fewer inappropriate (27% vs. 50%) than the multidisciplinary panel. The study authors noted that the multidisciplinary panelists were more likely than the surgeons to change their ratings in the second round (after discussion of the first round results), and the changes they made were more substantial [
      • Scott E.A.
      • Black N.
      Appropriateness of cholecystectomy in the United Kingdom—a consensus panel approach.
      ]. An observational study focused on CEA had similar findings with the all-surgeon panel rating 24% of indications appropriate and 61% inappropriate, whereas the multidisciplinary panel rated 14% appropriate and 70% inappropriate (P<0.01) [
      • Leape L.L.
      • Park R.E.
      • Kahan J.P.
      • Brook R.H.
      Group judgments of appropriateness: the effect of panel composition.
      ]. When applied to actual populations of patients, the multidisciplinary panel ratings classified fewer patients as appropriate and more as inappropriate for both cholecystectomy and CEA compared with the all-surgeon panel ratings [
      • Scott E.A.
      • Black N.
      Appropriateness of cholecystectomy: the public and private sectors compared.
      ,
      • Leape L.L.
      • Park R.E.
      • Kahan J.P.
      • Brook R.H.
      Group judgments of appropriateness: the effect of panel composition.
      ,
      • Scott E.A.
      • Black N.
      Appropriateness of cholecystectomy in the United Kingdom—a consensus panel approach.
      ].
      There is a large volume of literature comparing rating results between panels with different compositions in terms of panelist nationality. These studies compared appropriateness criteria developed using the RAND/UCLA appropriateness method by American, Canadian, and European panels. In general, these studies reported that panel results vary modestly but significantly [
      • Froehlich F.
      • Pache I.
      • Burnand B.
      • Vader J.P.
      • Fried M.
      • Beglinger C.
      • et al.
      Performance of panel-based criteria to evaluate the appropriateness of colonoscopy: a prospective study.
      ,
      • Bernstein S.J.
      • Hofer T.P.
      • Meijler A.P.
      • Rigter H.
      Setting standards for effectiveness: a comparison of expert panels and decision analysis.
      ,
      • Bernstein S.J.
      • Lazaro P.
      • Fitch K.
      • Aguilar M.D.
      • Rigter H.
      • Kahan J.P.
      Appropriateness of coronary revascularization for patients with chronic stable angina or following an acute myocardial infarction: multinational versus Dutch criteria.
      ,
      • Vader J.P.
      • Porchet F.
      • Larequi-Lauber T.
      • Dubois R.W.
      • Burnand B.
      Appropriateness of surgery for sciatica: reliability of guidelines from expert panels.
      ,
      • Burnand B.
      • Vader J.P.
      • Froehlich F.
      • Dupriez K.
      • Larequi-Lauber T.
      • Pache I.
      • et al.
      Reliability of panel-based guidelines for colonoscopy: an international comparison.
      ,
      • Vader J.P.
      • Burnand B.
      • Froehlich F.
      • Dupriez K.
      • Larequi-Lauber T.
      • Pache I.
      • et al.
      Appropriateness of upper gastrointestinal endoscopy: comparison of American and Swiss criteria.
      ,
      • McGlynn E.A.
      • Naylor C.D.
      • Anderson G.M.
      • Leape L.L.
      • Park R.E.
      • Hilborne L.H.
      • et al.
      Comparison of the appropriateness of coronary angiography and coronary artery bypass graft surgery between Canada and New York State.
      ,
      • Fraser G.M.
      • Pilpel D.
      • Kosecoff J.
      • Brook R.H.
      Effect of panel composition on appropriateness ratings.
      ,
      • Bernstein S.J.
      • Kosecoff J.
      • Gray D.
      • Hampton J.R.
      • Brook R.H.
      The appropriateness of the use of cardiovascular procedures. British versus U.S. perspectives.
      ,
      • McDonnell J.
      • Stoevelaar H.J.
      • Bosch J.L.
      • Kahan J.P.
      The appropriateness of treatment of benign prostatic hyperplasia: a comparison of Dutch and multinational criteria.
      ]. Similarly, there is a large literature comparing the differences in ratings of appropriateness within a single panel as a function of panelist discipline. Almost universally, these studies show that different disciplines systematically rate the appropriateness of procedures differently, with those who perform the procedure rating appropriateness higher than those who do not [
      • Fitch K.
      • Lazaro P.
      • Aguilar M.D.
      • Martin Y.
      • Bernstein S.J.
      Physician recommendations for coronary revascularization: variations by clinical speciality.
      ,
      • Kahan J.P.
      • Park R.E.
      • Leape L.L.
      • Bernstein S.J.
      • Hilborne L.H.
      • Parker L.
      • et al.
      Variations by specialty in physician ratings of the appropriateness and necessity of indications for procedures.
      ]. These results support the conclusions of the aforementioned studies that panelist discipline and nationality influence ratings of appropriateness.

      3.6 Effect of eliminating in-person discussion

      The RAND/UCLA appropriateness method includes at least two rounds of rating by the individual panelists for each indication, with the second round occurring after an in-person discussion of the first round results. We found three studies that compared ratings produced by the traditional RAND/UCLA appropriateness method with those produced by panels that used the appropriateness method but omitted the in-person discussion between rounds (i.e., the process was conducted entirely by mail). One experimental study compared a mail-in-only panel with three in-person panels and reported kappas of 0.49–0.67 for CABG overuse indications, 0.69–0.76 for CABG underuse indications, and 0.59–0.69 for hysterectomy overuse indications [
      • Washington D.L.
      • Bernstein S.J.
      • Kahan J.P.
      • Leape L.L.
      • Kamberg C.J.
      • Shekelle P.G.
      Reliability of clinical guideline development using mail-only versus in-person expert panels.
      ]. Two experimental studies focused on total knee replacement [
      • Escobar A.
      • Quintana J.M.
      • Arostegui I.
      • Azkarate J.
      • Guenaga J.I.
      • Arenaza J.C.
      • et al.
      Development of explicit criteria for total knee replacement.
      ] and cataract extraction [
      • Tobacman J.K.
      • Scott I.U.
      • Cyphert S.T.
      • Zimmerman M.B.
      Comparison of appropriateness ratings for cataract surgery between convened and mail-only multidisciplinary panels.
      ] and reported weighted kappas of 0.75 and 0.65, respectively. We also identified a study that applied the cataract extraction appropriateness criteria to 1,020 actual patients and found that the mail-in-only panel ratings classified 70% as appropriate and 3.5% as inappropriate, compared with 92% as appropriate and 2% as inappropriate by the in-person panel ratings (P<0.001) [
      • Tobacman J.K.
      • Scott I.U.
      • Cyphert S.
      • Zimmerman B.
      Reproducibility of measures of overuse of cataract surgery by three physician panels.
      ].
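      The kappa and weighted kappa statistics reported in this section can be illustrated with a minimal sketch. It assumes two panels that each assign every indication to one of three ordered categories (inappropriate, uncertain, appropriate) and uses linear disagreement weights; the function name, the weighting scheme, and the example ratings are hypothetical and are not taken from the studies cited above.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weights="none"):
    """Chance-corrected agreement between two sets of ratings.

    categories must be listed in rank order. weights="none" gives the
    unweighted kappa; weights="linear" penalizes a disagreement in
    proportion to the distance between the two categories.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("none", "linear"):
        raise ValueError("weights must be 'none' or 'linear'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions: observed[i][j] is the share of
    # indications rated category i by panel A and category j by panel B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each panel.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "linear":
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    # Weighted disagreement actually observed, and the amount expected
    # if the two panels rated independently with the same marginals.
    d_obs = sum(disagreement(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - d_obs / d_exp


# Hypothetical ratings of four indications by two panels.
CATEGORIES = ["inappropriate", "uncertain", "appropriate"]
in_person = ["inappropriate", "uncertain", "appropriate", "appropriate"]
mail_only = ["inappropriate", "uncertain", "appropriate", "uncertain"]

print(round(cohen_kappa(in_person, mail_only, CATEGORIES), 2))
print(round(cohen_kappa(in_person, mail_only, CATEGORIES, weights="linear"), 2))
```

      In this example the only disagreement is between adjacent categories, so the linearly weighted kappa is higher than the unweighted one. This is the usual reason for reporting a weighted kappa when the rating categories are ordered.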

      4. Discussion

      Our review summarizes the methodological research on use of the RAND/UCLA appropriateness method for surgical procedures. We found that the appropriateness method has moderate to very good reliability for determining overuse and underuse for total knee and hip joint replacement, CABG, CEA, and hysterectomy. Additionally, studies on construct validity demonstrate that appropriateness criteria developed using the RAND/UCLA appropriateness method, professional society guidelines, and evidence-based approaches are likely measuring similar constructs for overuse, underuse, and appropriate use of upper GI endoscopy, colonoscopy, CABG, hysterectomy, and CEA. However, appropriateness criteria are more often applicable to a wider range of patients than are professional society guidelines and other evidence-based approaches. Perhaps most important, though, is our finding that studies support the predictive validity of appropriateness criteria for cardiac revascularization, meaning that patients who receive treatment concordant with appropriateness criteria classification have better outcomes than those who receive discordant care. Although these findings are encouraging, they also highlight the need for further methodological research on a broader range of procedures.
      The results of studies on the reliability of the RAND/UCLA appropriateness method can be put in perspective by considering appropriateness criteria in the context of a diagnostic test. This interpretation is logical because appropriateness criteria are not meant to replace a clinician's judgment but rather to serve as a supplement in the decision-making process, much as laboratory and radiological studies aid in the formation of diagnoses and treatment plans. Although the reliability of the appropriateness method for surgical procedures is not perfect, it is certainly within the range seen with commonly used diagnostic tests. For example, the reliability of coronary angiography for determining the presence or absence of stenosis is reported to be moderate (kappa, 0.53) [
      • DeRouen T.A.
      • Murray J.A.
      • Owen W.
      Variability in the analysis of coronary arteriograms.
      ], and the reliability for determining stenosis length and lesion eccentricity is fair (kappa, 0.38 and 0.25, respectively) [
      • Ellis S.
      • Alderman E.L.
      • Cain K.
      • Wright A.
      • Bourassa M.
      • Fisher L.
      Morphology of left anterior descending coronary territory lesions as a predictor of anterior myocardial infarction: a CASS Registry Study.
      ]. The reliability of the RAND/UCLA appropriateness method panel process may also be better than that of individual surgeons' decisions. When this was studied for hysterectomy, the reliability of individual surgeons' decisions was fair (kappa, 0.23) [
      • Rutkow I.M.
      • Gittelsohn A.M.
      • Zuidema G.D.
      Surgical decision making. The reliability of clinical judgment.
      ].
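      The qualitative labels attached to the kappa values above (moderate, fair) can be made explicit with a short sketch. It assumes the agreement bands of Landis and Koch, whose paper appears in the reference list; whether the review applied exactly these cut-points is an assumption, and the function name is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the descriptive bands of Landis and Koch (1977)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values quoted in the paragraph above.
for description, kappa in [
    ("coronary angiography, presence of stenosis", 0.53),
    ("coronary angiography, stenosis length", 0.38),
    ("coronary angiography, lesion eccentricity", 0.25),
    ("individual surgeons' decisions, hysterectomy", 0.23),
]:
    print(f"{description}: kappa {kappa:.2f} is {landis_koch_label(kappa)}")
```

      Under these bands, 0.53 falls in the moderate range and 0.38, 0.25, and 0.23 fall in the fair range, which matches the wording used in the text. Other labeling schemes exist, so terms such as "very good" elsewhere in the review may follow a different convention.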
      Other methods of determining the appropriateness of a surgical procedure for a particular patient include guidelines produced by professional societies, evidence-based approaches using published literature, and quantitative approaches, such as decision analysis and probability estimation. There is much less published evidence on the reliability and validity of these methods. In general, studies comparing the RAND/UCLA appropriateness method with guidelines or published evidence found similar results, thus supporting the construct validity of the appropriateness method. The disadvantage of guidelines is that they tend to offer general recommendations and are thus often not actionable at the individual level. Ideally, an evidence-based approach would be taken for each patient; however, conducting randomized controlled trials for every possible clinical scenario is not realistic and may not even be ethical in some circumstances. In contrast, the RAND/UCLA appropriateness method is designed to be comprehensive and exhaustive, producing appropriateness criteria that can be applied to the vast majority of patients presenting for a procedure. Decision models and probability estimation may be similarly comprehensive and actionable. As with the appropriateness method, these methods rely on a synthesis of published evidence and clinical judgment. We found relatively few studies comparing the RAND/UCLA appropriateness method with these methods and believe that this is an area in need of further study.
      There are a number of ways in which appropriateness criteria could be implemented to reduce variation in the use of surgical procedures. For example, in response to allegations of inappropriate use of percutaneous coronary intervention in Maryland hospitals, the American College of Cardiology and the Society for Cardiovascular Angiography and Interventions recently proposed a mandatory accrediting process through the Accreditation for Cardiovascular Excellence (ACE) program. Along with various process and outcome standards, a key component of the ACE program is documentation of the indication for percutaneous coronary intervention and assessment of appropriateness using the current Appropriate Use Criteria for Coronary Artery Revascularization [
      • Brindis R.
      • Goldberg S.D.
      • Turco M.A.
      • Dean L.S.
      President's page: quality and appropriateness of care: the response to allegations and actions needed by the cardiovascular professional.
      ,

      ACE Standards for Catheterization Laboratory Accreditation. Accreditation for Cardiovascular Excellence 2011. Available at: http://www.cvexcel.org/CathPCI/Standards.aspx.

      ]. Appropriateness criteria could also be used by physicians for clinical decision support or implemented in the clinical realm as part of the preoperative informed consent process, with patients receiving individualized assessments of the appropriateness of the procedure for their particular clinical scenario. For example, a proof of concept of the utility of appropriateness criteria for changing physician behavior and reducing practice variations was performed in the United Kingdom using criteria for coronary angiography [
      • Junghans C.
      • Feder G.
      • Timmis A.D.
      • Eldridge S.
      • Sekhri N.
      • Black N.
      • et al.
      Effect of patient-specific ratings vs conventional guidelines on investigation decisions in angina: appropriateness of Referral and Investigation in Angina (ARIA) Trial.
      ]. These uses would be consistent with recent quality measures proposed by the Centers for Medicare and Medicaid Services for Accountable Care Organizations, which include a measure of the percentage of physicians using clinical decision support and a measure of shared decision-making between physician and patient.
      Our study has possible limitations. First, methodological studies on the RAND/UCLA appropriateness method for surgical procedures may have been overlooked. To reduce this possibility, we started with broad search terms and had two physicians perform the screening and reference mining. Second, the implications of our findings may be limited by the possibility that the RAND/UCLA appropriateness method may have differing reliability and validity for different procedures. Most of the methodological studies on the RAND/UCLA appropriateness method focus on a relatively small number of procedures. Further evaluation of the method is warranted and could be performed concurrently with future development, application, and implementation of appropriateness criteria for a broad range of procedures. Finally, the developers of the RAND/UCLA appropriateness method performed many of the methodological studies we identified, which may be a source of bias. Of note, however, a key study on the predictive validity of the RAND/UCLA appropriateness method for coronary revascularization was performed completely independently of the appropriateness method developers [
      • Hemingway H.
      • Crook A.M.
      • Feder G.
      • Banerjee S.
      • Dawson J.R.
      • Magee P.
      • et al.
      Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization.
      ].
      Ensuring that patients receive appropriate surgical care should be considered integral to improving the overall quality of the health care system. Our study supports the use of the RAND/UCLA appropriateness method to assess variation in the use of surgical procedures by identifying overuse and underuse and highlights the need for further methodological research as appropriateness criteria are developed and implemented for a broader range of procedures.

      Acknowledgments

      The authors acknowledge Dr Angela Ingraham for her help with the initial screening of articles. This study received no external funding. Dr Lawson's time was supported by the American College of Surgeons through RWJF CSP. The remaining three authors participated through their roles as advisors to the RWJF CSP.

      References

        • Jha A.K.
        • Fisher E.S.
        • Li Z.
        • Orav E.J.
        • Epstein A.M.
        Racial trends in the use of major procedures among the elderly.
        N Engl J Med. 2005; 353: 683-691
        • Weinstein J.N.
        • Bronner K.K.
        • Morgan T.S.
        • Wennberg J.E.
        Trends and geographic variations in major surgery for degenerative diseases of the hip, knee, and spine.
        Health Affairs. 2004; https://doi.org/10.1377/hlthaff.var.81
        • Patel M.R.
        • Greiner M.A.
        • DiMartino L.D.
        • Schulman K.A.
        • Duncan P.W.
        • Matchar D.B.
        • et al.
        Geographic variation in carotid revascularization among Medicare beneficiaries, 2003-2006.
        Arch Intern Med. 2010; 170: 1218-1225
        • Wennberg J.E.
        The Dartmouth atlas of health care.
        The Center for the Evaluative Clinical Sciences at Dartmouth Medical School in association with American Hospital Publishing, Inc, Hanover, NH 1998
        • Chassin M.R.
        Explaining geographic variations. The enthusiasm hypothesis.
        Med Care. 1993; 31: YS37-YS44
        • Wright J.G.
        • Hawker G.A.
        • Bombardier C.
        • Croxford R.
        • Dittus R.S.
        • Freund D.A.
        • et al.
        Physician enthusiasm as an explanation for area variation in the utilization of knee replacement surgery.
        Med Care. 1999; 37: 946-956
        • Fitch K.
        • Bernstein S.
        • Aguilar M.
        • Burnand B.
        • LaCalle J.R.
        • Lazaro P.
        • et al.
        The RAND/UCLA appropriateness method user's manual 2001. No. MR-1269-DG-XII/RE:126.
        RAND Corp, Santa Monica, CA 2001
        • Hicks N.R.
        Some observations on attempts to measure appropriateness of care.
        BMJ. 1994; 309: 730-733
        • Brook R.H.
        • Chassin M.R.
        • Fink A.
        • Solomon D.H.
        • Kosecoff J.
        • Park R.E.
        A method for the detailed assessment of the appropriateness of medical technologies.
        Int J Technol Assess Health Care. 1986; 2: 53-63
        • Yermilov I.
        • McGory M.L.
        • Shekelle P.W.
        • Ko C.Y.
        • Maggard M.A.
        Appropriateness criteria for bariatric surgery: beyond the NIH guidelines.
        Obesity (Silver Spring). 2009; 17: 1521-1527
        • Patel M.R.
        • Dehmer G.J.
        • Hirshfeld J.W.
        • Smith P.K.
        • Spertus J.A.
        ACCF/SCAI/STS/AATS/AHA/ASNC 2009 Appropriateness Criteria for Coronary Revascularization: a Report by the American College of Cardiology Foundation Appropriateness Criteria Task Force, Society for Cardiovascular Angiography and Interventions, Society of Thoracic Surgeons, American Association for Thoracic Surgery, American Heart Association, and the American Society of Nuclear Cardiology Endorsed by the American Society of Echocardiography, the Heart Failure Society of America, and the Society of Cardiovascular Computed Tomography.
        J Am Coll Cardiol. 2009; 53: 530-553
        • Shojania K.G.
        • Sampson M.
        • Ansari M.T.
        • Ji J.
        • Doucette S.
        • Moher D.
        How quickly do systematic reviews go out of date? A survival analysis.
        Ann Intern Med. 2007; 147: 224-233
        • Escobar A.
        • Quintana J.M.
        • Arostegui I.
        • Azkarate J.
        • Guenaga J.I.
        • Arenaza J.C.
        • et al.
        Development of explicit criteria for total knee replacement.
        Int J Technol Assess Health Care. 2003; 19 (Winter): 57-70
        • Hemingway H.
        • Crook A.M.
        • Dawson J.R.
        • Edelman J.
        • Edmondson S.
        • Feder G.
        • et al.
        Rating the appropriateness of coronary angiography, coronary angioplasty and coronary artery bypass grafting: the ACRE study. Appropriateness of Coronary Revascularisation study.
        J Public Health Med. 1999; 21: 421-429
        • Quintana J.M.
        • Arostegui I.
        • Azkarate J.
        • Goenaga J.I.
        • Elexpe X.
        • Letona J.
        • et al.
        Evaluation of explicit criteria for total hip joint replacement.
        J Clin Epidemiol. 2000; 53: 1200-1208
        • Merrick N.J.
        • Fink A.
        • Park R.E.
        • Brook R.H.
        • Kosecoff J.
        • Chassin M.R.
        • et al.
        Derivation of clinical indications for carotid endarterectomy by an expert panel.
        Am J Public Health. 1987; 77: 187-190
        • Shekelle P.G.
        • Kahan J.P.
        • Bernstein S.J.
        • Leape L.L.
        • Kamberg C.J.
        • Park R.E.
        The reproducibility of a method to identify the overuse and underuse of medical procedures.
        N Engl J Med. 1998; 338: 1888-1895
        • Kaliszan B.
        • Soule J.C.
        • Vallot T.
        • Mignon M.
        Applicability and efficacy of qualifying criteria for an appropriate use of diagnostic upper gastrointestinal endoscopy.
        Gastroenterol Clin Biol. 2006; 30: 673-680
        • Bersani G.
        • Rossi A.
        • Suzzi A.
        • Ricci G.
        • De Fabritiis G.
        • Alvisi V.
        Comparison between the two systems to evaluate the appropriateness of endoscopy of the upper digestive tract.
        Am J Gastroenterol. 2004; 99: 2128-2135
        • Epstein A.M.
        • Weissman J.S.
        • Schneider E.C.
        • Gatsonis C.
        • Leape L.L.
        • Piana R.N.
        Race and gender disparities in rates of cardiac revascularization: do they reflect appropriate use of procedures or problems in quality of care?.
        Med Care. 2003; 41: 1240-1255
        • Leape L.L.
        • Weissman J.S.
        • Schneider E.C.
        • Piana R.N.
        • Gatsonis C.
        • Epstein A.M.
        Adherence to practice guidelines: the role of specialty society guidelines.
        Am Heart J. 2003; 145: 19-26
        • Broder M.S.
        • Kanouse D.E.
        • Mittman B.S.
        • Bernstein S.J.
        The appropriateness of recommendations for hysterectomy.
        Obstet Gynecol. 2000; 95: 199-205
        • Ziskind A.A.
        • Lauer M.A.
        • Bishop G.
        • Vogel R.A.
        Assessing the appropriateness of coronary revascularization: the University of Maryland Revascularization Appropriateness Score (RAS) and its comparison to RAND expert panel ratings and American College of Cardiology/American Heart Association guidelines with regard to assigned appropriateness rating and ability to predict outcome.
        Clin Cardiol. 1999; 22: 67-76
        • Froehlich F.
        • Pache I.
        • Burnand B.
        • Vader J.P.
        • Fried M.
        • Beglinger C.
        • et al.
        Performance of panel-based criteria to evaluate the appropriateness of colonoscopy: a prospective study.
        Gastrointest Endosc. 1998; 48: 128-136
        • Herrin J.
        • Etchason J.A.
        • Kahan J.P.
        • Brook R.H.
        • Ballard D.J.
        Effect of panel composition on physician ratings of appropriateness of abdominal aortic aneurysm surgery: elucidating differences between multispecialty panel results and specialty society recommendations.
        Health Policy. 1997; 42: 67-81
        • Kahn K.L.
        • Park R.E.
        • Vennes J.
        • Brook R.H.
        Assigning appropriateness ratings for diagnostic upper gastrointestinal endoscopy using two different approaches.
        Med Care. 1992; 30: 1016-1028
        • Bridevaux I.P.
        • Silaghi A.M.
        • Vader J.P.
        • Froehlich F.
        • Gonvers J.J.
        • Burnand B.
        Appropriateness of colorectal cancer screening: appraisal of evidence by experts.
        Int J Qual Health Care. 2006; 18: 177-182
        • Nicollier-Fahrni A.
        • Vader J.P.
        • Froehlich F.
        • Gonvers J.J.
        • Burnand B.
        Development of appropriateness criteria for colonoscopy: comparison between a standardized expert panel and an evidence-based medicine approach.
        Int J Qual Health Care. 2003; 15: 15-22
        • Silverstein M.D.
        • Ballard D.J.
        Expert panel assessment of appropriateness of abdominal aortic aneurysm surgery: global judgement versus probability estimation.
        J Health Serv Res Policy. 1998; 3: 134-140
        • Bernstein S.J.
        • Hofer T.P.
        • Meijler A.P.
        • Rigter H.
        Setting standards for effectiveness: a comparison of expert panels and decision analysis.
        Int J Qual Health Care. 1997; 9: 255-263
        • Oddone E.Z.
        • Samsa G.
        • Matchar D.B.
        Global judgments versus decision-model-facilitated judgments: are experts internally consistent?.
        Med Decis Making. 1994; 14: 19-26
        • McClellan M.
        • Brook R.H.
        Appropriateness of care. A comparison of global and outcome methods to set standards.
        Med Care. 1992; 30: 565-586
        • Hemingway H.
        • Crook A.M.
        • Feder G.
        • Banerjee S.
        • Dawson J.R.
        • Magee P.
        • et al.
        Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization.
        N Engl J Med. 2001; 344: 645-654
        • Kravitz R.L.
        • Laouri M.
        • Kahan J.P.
        • Guzy P.
        • Sherman T.
        • Hilborne L.
        • et al.
        Validity of criteria used for detecting underuse of coronary revascularization.
        JAMA. 1995; 274: 632-638
        • Shekelle P.G.
        • Chassin M.R.
        • Park R.E.
        Assessing the predictive validity of the RAND/UCLA appropriateness method criteria for performing carotid endarterectomy.
        Int J Technol Assess Health Care. 1998; 14 (Fall): 707-727
        • Scott E.A.
        • Black N.
        Appropriateness of cholecystectomy: the public and private sectors compared.
        Ann R Coll Surg Engl. 1992; 74: 97-101
        • Leape L.L.
        • Park R.E.
        • Kahan J.P.
        • Brook R.H.
        Group judgments of appropriateness: the effect of panel composition.
        Qual Assur Health Care. 1992; 4: 151-159
        • Scott E.A.
        • Black N.
        Appropriateness of cholecystectomy in the United Kingdom—a consensus panel approach.
        Gut. 1991; 32: 1066-1070
        • Bernstein S.J.
        • Lazaro P.
        • Fitch K.
        • Aguilar M.D.
        • Rigter H.
        • Kahan J.P.
        Appropriateness of coronary revascularization for patients with chronic stable angina or following an acute myocardial infarction: multinational versus Dutch criteria.
        Int J Qual Health Care. 2002; 14: 103-109
        • Vader J.P.
        • Porchet F.
        • Larequi-Lauber T.
        • Dubois R.W.
        • Burnand B.
        Appropriateness of surgery for sciatica: reliability of guidelines from expert panels.
        Spine. 2000; 25: 1831-1836
        • Burnand B.
        • Vader J.P.
        • Froehlich F.
        • Dupriez K.
        • Larequi-Lauber T.
        • Pache I.
        • et al.
        Reliability of panel-based guidelines for colonoscopy: an international comparison.
        Gastrointest Endosc. 1998; 47: 162-166
        • Vader J.P.
        • Burnand B.
        • Froehlich F.
        • Dupriez K.
        • Larequi-Lauber T.
        • Pache I.
        • et al.
        Appropriateness of upper gastrointestinal endoscopy: comparison of American and Swiss criteria.
        Int J Qual Health Care. 1997; 9: 87-92
        • McGlynn E.A.
        • Naylor C.D.
        • Anderson G.M.
        • Leape L.L.
        • Park R.E.
        • Hilborne L.H.
        • et al.
        Comparison of the appropriateness of coronary angiography and coronary artery bypass graft surgery between Canada and New York State.
        JAMA. 1994; 272: 934-940
        • Fraser G.M.
        • Pilpel D.
        • Kosecoff J.
        • Brook R.H.
        Effect of panel composition on appropriateness ratings.
        Int J Qual Health Care. 1994; 6: 251-255
        • Bernstein S.J.
        • Kosecoff J.
        • Gray D.
        • Hampton J.R.
        • Brook R.H.
        The appropriateness of the use of cardiovascular procedures. British versus U.S. perspectives.
        Int J Technol Assess Health Care. 1993; 9 (Winter): 3-10
        • McDonnell J.
        • Stoevelaar H.J.
        • Bosch J.L.
        • Kahan J.P.
        The appropriateness of treatment of benign prostatic hyperplasia: a comparison of Dutch and multinational criteria.
        Health Policy. 2001; 57: 45-56
        • Washington D.L.
        • Bernstein S.J.
        • Kahan J.P.
        • Leape L.L.
        • Kamberg C.J.
        • Shekelle P.G.
        Reliability of clinical guideline development using mail-only versus in-person expert panels.
        Med Care. 2003; 41: 1374-1381
        • Tobacman J.K.
        • Scott I.U.
        • Cyphert S.T.
        • Zimmerman M.B.
        Comparison of appropriateness ratings for cataract surgery between convened and mail-only multidisciplinary panels.
        Med Decis Making. 2001; 21: 490-497
        • Tobacman J.K.
        • Scott I.U.
        • Cyphert S.
        • Zimmerman B.
        Reproducibility of measures of overuse of cataract surgery by three physician panels.
        Med Care. 1999; 37: 937-945
        • Lawson E.H.
        • Gibbons M.M.
        • Ingraham A.M.
        • Shekelle P.G.
        • Ko C.Y.
        Appropriateness criteria to assess variations in surgical procedure use in the United States.
        Arch Surg. 2011; 146: 1433-1440
        • Landis J.R.
        • Koch G.G.
        The measurement of observer agreement for categorical data.
        Biometrics. 1977; 33: 159-174
        • Fitch K.
        • Lazaro P.
        • Aguilar M.D.
        • Martin Y.
        • Bernstein S.J.
        Physician recommendations for coronary revascularization: variations by clinical speciality.
        Eur J Public Health. 1999; 9: 181-187
        • Kahan J.P.
        • Park R.E.
        • Leape L.L.
        • Bernstein S.J.
        • Hilborne L.H.
        • Parker L.
        • et al.
        Variations by specialty in physician ratings of the appropriateness and necessity of indications for procedures.
        Med Care. 1996; 34: 512-523
        • DeRouen T.A.
        • Murray J.A.
        • Owen W.
        Variability in the analysis of coronary arteriograms.
        Circulation. 1977; 55: 324-328
        • Ellis S.
        • Alderman E.L.
        • Cain K.
        • Wright A.
        • Bourassa M.
        • Fisher L.
        Morphology of left anterior descending coronary territory lesions as a predictor of anterior myocardial infarction: a CASS Registry Study.
        J Am Coll Cardiol. 1989; 13: 1481-1491
        • Rutkow I.M.
        • Gittelsohn A.M.
        • Zuidema G.D.
        Surgical decision making. The reliability of clinical judgment.
        Ann Surg. 1979; 190: 409-419
        • Brindis R.
        • Goldberg S.D.
        • Turco M.A.
        • Dean L.S.
        President's page: quality and appropriateness of care: the response to allegations and actions needed by the cardiovascular professional.
        J Am Coll Cardiol. 2011; 57: 111-113
        ACE Standards for Catheterization Laboratory Accreditation. Accreditation for Cardiovascular Excellence 2011. Available at: http://www.cvexcel.org/CathPCI/Standards.aspx.

        • Junghans C.
        • Feder G.
        • Timmis A.D.
        • Eldridge S.
        • Sekhri N.
        • Black N.
        • et al.
        Effect of patient-specific ratings vs conventional guidelines on investigation decisions in angina: appropriateness of Referral and Investigation in Angina (ARIA) Trial.
        Arch Intern Med. 2007; 167: 195-202