
Standards, guidelines, and norms

      Over the past two decades, a growing number of standards have appeared to support optimizing clinical practice; performing, reporting, and interpreting research; systematically evaluating and summarizing published data; and developing clinical guidelines. In the beginning, pioneers in standards development were often confronted with reluctance among colleagues, who sometimes perceived standards as a cumbersome interference with clinical and scientific freedom. Nowadays the challenge is to keep track of the plethora of guidelines of varying quality that have appeared worldwide [Vandenbroucke, 2009]. But without doubt, well-developed, evidence-based, continuously evaluated and improved standards are of great help to clinicians, researchers, authors, guideline developers, health policy makers, and patients/consumers. Altogether, they represent a major step forward in improving the quality and transparency of healthcare provision.
      In this issue, we begin a comprehensive series on the GRADE guidelines that will be published over the coming months. GRADE stands for ‘Grades of Recommendation, Assessment, Development, and Evaluation,’ and provides guidance for rating quality of evidence and grading strength of recommendations in healthcare. It is the result of the collaborative efforts of a worldwide group of professionals, researchers, and guideline developers to develop an optimal system for rating quality of evidence and determining strength of recommendations for clinical practice guidelines. This follows earlier publications for audiences of clinicians and policy makers [Guyatt et al., 2008 (BMJ 336: 924-926 and 995-998)]; this new series will provide in-depth justification and explanation of GRADE and its implications for authors of systematic reviews and health technology assessments, guideline panelists, and methodologists active in this field. After an introduction of the series by Guyatt et al., Guyatt and colleagues provide basic information on the general approach of the GRADE evidence profile and summary-of-findings tables to facilitate clinical decisions. They then continue with more focused contributions on framing the relevant questions for systematic reviews and guidelines, deciding on important outcomes, and rating the quality of evidence. The authors present their conceptual approach, followed by a general elaboration on how to deal with bias. In the issues to come, rating the quality of evidence in relation to specific types of bias and other topics will be dealt with, as listed in the introductory paper on the series. In an accompanying Commentary, Straus and Shepperd, who were guest editors for this series, put the important contributions from GRADE into the broader perspective of the methodological and research agenda for implementation science and knowledge translation.
      In connection with the question of how to deal with bias in producing clinical evidence, a review of systematic reviews by Parekh-Bhurke et al. provides interesting additional insight. They report that recent reviews show improvement in dealing with publication bias. However, few methods exist to deal with publication bias in non-quantitative findings of systematic reviews.
      A need for better standards is also emphasized in another context: the debate on how to deal with subgroup analyses in trials. Following up on their earlier contributions, the groups of Hasford and Bender continue their discussion, but agree that “beyond the standards for subgroup analyses in clinical trials, new standards for the interpretation of subgroup analyses in systematic reviews are needed” [Bender et al., 2010]. We would be interested in receiving further methodological contributions on that need and on how it should be met. Are subgroup analyses allowed, and if so, in which situations? Or are they even necessary to tailor the evidence for translation to specific patient groups? What is the relationship between hypothesis testing and explorative analysis, and how should subgroup analyses in trials and systematic reviews optimally be connected?
      In addition to standards for clinical decision making to improve health outcomes for individual patients, there are norms that are used to assess the impact of diseases in the population. But current practice can be improved: in a systematic review, Norris et al. assessed health-related quality of life (HRQL) among persons with type 2 diabetes by analyzing the results of a large number of studies and comparing them with published norms for the US population. They showed that the impact of type 2 diabetes on HRQL may be underestimated when published norms are applied.
      Avoiding bias in studying populations is also covered by other authors in various ways. Schmidt et al., using data from the Study of Health in Pomerania, focus on the pitfalls in population sampling that arise from incompletely accounting for strata, clustering, and weights in the study of lifestyle indicators. Valid sampling from administrative databases is evaluated by two groups. First, Smeets and coworkers show, using the AGIS database as an illustration, that health insurance databases (HIDs) offer large potential for several types of clinical research, but that lack of detailed information can be an important limitation. Another potentially important source of health-related information is the electronic medical record of primary care practices. Following promising early results in using practice computer data for quality assurance in type 2 diabetes patients [Höppener et al., 1992], Tu and colleagues demonstrate that, for the purpose of evaluating and improving quality of care, structured data from primary care electronic medical records can be used to identify diabetes patients with both high positive and high negative predictive values.
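For readers less familiar with the two measures reported by Tu and colleagues, the following minimal sketch shows how positive and negative predictive values are obtained from a 2x2 validation table. The function name and the counts are hypothetical and chosen purely for illustration; they are not taken from the study.

```python
def predictive_values(true_pos, false_pos, false_neg, true_neg):
    """Return (PPV, NPV) from the four cells of a 2x2 validation table.

    PPV: share of patients flagged by the record-based rule who truly have
         the condition according to the reference standard.
    NPV: share of patients not flagged who truly do not have the condition.
    """
    flagged = true_pos + false_pos
    not_flagged = true_neg + false_neg
    if flagged == 0 or not_flagged == 0:
        raise ValueError("both flagged and non-flagged groups must be non-empty")
    ppv = true_pos / flagged
    npv = true_neg / not_flagged
    return ppv, npv


# Hypothetical counts, for illustration only (not data from any study).
ppv, npv = predictive_values(true_pos=90, false_pos=10, false_neg=5, true_neg=895)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.900, NPV = 0.994
```

Both values depend on how common the condition is in the population studied, so figures obtained in one setting do not necessarily carry over to another.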
      The appropriate assessment of health-related phenomena is also addressed by two other groups. In a review, Jordan et al. report that health literacy, which is increasingly recognized as a major hurdle in improving the health of both western and non-western populations, is not consistently measured. This makes it difficult to interpret and compare health literacy at individual and population levels. More evidence on the validity of measurements, as well as better instruments, is needed. Klijs and his team, using data from the GLOBE study, test the key assumption of the dynamic equilibrium theory that severe disability is associated with proximity to death, whereas mild disability is not. Their findings support that theory. An implication is that projections of the future burden of disability could be substantially improved by connecting to this theory and incorporating information on proximity to death.
      A key issue in quantitative analysis in clinical epidemiological research is addressed by Wisloff et al.: the number needed to treat (NNT). From a Monte Carlo simulation study they conclude that clinicians should use NNT cautiously when expressing treatment benefits because, for realistic values of true absolute risk reduction, NNT varies much more than absolute risk reduction and relative risk. We would appreciate further methodological contributions on this topic: to what extent does this measure contribute to better clinical practice [Berlin, 2010], and how should it be connected to other measures? Another, even more classic issue is the ‘value of the P-value’: in their correspondence, Stang and Müeller remind us that P values should be supplemented by absolute frequencies of observations. The rationale, use, and implications of significance testing are also part of the discussion between Fagerland and Neuhäuser on the statistical methodology of comparing skewed distributions with unequal variances, in testing either for difference or equivalence. Stang suggests we continue the debate on the ‘ongoing tyranny of statistical significance testing in biomedical research’ [Stang et al., 2010]. We are interested in hearing more views on this.
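The point about the NNT raised above can be made concrete with a small Monte Carlo sketch. It is not the simulation design used by Wisloff et al.; the risks (20% in the control arm, 15% in the treated arm), the arm size, the number of simulated trials, and the spread measure (width of the central 90% range divided by the median) are all assumptions chosen for illustration. Because NNT = 1/ARR, estimates of the absolute risk reduction (ARR) that fall close to zero translate into very large NNT values.

```python
import random
import statistics


def simulate_trial(n_per_arm, risk_control, risk_treated, rng):
    """Simulate one two-arm trial and return the estimated (ARR, RR)."""
    events_control = sum(rng.random() < risk_control for _ in range(n_per_arm))
    events_treated = sum(rng.random() < risk_treated for _ in range(n_per_arm))
    p_control = events_control / n_per_arm
    p_treated = events_treated / n_per_arm
    arr = p_control - p_treated  # absolute risk reduction
    rr = p_treated / p_control if p_control > 0 else float("nan")  # relative risk
    return arr, rr


def relative_spread(values):
    """Width of the central 90% range divided by the median."""
    cuts = statistics.quantiles(values, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return (cuts[-1] - cuts[0]) / cuts[9]


def compare_measures(n_trials=2000, n_per_arm=500,
                     risk_control=0.20, risk_treated=0.15, seed=1):
    """Compare the sampling variability of ARR, RR, and NNT across trials."""
    rng = random.Random(seed)
    arrs, rrs, nnts = [], [], []
    for _ in range(n_trials):
        arr, rr = simulate_trial(n_per_arm, risk_control, risk_treated, rng)
        if arr <= 0:
            # NNT is undefined (ARR = 0) or negative (ARR < 0) here. Skipping
            # these trials is a simplification that favors the NNT.
            continue
        arrs.append(arr)
        rrs.append(rr)
        nnts.append(1.0 / arr)
    return {
        "ARR": relative_spread(arrs),
        "RR": relative_spread(rrs),
        "NNT": relative_spread(nnts),
    }


if __name__ == "__main__":
    for measure, spread in compare_measures().items():
        print(f"{measure}: relative spread = {spread:.2f}")
```

Under these assumptions the relative spread of the NNT exceeds that of both the ARR and the relative risk, even though trials with a non-positive ARR estimate, where the NNT behaves worst, have been left out.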

      References

        Vandenbroucke J.P. STREGA, STROBE, STARD, SQUIRE, MOOSE, PRISMA, GNOSIS, TREND, ORION, COREQ, QUOROM, REMARK… and CONSORT: for whom does the guideline toll? J Clin Epidemiol. 2009; 62: 594-596
        Guyatt G.H., Oxman A.D., Vist G.E., Kunz R., Falck-Ytter Y., Alonso-Coello P., et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008; 336: 924-926
        Guyatt G.H., Oxman A.D., Kunz R., Vist G.E., Falck-Ytter Y., Schunemann H.J. What is “quality of evidence” and why is it important. BMJ. 2008; 336: 995-998
        Bender R., Koch A., Skipka G., Kaiser T., Lange S. No inconsistent trial assessments by NICE and IQWiG: different assessment goals may lead to different assessment results regarding subgroup analyses. J Clin Epidemiol. 2010; 63: 1305-1307
        Höppener P., Knottnerus J.A., Grol R., Metsemakers J.F. Computerization of general practices and quality control. Blood glucose regulation in type 2 diabetics investigated in the Registration Network family practices. Fam Pract. 1992; 9: 353-356
        Berlin J.A. N-of-1 clinical trials should be incorporated into clinical practice. J Clin Epidemiol. 2010; 63: 1283-1284
        Stang A., Poole C., Kuss O. The ongoing tyranny of statistical significance testing in biomedical research. Eur J Epidemiol. 2010; 25: 225-230