A hierarchy of evidence for assessing qualitative health research
Abstract
Objective
The objective of this study is to outline explicit criteria for assessing the contribution of qualitative empirical studies in health and medicine, leading to a hierarchy of evidence specific to qualitative methods.
Study Design and Setting
This paper arose from a series of critical appraisal exercises based on recent qualitative research studies in the health literature. We focused on the central methodological procedures of qualitative method (defining a research framework, sampling and data collection, data analysis, and drawing research conclusions) to devise a hierarchy of qualitative research designs, reflecting the reliability of study conclusions for decisions made in health practice and policy.
Results
We describe four levels of a qualitative hierarchy of evidence-for-practice. The studies least likely to produce good evidence-for-practice are single case studies, followed by descriptive studies that may provide helpful lists of quotations but do not offer detailed analysis. More weight is given to conceptual studies that analyze all data according to conceptual themes but may be limited by a lack of diversity in the sample. Generalizable studies, which use conceptual frameworks to derive an appropriately diversified sample and whose analysis accounts for all data, are considered to provide the best evidence-for-practice. Explicit criteria and illustrative examples are described for each level.
Conclusion
A hierarchy of evidence-for-practice specific to qualitative methods provides a useful guide for the critical appraisal of papers using these methods and for defining the strength of evidence as a basis for decision making and policy generation.
Keywords: Qualitative research, Hierarchy of evidence, Quality indicators, Empirical studies, Qualitative evidence for clinical practice, Evidence-based medicine
1. Introduction
In the medical and health literature, there has been a steady rise in the number of papers reporting studies that use qualitative research methods. Although some of these papers aim to develop theory or to advance method, a substantial proportion report on issues directly relevant to clinical practice or health policy. What may not be clear to decision makers is how useful qualitative research is for generating the evidence that underlies evidence-based medicine.
The Cochrane Collaboration has struggled to find ways of incorporating evidence from qualitative research into systematic reviews. Articles reporting qualitative systematic reviews are now making their appearance [1], [2]. The challenge that these “courageous” researchers face, Jennie Popay tells us, is that the most difficult problem remains unresolved: that of defining clear criteria for selecting high-quality qualitative studies for inclusion in reviews [3]. Criteria for judging the quality of quantitative studies [4] are well known, as is the hierarchy of designs based on the strength of the evidence for treatment decisions [5], but what might be the criteria for studies that are based on narrative argument rather than on measurement and control over variables?
The Evidence-based Medicine Working Group has generated a description of an ideal qualitative study [5], and this echoes the “appraisal tool” produced by the Critical Appraisal Skills Programme in the Public Health Resource Unit of the National Health Service [6], a substantial document from the United Kingdom's Government Chief Social Researcher's Office [7], and the more succinct but thoughtful account of the Medical Sociology Group of the British Sociological Association [8].
A major problem is that these guidelines tend to see qualitative research methods as one method despite a diversity that includes anything from discourse analysis to ethnography, with data collected in personal interviews or focus groups (with samples of varying sizes), by participant observation or through documentary analysis. Some guidelines include discussion of the ethics of research with human participants, ways of reporting back to research participants, and ways of including communities in the actual conduct of a study. Commonly there is discussion of the need for “critical reflexivity” concerning the participative role of the researcher in the actual conduct of the research, and consideration may be given to the importance of poetics or esthetic narratives in reporting a study [9].
Not surprisingly, advice is inconsistent between guidelines. It is the flexibility of qualitative method, its capacity for adaptation to a variety of research settings, that is seen as one of its strengths [10] but it is this same flexibility that generates a range of study designs not easily captured in a single set of quality criteria. It is also true that quality criteria are viewed with concern by qualitative researchers. There is doubt about whether a “checklist” approach can capture the nuances and intricacies of the approach and concern that an emphasis on evidence will undermine the insight that can flow from a qualitative study [9]. A useful typology of qualitative studies has been reported for qualitative “metasynthesis,” [11] and “meta-ethnography” is seen as promising [12] but these methods require further development and, again, are undermined by failure to define clear criteria for judging the quality of included studies.
We are not in any doubt that experienced qualitative researchers can tell a good study from a poor one without the need for guidelines, but readers of health journals, and even some health researchers, approach qualitative research papers with varying degrees of confidence, skill, and experience. A novice faced with reviewing a paper that uses qualitative research methods might feel that the intricacies of the method are overwhelming and focus instead on what most readily distinguishes qualitative research: quotations from interviews.
Words are seductive. The emotive quality of quotations from qualitative studies can draw a sympathetic response from reviewers and readers—and these quotations do enliven an otherwise dull research report. But quotations are not self-validating and require analysis. If qualitative research is to be used as the basis for health care practice and health policy decisions, we need to identify studies that give moving insights into the lives of participants but then, in addition, report sound methods and defensible conclusions that apply to more than the immediate group of research participants.
This paper started as an exercise in critical appraisal of recent qualitative studies in the health literature. We found little difficulty in identifying a large number of sound qualitative studies but then faced the difficulty that some of these studies clearly provide better evidence-for-practice than others. What we report here is a ranking of qualitative study designs in order of the quality of evidence that a well-conducted study offers when practical decisions have to be made. As in quantitative research, study designs range from limited but insight-provoking single case studies to more complex studies that control for bias.
Glasziou et al. [13] acknowledge that the quantitative evidence hierarchy for assessing the effect of interventions suffers from the problem of classifying many different aspects of research quality under a single grade, but we agree with these authors that there is a need “to broaden the scope by which evidence is assessed.” Explicit criteria for assessing qualitative research would assist in transparency in peer review of research papers and a qualitative hierarchy of evidence would help practitioners identify that research which provides the strongest basis for action.
Here, a word of caution is appropriate. The evidence that is likely to emerge from a qualitative study will look quite different from evidence that is generated by a randomized controlled trial. Although qualitative studies may illuminate treatment issues, for example, indicating why some patients respond in a particular way to treatment, it is also common for a qualitative study to generate critique of current practice, indicating where standard practice may not be beneficial to one or more groups of people. Under the term “evidence” we also include evidence for or against public health or prevention programs and evidence relevant to the formulation of better health policy.
2. The qualitative research task
We take for granted that there are standard processes for conducting any research study: matching method to research problem, ethical considerations, and reporting requirements. Our focus in this paper is on the central part of the research process, the conduct and reporting of a qualitative research study. We confine ourselves to interview studies as the most commonly used in health research. First, we describe standard qualitative research procedures and then show how these contribute to a hierarchy for judging the strength of qualitative evidence for changes in practice.
2.1. Defining a research framework
Qualitative research method derives from a theory-based discipline, sociology. Sociologists consider both a traditional literature review and the theoretical literature. Social theory provides a systematic means of understanding human actions and institutions, including the assumptions made and the concepts used. Together these two literatures provide the questions for analysis and the appropriate concepts or themes that will be used to frame the study so that the findings relate directly to the research question.
2.2. Sampling and data collection
On the basis of the research framework, qualitative researchers focus in on the setting or group that is likely to provide the strongest, most relevant information about the research problem. If there is substantial uncertainty about an issue, it may be appropriate to select a random sample but, more commonly, it is possible to narrow the focus of initial data collection. A considerable time spent in the field with key informants can help to identify the most productive research sample. Thus, sampling is not a matter of numbers or of convenience but is strategically focused to collect the most appropriate, “rich” data, and researchers can be participants in a setting rather than purely objective observers.
Three simultaneous procedures then occur. Researchers remain in the field to ensure that the data collected fit with what they know of the setting, perhaps asking key informants for comment. Data collection commences using the most appropriate method for the setting in which the research is conducted. As soon as data are collected, data analysis commences. The sample may be diversified to address issues arising from early analysis. Data collection continues until the diversity of experience is identified and understood.
2.3. Data analysis
Analysis is essentially a taxonomic process. Data are sorted to give coherent categories of experience, drawing on the initial theoretical framework but also on theoretical concepts that emerge from the data. In some cases, the full sample may have the same experience but more commonly data are sorted into different conceptual categories each providing an explanation for what is observed. Analytic categories are “saturated” when there is sufficient information for the experience to be seen as coherent and explicable, for example in showing that a group of research participants act in the same way because of shared values or life experiences. If there is an odd-one-out, unless the difference can be explained, it may be necessary to return to the field to establish if the odd-one-out belongs to a group previously missed, about which data now have to be collected. Eventually, through a process of diversifying and intensifying the data collected and analyzed, researchers should be able to make sense of the experience of all people in all categories in the study, or explain the conditions under which exceptions occur.
2.4. Drawing research conclusions
The task here is to give an account of the analytical categories, defining what each group or category has in common and what makes it different from the others. The requirement is usually not to report only on extremes of experience but to give an account that covers the experience of all in a group.
Authors need to specify the practical, methodological, and theoretical limitations of any study. It should be noted here that qualitative research studies report on samples that are small in comparison with most quantitative studies but the adequacy of the sample is best judged in the overall context of the study. The question is how well the researchers can persuade us as readers that the phenomenon under investigation is well understood.
Qualitative studies also often use another step in the analysis, which is to assess the coherence of the results with the earlier literature review and with social theory. If the conclusions reached are new and consistent with the literature and with the theory, then the knowledge base has been strengthened. Where there are inconsistencies, the task is to explain why the results are inconsistent with what we knew before, and how the original knowledge base has to be extended to accommodate the new findings. These last steps should indicate the extent to which the study findings are likely to be transferable to other settings. Where there are uncertainties in a study, the conclusions should be appropriately cautious and indicate what further research is needed.
3. A qualitative hierarchy of evidence-for-practice
The hierarchy we are proposing is summarized in Fig. 1 and Table 1. The emphasis in this hierarchy is on the capacity of reported research to provide evidence-for-practice or policy. In common with the quantitative hierarchies of method, research using methods lower in the hierarchy can be well worth publishing because it contributes to our understanding of a problem. Sometimes, especially in new or difficult research contexts, studies are constrained and the conclusions tend to be hypothesis generating with limitations in the extent to which their results may be generalized to other contexts. As decision makers, however, our trust is placed in study designs producing transferable evidence, evidence that is generalizable beyond the setting where the study is conducted.
Table 1. A hierarchy of evidence-for-practice in qualitative research—summary features
| Study type | Features | Limitations | Evidence for practice |
|---|---|---|---|
| Generalizable studies (level I) | Sampling focused by theory and the literature, extended as a result of analysis to capture diversity of experience. Analytic procedures comprehensive and clear. Located in the literature to assess relevance to other settings. | Main limitations are in reporting when the word length of articles does not allow a comprehensive account of complex procedures. | Clear indications for practice or policy may offer support for current practice, or critique with indicated directions for change. |
| Conceptual studies (level II) | Theoretical concepts guide sample selection, based on analysis of literature. May be limited to one group about which little is known or a number of important subgroups. Conceptual analysis recognizes diversity in participants' views. | Theoretical concepts and minority or divergent views that emerge during analysis do not lead to further sampling. Categories for analysis may not be saturated. | Weaker designs identify the need for further research on other groups, or urge caution in practice. Well-developed studies can provide good evidence if residual uncertainties are clearly identified. |
| Descriptive studies (level III) | Sample selected to illustrate practical rather than theoretical issues. Record a range of illustrative quotes including themes from the accounts of “many,” “most,” or “some” study participants. | Do not report full range of responses. Sample not diversified to analyze how or why differences occur. | Demonstrate that a phenomenon exists in a defined group. Identify practice issues for further consideration. |
| Single case study (level IV) | Provides rich data on the views or experiences of one person. Can provide insights in unexplored contexts. | Does not analyze applicability to other contexts. | Alerts practitioners to the existence of an unusual phenomenon. |
Unlike the hierarchy used in evidence-based medicine, the different levels of the hierarchy do not have specific, recognized names such as “cohort study” or “randomized controlled trial.” We have had to invent names and we see these as tentative. For explanatory purposes, we start with the study design least likely to produce good evidence-for-practice.
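As a reading aid, the summary features in Table 1 can be sketched as a simple decision rule. This is only an illustrative sketch, not a scoring instrument proposed in this paper: the class and field names (`StudyReport`, `theory_guided_sampling`, and so on), the reduction of each criterion to a yes/no flag, and the order of the checks are all assumptions introduced for the example. Real appraisal depends on judgment about how well each procedure was carried out and reported.

```python
from dataclasses import dataclass


@dataclass
class StudyReport:
    """Hypothetical yes/no summary of an interview study, loosely following Table 1."""
    participants: int               # number of people interviewed
    theory_guided_sampling: bool    # sample selected using theory and the literature
    conceptual_analysis: bool       # data analyzed by conceptual themes, not only quoted
    sample_extended_in_field: bool  # sample diversified in response to early analysis
    all_data_accounted_for: bool    # analysis covers the full sample, including exceptions
    located_in_literature: bool     # findings related to other settings via the literature


def evidence_level(study: StudyReport) -> str:
    """Return the level of the hierarchy (I is strongest) suggested by the flags."""
    # Table 1 describes level IV as the views or experiences of one person.
    if study.participants <= 1:
        return "IV: single case study"
    # Without a theoretical frame and conceptual analysis, the study is descriptive.
    if not (study.theory_guided_sampling and study.conceptual_analysis):
        return "III: descriptive study"
    # Level I additionally requires diversification, full coverage, and transferability.
    if (study.sample_extended_in_field
            and study.all_data_accounted_for
            and study.located_in_literature):
        return "I: generalizable study"
    return "II: conceptual study"


# Hypothetical example: a theory-framed study whose sample was fixed in advance.
example = StudyReport(
    participants=30,
    theory_guided_sampling=True,
    conceptual_analysis=True,
    sample_extended_in_field=False,
    all_data_accounted_for=False,
    located_in_literature=True,
)
print(evidence_level(example))  # prints "II: conceptual study"
```

The checks run from the weakest design upward, so a study is assigned the highest level whose criteria it fully meets. A study that meets only some of the level I criteria stays at level II, which mirrors the limitation noted in Table 1 that emerging concepts "do not lead to further sampling."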
3.1. Single case study
Interviews with one or a very small number of people can provide important insights into hitherto unexplored contexts. These individual accounts are sometimes moving, emotional descriptions of personal experience that rouse sympathy and give a rare insight into an uncommon experience. Rice, Ly, and Lumley [14] described the case of a Hmong woman who experienced “soul loss” when giving birth by Cesarean section in a Western tertiary hospital. The study alerted practitioners to the importance of cultural difference, but it is unlikely that hospital administrators would want to institute the corrective that the study describes (a shaman sacrificing a chicken in the operating room where the surgery took place), nor did the authors suggest this.
Although a well-conducted single case study provides poor evidence-for-practice, it can generate hypotheses for later studies. In this case, a subsequent study demonstrated that cultural beliefs are often less important than women's own experiences of poor quality care [15].
3.2. Descriptive studies
Descriptive studies, often atheoretical, focus on a sample selected from a specific group or setting with no further diversification. These studies are commonly conducted simply to describe participant views or experiences and to provide additional information in support of a concurrent trial or survey. Reporting focuses on listing illustrative quotes rather than on using data to explain why people hold the views expressed. Rousseau et al. [16] studied the introduction in general practice of clinical decision support systems for the management of two chronic diseases. In conjunction with a randomized controlled trial, interviews were conducted in five settings in northeast England with two practice managers, three nurses, and eight general practitioners. Boxed sets of quotes demonstrate both enthusiasm for and criticism of the system. The authors state that the dominant theme in interviews concerned specific difficulties with integrating the system into clinical practice. The results help to explain the failure of the trial of the computerized system. The authors acknowledge that the sample was self-selected and did not include practitioners who chose not to engage with the new technology. Their study has modest aims; the conclusion is appropriately limited and calls for further development of clinical computer systems rather than a change in practice.
3.3. Conceptual studies
Conceptual studies proceed from a conceptual framework that guides sample selection. The sample frame includes a range of conceptual categories identified as significant in earlier research (either qualitative or quantitative), and study participants are selected to provide an understanding of each category. Importantly, the sample is not diversified in the field and the analysis rests on the comprehensiveness of the original conceptual framework. The focus is on developing an overall account of the views of participants, or groups of participants, as selected, and then drawing appropriate conclusions.
Varas-Diaz et al. [17] studied a single group, 30 Puerto Ricans living with acquired immune deficiency syndrome. The study is structured around the theoretical concept of stigma, with participants selected to represent different infection routes, an important source of stigma. The authors concentrate on providing an overview of the way in which stigma operates in these people's lives. A selection of quotations demonstrates the devastating effect of stigma and the role of social interaction in stigmatization, an issue identified as important in the theoretical literature. They do not give an account of how the quotations were selected or how consistent experiences were across the group, except when we are told that the study shows the overall importance of stigmatization by family members. The study is thought provoking and articulates well with the theoretical and general literature on stigma, but we do not know what differences, if any, there are in the group, or how these might be explained. The authors call for more research on Latino families, and their major conclusion for practice is that interventions should address stigmatizing family dynamics.
A much more developed study is that of Williams who analyzed the way in which teenagers deal with chronic disease [18]. The sample was selected to give diversity, covering the major issues seen as important in the literature. Asthma and diabetes were selected as they pose different problems for compliance. The focus was on the theoretical concept of gender and 10 boys and 10 girls with each illness, and their parents, were interviewed. Participants were drawn from hospital and general practice, and from suggestions made by those interviewed. Across the sample, girls were better than boys in following treatment prescriptions, but boys were better at diet and exercise requirements. These differences are located in an analysis of the impact of masculinity and femininity on compliance regimes. Boys, for example, respond differently to asthma and diabetes but do so in a way that conforms to an important aspect of the theory of stigma, the need to “pass” as normal in social interactions (a concept that might have enriched the analysis of Varas-Diaz, above). The sample was not diversified to account for the small number of boys who cope poorly but were reluctant to be interviewed; interviews with their parents filled in some missing details. Residual uncertainties were made explicit and the conclusions give explicit guidance to the way in which sex is likely to compromise treatment regimes.
3.4. Generalizable studies
At the apex of the hierarchy are the ideal, well-developed qualitative studies. These studies often build on earlier studies, commencing with a comprehensive literature review, which provides the conceptual framework for initial data collection. The sample is extended when early analysis indicates that additional conceptual categories are required. There is a clear report of data collection and analysis for the full sample including issues of diversity and data saturation, persuasively explaining the differences between groups. The generalizability of the findings is defined with reference to the relevant literature to show how the study applies to other settings or groups. Such studies provide a secure basis for practice or policy.
Given the demanding nature of these ideal studies, we would expect that there would be few reported. Authors conducting studies that are both intensive and extensive face a significant challenge in reporting all aspects of the study in one article. With certain limitations in each case, we present two studies as possible candidates.
Davis et al. [19] addressed perceptions of risk of people who inject drugs with respect to hepatitis C. An extensive review of the relevant literature and theory guided sample selection from a large variety of settings. The sample was further diversified during analysis to increase the number of young users. Although they report the views of “some” or “others” in the analysis of the 59 participant interviews, this is done to demonstrate the variety of response. The analysis itself focuses on thematic narratives (not defined), which we take to be the cultural stories that people tell to account for their activities, in this case, how they see and manage risk. There is a moving account of the cooperative way in which new users acquired the skills to become autonomous, self-injecting, at the same time negotiating issues of risk. The views of older participants are contrasted with younger participants with respect to the emergence of the human immunodeficiency virus epidemic, but there is no account of how consistent these views are in the sample. The authors demonstrate troublesome “gaps and confusion” in the users' knowledge of hepatitis C, including its “unavoidability.” The conclusions point to the need for changed public health intervention strategies: the risk reduction strategies used by this group of people who inject drugs are widespread and the task is to build on and extend these preventive skills.
The next example reports on a hidden population, and the findings are relevant to both public health policy and clinicians in sexual health practice. Studies of difficult-to-access groups often use an ethnographic approach where all the researchers are “immersed” in the field for a substantial period of time. Sample procedures are not formally imposed but gradually emerge in the process of gathering intimate details about life in this setting. Sanders [20] reports a study on female workers in indoor sex markets, where workers are older and more experienced than workers on the street. The study is framed by concepts of risk in gendered power relationships in a setting where women appear to choose to expose themselves to risk. She spent 1000 hours observing social processes and conducted 55 interviews with a variety of workers. There is no explicit account of saturation or diversification, but a thorough and persuasive analysis of the various views expressed in the interviews articulates well with the literature. The author concludes that women in these settings have important strategies for reducing health-related risks and the risk of violence, depending on their experience and both personal and work settings. What emerges as more important is the emotional risk of being identified as a sex worker; again, their responses vary according to their circumstances, with some women suspending all other sexual relationships. The author concludes that intervention programs should not focus only on physical and health-related issues but have as an important aim the decriminalization of sex work.
4. Discussion
There are risks in reducing a complex set of professional research procedures to a simple code and, in common with our colleagues in evidence-based medicine, we recognize that hierarchies of evidence-for-practice can be used and abused [9]. Our focus here is not on papers that have the primary aim of developing theory or method. We have located four distinct qualitative research designs for interview studies in a hierarchy that reflects the validity of conclusions for clinical practice and health policy.
Three issues make it necessary to set methodological standards for a good qualitative study, defining relevance for practice in terms of methodological quality criteria. Various health disciplines have taken to qualitative research with enthusiasm that is not matched with skill. The result is a plethora of low-level studies, of dubious quality. A lack of understanding of social theory or of the literature means that theories and concepts are not fully used to frame the research process. Practitioners unfamiliar with the intricacies of qualitative research method lack a framework for judging which studies provide a secure basis for practice decisions. It is these readers who will benefit from having a succinct definition of how to judge the contribution to evidence from qualitative research papers.
It is clear that many qualitative studies published in the health literature fall far short of the ideal research processes of generalizable studies. This may be due to poor research practice, or to poor reporting of well-designed studies (a problem not limited to qualitative methods). In this case, good peer review should identify what is needed for improved reporting or for the better conduct of a qualitative study. It is also true that there are unavoidable impediments to the ideal research process. Researchers may have insufficient research funding for data saturation. A particular group that they want to interview may not be accessible or there may only be a limited number of people that have had the experience in question. The task for the researchers is to justify the more modest aims of the study, making evident the unavoidable limitations, and it is helpful when researchers indicate future directions for this kind of research.
5. Conclusion
In defining the essential features of each stage of the central methodological task of a qualitative research study, we are setting in place a model of the ideal research project for developing qualitative evidence-for-practice. If the ideal generalizable study is realized, we should have a research study that provides evidence that is secure, evidence that a reader can trust, and evidence that a policy maker or practitioner can use with confidence as the basis for decision making and policy generation. These are the ideal studies for incorporation into systematic reviews. Not all research can reach this standard. We have defined a hierarchy of evidence that is specific to qualitative research based on the methodological characteristics of four research designs.
When we read, or review, qualitative research papers, it is inevitable that we will apply criteria for what is a good study and what is not worthwhile. Often the criteria that we use are implicit in our decision and rarely articulated. What we have presented is a set of explicit quality criteria based on the central methodological task of the qualitative researcher: defining a theoretical framework for the study, specifying a sampling process, describing the methods of data collection and analysis, and drawing research conclusions. Understanding and evaluating claims to knowledge made by qualitative research is important in meeting the policy and practice needs of an increasingly complex health environment. We recognize that there are pitfalls in setting out what is a prescriptive approach to evaluating qualitative studies with relevance for practice, but we are persuaded by the larger pitfalls involved in having no such criteria.
References
- Systematic review of qualitative studies exploring parental beliefs and attitudes toward childhood vaccination identifies common barriers to vaccination. J Clin Epidemiol. 2005;58(11):1081–1088.
- Systematically reviewing qualitative studies complements survey design: an exploratory study of barriers to paediatric immunisations. J Clin Epidemiol. 2005;58:1101–1108.
- Moving beyond floccinaucinihilipilification: enhancing the utility of systematic reviews. J Clin Epidemiol. 2005;58(11):1079–1080.
- Checklists for reviewing articles. In: Chalmers I, Altman DG, editors. Systematic reviews. London, UK: BMJ Publishing Group; 1995. p. 75–85.
- In: Guyatt GH, Rennie D, editors. Users' guides to the medical literature: a manual for evidence-based clinical practice. Chicago, IL: AMA Press; 2001.
- Critical Appraisal Skills Programme. Milton Keynes Primary Care Trust; 2002.
- Spencer L, Ritchie J, Lewis J, Dillon L. Quality in Qualitative Evaluation: a framework for assessing research evidence. United Kingdom Government Chief Social Researcher's Office; 2003.
- Criteria for the evaluation of qualitative research papers. Med Sociol News. 1996;22. Available at: http://www.tandf.co.uk/journals/pdf/qdr.pdf. Accessed June 1, 2005.
- Qualitative methods for health research. London: Sage Publications; 2004.
- Innovation and compromise: responsibility and reflexivity in research with vulnerable groups. In: Daly J, Guillemin M, Hill S, editors. Technologies and health: critical compromises. Melbourne: Oxford University Press; 2001. p. 136–150.
- Classifying the findings of qualitative studies. Qual Health Res. 2003;13:905–923.
- Evaluating meta-ethnography: a synthesis of qualitative research on lay experiences of diabetes and diabetes care. Soc Sci Med. 2003;56:671–684.
- Assessing the quality of research. BMJ. 2004;328:39–41.
- Childbirth and soul loss: the case of a Hmong woman. Med J Aust. 1994;160:577–578.
- Mothers in a new country: the role of culture and communication in Vietnamese, Turkish and Filipino women's experiences of giving birth in Australia. Women Health. 1999;28:77–101.
- Practice based, longitudinal, qualitative interview study of computerised evidence based guidelines in primary care. BMJ. 2003;326:314–318.
- AIDS-related stigma and social interaction: Puerto Ricans living with HIV/AIDS. Qual Health Res. 2005;15:169–187.
- Doing health, doing gender: teenagers, diabetes and asthma. Soc Sci Med. 2000;50:387–396.
- Preventing hepatitis C: ‘Common sense’, ‘the bug’ and other perspectives from the risk narratives of people who inject drugs. Soc Sci Med. 2004;59:1807–1818.
- A continuum of risk? The management of health, physical and emotional risks by female sex workers. Sociol Health Illn. 2004;26:557–574.
PII: S0895-4356(06)00210-1
doi:10.1016/j.jclinepi.2006.03.014
© 2006 Elsevier Inc. All rights reserved.

