Abstract
Objective
The Mixed Methods Appraisal Tool (MMAT) was developed for critically appraising different study designs. This study aimed to improve the content validity of the MMAT criteria for three of its five categories of studies by identifying relevant methodological criteria for appraising the quality of qualitative, survey, and mixed methods studies.
Study Design and Setting
First, we performed a literature review to identify critical appraisal tools and extract methodological criteria. Second, we conducted a two-round modified e-Delphi technique. We asked three method-specific panels of experts to rate the relevance of each criterion on a five-point Likert scale.
Results
A total of 383 criteria were extracted from 18 critical appraisal tools and a literature review on the quality of mixed methods studies, and 60 were retained. In the first and second rounds of the e-Delphi, 73 and 56 experts participated, respectively. Consensus was reached for six qualitative criteria, eight survey criteria, and seven mixed methods criteria. These results led to modifications of eight of the 11 MMAT (version 2011) criteria. Specifically, we reformulated two criteria, replaced four, and removed two. Moreover, we added six new criteria.
Conclusion
Results of this study were used to improve the content validity of the tool, revise it, and propose a new version (MMAT version 2018).
1. Introduction
Systematic reviews are considered among the best available sources of research evidence and are increasingly relied on to inform decision-making [1]. The past 40 years have seen increasingly rapid methodological advances in the field of systematic reviews and research synthesis. Initial developments mainly focused on meta-analysis for addressing questions on the effectiveness of interventions, and the emphasis was on randomized controlled trials [2,3]. Since the early 2000s, researchers have shown a growing interest in systematic mixed studies reviews, which combine quantitative, qualitative, and mixed methods studies to address other types of review questions concerned with, for instance, the acceptability of an intervention, participants’ satisfaction, or barriers to implementation (see Supplementary File 1). Systematic mixed studies reviews are particularly useful for providing in-depth answers to complex clinical problems and practical concerns. Several challenges, however, are encountered in these reviews because of the heterogeneity of included study designs. One of these challenges pertains to the critical appraisal of included studies.
Critical appraisal consists of a systematic and careful examination of studies to ensure they are trustworthy, valid, and reliable [4,5]. It is an essential step in systematic reviews to ensure that their recommendations and conclusions reflect the quality of the evidence reviewed [6]. Since reviewers’ judgments of the same study can vary greatly, critical appraisal tools have been developed to help reviewers appraise study quality in a more consistent, transparent, and reproducible way [7-9]. A critical appraisal tool (also called a quality assessment tool or risk of bias tool) is a scale or checklist that lists criteria or domains for appraising the quality of a study. Extant reviews of critical appraisal tools have identified over 500 tools (see Supplementary File 1). Most of these tools are specific to a particular research design or method. Conducting systematic mixed studies reviews is thus complex and time consuming, as reviewers must search for and learn how to use several different tools to complete the critical appraisal of the qualitative, quantitative, and/or mixed methods studies included in each review.
To address the challenge of critical appraisal in systematic mixed studies reviews, a unique tool for assessing the quality of different study designs was developed: the Mixed Methods Appraisal Tool (MMAT) [10]. The MMAT was first published in 2009 and has five sets of criteria, one for each of the following categories: (a) qualitative studies (such as case studies and grounded theory), (b) randomized controlled trials, (c) nonrandomized studies (such as cohort and case-control studies), (d) quantitative descriptive studies (such as surveys and case series), and (e) mixed methods studies. When appraising mixed methods studies, three sets of criteria are assessed in no particular order: (a) the qualitative set, (b) one quantitative set (randomized controlled trials, nonrandomized, or quantitative descriptive studies), and (c) the mixed methods set. In doing so, the MMAT acknowledges the distinctive methodological characteristics of each component of a mixed methods study (i.e., qualitative, quantitative, and mixed methods) [11].
Previous studies on the interrater reliability of the MMAT reported that agreement scores ranged from poor to perfect [12,13]. This suggests the need to clarify some criteria in the MMAT, particularly those related to qualitative and nonrandomized studies, for which lower agreement was observed. In addition, in interviews conducted with MMAT users to explore their views and experiences of the tool, concerns were raised about whether it included enough criteria to judge the quality of studies and about criteria that were difficult to judge, in particular those for qualitative and mixed methods studies [14]. This suggests a need to improve the content validity of the MMAT. The content validity of an assessment tool is defined as the degree to which criteria are relevant to and representative of their targeted construct [15]. A conceptual framework for quality appraisal in systematic mixed studies reviews was developed in which three dimensions of quality were presented: reporting, conceptual, and methodological [16]. Reporting quality relates to the transparency, accuracy, and completeness of the information provided in a paper. Conceptual quality concerns the insight that can be gained about the phenomenon of interest. Methodological quality concerns the validity or trustworthiness of a study and is related to the methodology and methods used and how biases were minimized. In the MMAT, the targeted construct is the methodological quality of studies appraised in systematic mixed studies reviews.
The existing literature on critical appraisal has focused, for the most part, on randomized controlled trials, cohort studies, and/or case-control studies, and several validated tools can be found for these study designs. This literature will inform the revision of the MMAT criteria for randomized controlled trials and nonrandomized studies. However, for other designs, such as qualitative, survey, and mixed methods studies, critical appraisal is more challenging because validated tools are rare and there is no clear consensus on how their quality assessment should be performed [17-19].
The objective of this study was to improve the content validity of the MMAT by identifying the most relevant methodological criteria for appraising the quality of qualitative, survey, and mixed methods studies. This study focused on these three categories of studies because of the scarcity of literature and lack of consensus.
2. Methods
Two phases were conducted: (a) a literature review to identify existing criteria and (b) a modified e-Delphi technique. The Delphi technique is used to reach consensus among a group of experts [20] and is particularly suitable to build consensus on issues that have limited or contradictory evidence [21]. It has been used for the development of other critical appraisal tools for different types of studies such as prognostic studies, case series studies, cross-sectional studies, studies on measurement properties, and randomized controlled trials (see Supplementary File 1). The Delphi technique is characterized by two or more rounds of questionnaires with controlled feedback, statistical group response, and anonymity [20]. There are different types of Delphi designs [20]. We used a modified e-Delphi, meaning that the Delphi was administered via an online web survey and used preselected methodological criteria in the first round.
2.1 Phase 1: literature review
To identify methodological criteria, we performed a literature review of critical appraisal tools for qualitative, survey, and mixed methods studies. Because surveys are part of the quantitative descriptive studies category in the MMAT, we also included tools related to cross-sectional and prevalence studies.
2.1.1 Sources
Two main literature sources were used. The first was a review of systematic mixed studies reviews carried out in 2015 [22]. In this review, six databases (MEDLINE, PsycINFO, Embase, CINAHL, AMED, and Web of Science) were searched from inception of each database until December 8, 2014, and 459 reviews were analyzed. The second was 15 reviews on critical appraisal tools identified from citation tracking of tools found in the first source and from reviews known to the authors of this paper (see Supplementary File 1). In addition, based on the findings of our review of systematic mixed studies reviews [22], we considered frequently used tools developed by three leading international institutions: Critical Appraisal Skills Programme, Joanna Briggs Institute, and National Institute for Health and Clinical Excellence.
2.1.2 Selection criteria
Critical appraisal tools assessing methodological quality were retained, whereas tools limited to the quality of reporting of studies were excluded. Tools that included both reporting and methodological quality criteria were retained and only the methodological quality criteria were considered. We only retained appraisal tools that provided a clear description of their development with a group of experts or that had been subject to validity or reliability testing.
2.1.3 Identification of items
For each retained appraisal tool, all the criteria were extracted and entered into a spreadsheet by one person (QNH). Two team members (QNH, PP) independently screened the list to include methodological quality criteria. The following were excluded: criteria limited to the quality of reporting (e.g., the response rate is reported); generic criteria, that is, criteria related to the general steps of conducting any research study (e.g., the problem is accurately depicted or ethical issues are adequately considered); and criteria specific to a topic (e.g., the ethnic composition of the population studied is recorded or the topic is relevant to primary health care). Duplicates and criteria on the same concept were removed (e.g., reflexivity of the account and evidence of reflexiveness in the process). The preliminary list was sent to all members of the research team (authors of this paper), who had backgrounds in qualitative research, epidemiology, and mixed methods. They were asked to review the list, identify criteria that were unclear, and suggest modifications, if necessary. They were also asked to suggest criteria they felt were missing from the list.
2.2 Phase 2: two-round modified e-Delphi study
Three method-specific panels of experts were asked to complete two rounds of Delphi questionnaires to identify the most relevant methodological criteria for critical appraisal. Relevance was defined as the appropriateness of the elements to the targeted construct [15]. In this study, the targeted construct was the methodological quality of studies.
2.2.1 Sample
For each panel, a purposeful sample of international experts was constituted. An expert is defined as an individual with knowledge and skills in a specific area [23]. For the purposes of this e-Delphi, the experts were researchers working in an academic or research institution with research interests in the methodological development of either qualitative, survey, or mixed methods studies. To identify the experts, the lead author performed a search of books and methodological papers in Google Scholar, the McGill Library catalog, and Amazon. Then, the biographies of the publications’ authors were consulted online to verify their research design expertise (e.g., by checking their research interests and expertise, courses taught, and scientific publications). The lead author compiled the list of experts, categorized by research design, and submitted it to the full research team, asking members to add any missing experts. A total of 196 experts (i.e., potential participants) were retained.
2.2.2 Data collection and analysis
The questionnaires were put online using the LimeSurvey software hosted on the McGill University server. Pilot testing of the online questionnaires was conducted with one professor, two graduate students, and one research associate to obtain feedback regarding the clarity of the instructions, ease of completing the questionnaires, technical difficulties encountered, and to estimate the time needed to complete the task.
In Round-one, the experts were asked to rate the relevance of each criterion. A 5-point Likert scale was used, ranging from 1 = not at all relevant to 5 = extremely relevant. Space was included at the end of the questionnaire for participants to provide comments and suggestions. A 1-month turnaround time was given for panel members to complete the questionnaire. Based on the comments provided in Round-one, some criteria were modified and new criteria were added. A summary table of the results, including group ratings and comments obtained in this round, was prepared. This table was used to provide controlled feedback and statistical group response to participants, two important characteristics of the Delphi technique [24].
For Round-two, each participant was sent the summary table including a reminder of their responses and a new questionnaire to complete. The participants were asked to (re)rate all criteria using the same 5-point Likert scale. In addition, a “cannot answer” response category was added (at the request of participants). Space was provided at the end of each question for comments and suggestions. The data of Round-two were summarized by calculating an agreement index. For each item, the number of experts rating criteria as very relevant or extremely relevant was divided by the total number of experts. For each item, we considered that consensus had been reached if the agreement index was 0.80 or more.
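To make the computation concrete, here is a minimal sketch (not the authors' code) of the per-item agreement index and consensus rule described above. It assumes that ratings of 4 and 5 correspond to "very relevant" and "extremely relevant," and that "cannot answer" responses (represented as None) remain in the denominator, consistent with dividing by the total number of experts; both points are assumptions made for illustration.

```python
from typing import Optional

def agreement_index(ratings: list[Optional[int]]) -> float:
    """Share of experts rating an item as very or extremely relevant (4 or 5)."""
    if not ratings:
        return 0.0
    agreeing = sum(1 for r in ratings if r is not None and r >= 4)
    return agreeing / len(ratings)  # None ("cannot answer") stays in the denominator (assumption)

def has_consensus(ratings: list[Optional[int]], threshold: float = 0.80) -> bool:
    """Consensus is considered reached when the agreement index meets or exceeds the threshold."""
    return agreement_index(ratings) >= threshold

# Hypothetical example: 18 of 20 experts rate the item 4 or 5 -> index 0.90, consensus reached.
example = [5, 4, 5, 5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 5, 2, 4, 4, 5, 4, 5]
print(round(agreement_index(example), 2), has_consensus(example))  # 0.9 True
```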
We used the agreement indexes and the comments from Round-two as well as the literature review on critical appraisal tools to inform the revision of the MMAT. Specifically, we verified if the criteria in the current version of the MMAT (version 2011) were among those with an agreement score ≥ 0.80. If not, we considered how they could be modified or replaced with new ones on similar concepts. Experts’ comments were used to reformulate some criteria.
2.3 Ethics statement
This project was approved by the Institutional Review Board of the Faculty of Medicine Research and Graduate Studies Offices of McGill University (ethics certificate number A05-E26-15B). An electronic consent form was included in the Round-one questionnaire. All experts provided informed consent to participate in this study and to be acknowledged in this paper. The responses were kept anonymous to the panel, and no personally identifiable information was presented in the data file used for the analysis.
4. Discussion
A framework for developing assessment tools has been proposed in which three main stages are defined: initial steps, tool development, and dissemination [32]. This study is situated in the tool development stage, generating and seeking consensus on criteria for three of the five study categories included in the MMAT (qualitative, survey, and mixed methods studies). We used a modified e-Delphi technique to identify the most relevant criteria for appraising the quality of these three categories. Consensus was reached for six criteria related to qualitative studies, eight for surveys, and seven for mixed methods studies. Results of this study improved the content validity of the MMAT, informed its revision, and led to a new version (MMAT version 2018).
Three main changes have been made to the MMAT. In the previous version, the MMAT had four criteria for each category of studies. Based on our results, the revised version comprises five criteria for each category, and some criteria were changed (see Supplementary File 2). Another change concerns the overall numerical score. In the previous version, an overall score could be calculated by counting the number of criteria rated “yes”. The current literature on critical appraisal tools discourages calculating an overall score because it does not indicate which aspects of studies are problematic and it gives equal weight to all criteria [33-37]. On this basis, it was decided to remove the overall numerical score from the MMAT. Instead, users are advised to provide a detailed presentation of the ratings of the criteria to better inform the quality of the included studies and to perform sensitivity analyses. Third, changes were made to the user manual, and an algorithm was added to help MMAT users choose the set(s) of criteria to use. The algorithm was developed based on existing algorithms for quantitative study designs (see Supplementary File 1). The MMAT version 2018 is available at http://mixedmethodsappraisaltoolpublic.pbworks.com/ (see Appendix 1).
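The published algorithm itself is part of the user manual; purely as an illustration (the labels and function below are hypothetical and are not the MMAT algorithm), choosing the set(s) of criteria can be sketched as applying a single set for single-method studies and three sets (qualitative, the matching quantitative set, and mixed methods) for mixed methods studies.

```python
from typing import Optional

# Hypothetical labels mirroring the five MMAT criteria sets (illustrative only).
SINGLE_METHOD_SETS = {
    "qualitative": "qualitative",
    "randomized controlled trial": "randomized controlled trials",
    "nonrandomized": "nonrandomized",
    "quantitative descriptive": "quantitative descriptive",
}

def criteria_sets(design: str, quantitative_component: Optional[str] = None) -> list[str]:
    """Return the set(s) of criteria to apply for a given study design (sketch, not the MMAT tool)."""
    if design == "mixed methods":
        quant = SINGLE_METHOD_SETS.get(quantitative_component or "")
        if quant is None or quant == "qualitative":
            raise ValueError("Specify the design of the quantitative component.")
        return ["qualitative", quant, "mixed methods"]
    return [SINGLE_METHOD_SETS[design]]

print(criteria_sets("mixed methods", "quantitative descriptive"))
# ['qualitative', 'quantitative descriptive', 'mixed methods']
```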
The results of the critical appraisal of individual studies can be used to assess the overall quality of evidence and the strength of recommendations, that is, to judge how much confidence to place in the body of evidence. Several approaches for rating the overall quality of evidence have been developed, such as Grading of Recommendations, Assessment, Development and Evaluations (GRADE) [38] and Confidence in the Evidence from Reviews of Qualitative research (GRADE-CERQual) [39]. In these approaches, the methodological quality of individual studies (or risk of bias) is one factor considered among others, such as the relevance of the evidence to the review question (indirectness), variation across studies (inconsistency), and random error in the evidence (imprecision).
There is a need to further validate the content of the criteria identified in this study, particularly for surveys. In this study, no criteria related to measurement and response rate biases in surveys reached consensus (Table 3). This might be because diverse sources can influence measurement errors (e.g., questionnaire, data collection method, interviewer, and respondent) [27] and these sources can vary from one study to another. As for response rate, different indicators can be used to judge nonresponse bias, such as identifying the reasons for nonresponse, determining whether respondents and nonrespondents differ on the survey variable of interest, and weighting for nonresponse [27]. Although no specific criteria on measurement and response rate reached a high level of consensus, the research team decided not to exclude these two biases from the MMAT because they are often mentioned in the literature [27,40,41]. Further content validation work is needed to refine these criteria. Also, in the MMAT version 2011, surveys are included in the broad “quantitative descriptive studies” category. We focused on surveys because they are often included in systematic mixed studies reviews, the existing tools have not been developed with experts, and surveys are among the most commonly used methods in mixed methods studies [42]. Subsequent research should verify whether the new criteria are applicable to other quantitative descriptive study designs.
Developing clear critical appraisal criteria is challenging. Experts provided several comments regarding the terms used in the criteria. For example, terms like “relevant,” “adequate,” and “appropriate” were considered ambiguous. These terms are often used in critical appraisal tools for qualitative research [19]. Compared with reporting quality criteria, methodological quality criteria are more difficult to interpret because reviewers need to judge whether the reported results can be trusted [43]. Also, criteria may be interpreted differently depending on the topic and context of the study.
The MMAT differs from other critical appraisal tools in several ways. To assess the quality of mixed methods studies, O'Cathain [11] suggested three different approaches: (a) a generic research approach, (b) an individual component approach, and (c) a mixed methods approach. According to our review, the MMAT is the only tool that includes specific criteria for mixed methods studies [44]. With its five different sets of criteria, the MMAT uses a combination of the individual component and mixed methods approaches. Other tools used in systematic mixed studies reviews approach critical appraisal differently. For example, Crowe and Sheppard [36] use a generic approach by proposing one set of criteria that could be applied to any design. Others, such as those from the Critical Appraisal Skills Programme, Joanna Briggs Institute, and National Institute for Health and Clinical Excellence, propose one tool for each study design (individual component approach). Also, some tools, such as the Quality Assessment Tool for Studies with Diverse Designs (QATSDD) [45], use a combination of generic and individual component approaches, with generic criteria applicable to several designs and specific criteria for qualitative and quantitative studies.
In addition, the MMAT is distinct from the other tools in that it focuses on methodological quality criteria and consists of a small number of items. Similar to other risk of bias tools [46], the MMAT focuses on the core criteria that may hinder the validity of the findings of a study. Some criteria (such as information on ethical considerations), though essential in a research process, may have less impact on the validity of a study compared to other methodological criteria (such as appropriate measurement).
4.1 Strengths and limitations
Given that we found 15 reviews analyzing more than 500 critical appraisal tools, we considered that an overview of these reviews was an efficient approach to meet our objectives. Yet, it is likely that not all critical appraisal tools were included in the literature review because the search strategy did not cover tools published in books or developed after 2015. For example, two recent literature reviews on tools for qualitative studies analyzed more than 100 tools [19,47]. Also, we limited our review to tools that had been validated or tested for reliability. Although it is possible that we did not identify all eligible critical appraisal tools, over 75% of the criteria in the pool we identified were generic, related to reporting quality, or duplicates. This suggests that our sample included the main criteria.
The number of experts on the three panels in Round-two ranged from 15 to 21. There is no rule regarding the required sample size for a Delphi. Some authors suggest a panel of 8 to 12 participants, whereas others recommend 300 to 500 [20]. One important factor to consider when determining the size is the composition of the sample (homogeneous or heterogeneous). Usually, a smaller sample, such as 10 to 15 participants, is considered sufficient for homogeneous samples [20]. Similarly, there is no clear recommendation regarding the number of experts needed for content validation. Lynn [48] suggested that five experts could be sufficient. Polit, Beck, and Owen [49] recommended having 8 to 12 experts for the first round. Given this, and because our samples were relatively homogeneous in terms of experts’ methodological expertise, their sizes may be considered acceptable.
Not all those who conduct systematic reviews are researchers with methodological expertise. Our study could have benefited from including such individuals in our panels of experts. For instance, the experience of health technology assessment practitioners or clinicians with experience in systematic reviews could have contributed to identifying relevant criteria to appraise. Future research and pilot testing of the MMAT could include this population.
The agreement index threshold of 0.80 used in this study was arbitrary. There is no standard threshold for determining consensus in a Delphi study; studies have used values varying from 0.50 to 0.80 [20]. In a previous study, it was found that criteria with an index of 0.78 or higher were indicative of good content validity [48]. Because the aim of this study was to identify core sets of criteria for content validity purposes, it was decided to use a high threshold.
Likert scales may have some limitations related to central tendency and desirability biases [50]. To limit these biases, we calculated frequencies (instead of means) and considered two ratings (very relevant and extremely relevant) to compute the agreement index.
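As a small illustration of this design choice (hypothetical ratings, not data from this study), two items can have the same mean rating yet differ markedly in how many experts strongly endorse them; the top-two-ratings frequency used for the agreement index captures this difference, whereas the mean does not.

```python
# Hypothetical ratings (not study data): identical means, different shares of
# "very/extremely relevant" (4 or 5) ratings.
from statistics import mean

item_a = [4] * 10           # all experts rate 4 -> mean 4.0, top-two share 1.0
item_b = [3] * 5 + [5] * 5  # split between 3 and 5 -> mean 4.0, top-two share 0.5

for name, ratings in [("A", item_a), ("B", item_b)]:
    top_two = sum(1 for r in ratings if r >= 4) / len(ratings)
    print(name, mean(ratings), top_two)
# A 4 1.0
# B 4 0.5
```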
Acknowledgments
The research team would like to acknowledge and sincerely thank all the e-Delphi panel experts for their contributions. Here are the names of the participants who wished to be acknowledged: Lesley Andres (University of British Columbia, Canada); Theodore Bartholomew (Purdue University, United States); Pat Bazeley (Research Support/University of New South Wales, Australia); Jelke Bethlehem (Leiden University, Netherlands); Paul Biemer (RTI International, United States); Jaak Billiet (University of Leuven, Belgium); Felicity Bishop (University of Southampton, England); Jörg Blasius (University of Bonn, Germany); Hennie Boeije (University of Utrecht, Netherlands); Jonathan Burton (Understanding Society, England); Kathy Charmaz (Sonoma State University, United States); Benjamin Crabtree (The State University of New Jersey, United States); Elizabeth Creamer (Virginia Tech University, United States); Edith de Leeuw (University of Utrecht, Netherlands); Claire Durand (Université de Montréal, Canada); Joan Eakin (University of Toronto, Canada); Michèle Ernst Stähli (Université de Lausanne, Switzerland); Michael Fetters (University of Michigan Medical School, United States); Nigel Fielding (University of Surrey, England); Rory Fitzgerald (University of London, England); Floyd Fowler (University of Massachusetts, United States); Dawn Freshwater (University of Western Australia, Australia); Jennifer Greene (University of Illinois at Urbana-Champaign, United States); Christina Gringeri (University of Utah, United States); Greg Guest (FHI 360, United States); Timothy Guetterman (University of Michigan Medical School, United States); Muhammad Hadi (University of Leeds, England); Elizabeth Halcomb (University of Wollongong, United States); Carolyn Heinrich (Vanderbilt University, United States); Sharlene Hesse-Biber (Boston College, United States); Mieke Heyvaert (University of Leuven, Belgium); John Hitchcock (Indiana University Bloomington, United States); Nataliya Ivankova (University of Alabama at Birmingham, United States); Laura Johnson (Northern Illinois University, United States); Paul Lavrakas (University of Chicago, United States); Marilyn Lichtman (Virginia Tech University, United States); Geert Loosveldt (University of Leuven, Belgium); Peter Lynn (University of Essex, England); Mary Ellen Macdonald (McGill University, Canada); Claire Howell Major (University of Alabama, United States); Maria Mayan (University of Alberta, Canada); Sharan Merriam (University of Georgia, United States); José Molina-Azorín (University of Alicante, Spain); David Morgan (Portland State University, United States); Peter Nardi (Pitzer College, United States); Katrin Niglas (Tallinn University, Estonia); Karin Olson (University of Alberta, Canada); Antigoni Papadimitriou (Johns Hopkins University, United States); Michael Quinn Patton (Independent organizational development and program evaluation consultant, United States); Rogério Meireles Pinto (Columbia University School of Social Work, United States); Vicki Plano Clark (University of Cincinnati, United States); David Plowright (University of Hull, England); Blake Poland (University of Toronto, Canada); Rodney Reynolds (California Lutheran University, United States); Gretchen B. 
Rossman (University of Massachusetts Amherst, United States); Erin Ruel (Georgia State University, United States); Michael Saini (University of Toronto, Canada); Johnny Saldaña (Arizona State University, United States); Joanna Sale (Li Ka Shing Knowledge Institute, Canada); Karen Schifferdecker (Dartmouth College, United States); David Silverman (University of London, England); Ineke Stoop (Netherlands Institute for Social Research, Netherlands); Sally Thorne (University of British Columbia, Canada); Sarah Tracy (Arizona State University, United States); Frederick Wertz (Fordham University, United States). The authors gratefully acknowledge the sponsorship from the Method Development platform of the Québec SPOR SUPPORT Unit (#BRDV-CIHR-201-2014-05), the CIHR Doctoral Fellowship Award (#301011), and the FRSQ Senior Investigator Award (#29308).
Article info
Publication history
Published online: March 21, 2019
Accepted: March 6, 2019
Footnotes
Conflict of interest statement: Quan Nha Hong, OT, MSc, PhD. This manuscript was written while she was a PhD candidate and held a Doctoral Fellowship Award from the Canadian Institutes of Health Research (CIHR). Pierre Pluye, MD, PhD, Full Professor, holds a Senior Investigator Award from the Fonds de recherche du Québec–Santé (FRQS) and is the Director of the Methodological Development Platform of the Quebec-SPOR SUPPORT Unit, which is funded by the CIHR, the FRQS, and the Quebec Ministry of Health.
Copyright
© 2019 The Authors. Published by Elsevier Inc.