Lack of transparency in reporting narrative synthesis of quantitative data: a methodological assessment of systematic reviews

Objective To assess the adequacy of reporting and conduct of narrative synthesis of quantitative data (NS) in reviews evaluating the effectiveness of public health interventions. Study Design and Setting A retrospective comparison of a 20% (n = 474/2,372) random sample of public health systematic reviews from the McMaster Health Evidence database (January 2010–October 2015) to establish the proportion of reviews using NS. From those reviews using NS, 30% (n = 75/251) were randomly selected and data were extracted for detailed assessment of: reporting NS methods, management and investigation of heterogeneity, transparency of data presentation, and assessment of robustness of the synthesis. Results Most reviews used NS (56%, n = 251/446); meta-analysis was the primary method of synthesis for 44%. In the detailed assessment of NS, 95% (n = 71/75) did not describe NS methods; 43% (n = 32) did not provide transparent links between the synthesis data and the synthesis reported in the text; of 14 reviews that identified heterogeneity in direction of effect, only one investigated the heterogeneity; and 36% (n = 27) did not reflect on limitations of the synthesis. Conclusion NS methods are rarely reported in systematic reviews of public health interventions and many NS reviews lack transparency in how the data are presented and the conclusions are reached. This threatens the validity of much of the evidence synthesis used to support public health. Improved guidance on reporting and conduct of NS will contribute to improved utility of NS systematic reviews.


What this adds to what was known?
• This is the first study to assess the adequacy of reporting of narrative synthesis of quantitative data in systematic reviews.

What is the implication and what should change now?
• Substantial improvements in clarity of reporting of narrative synthesis are required. There is a need for existing guidance to inform the development of a clear and concise reporting guideline for narrative synthesis.
• Greater transparency when reporting narrative synthesis will allow end users, including practitioners and policy decision-makers, to have greater confidence in the results of systematic reviews that use narrative synthesis.

INTRODUCTION
Well conducted systematic reviews have an important role in supporting evidence-informed policy and practice. [1,2] The value of systematic reviews in supporting decision-making, compared with other types of review, is their use of a transparent method to draw conclusions based on the best available evidence. While meta-analysis is a cornerstone of many systematic reviews, statistical pooling may not always be appropriate or feasible due to high levels of heterogeneity or lack of available data to calculate standardised effect estimates (e.g. standardised mean difference, odds ratio, risk ratio). Heterogeneity, both statistical and methodological, is a common issue for public health reviews, where it is typical to include diverse study designs, outcomes, contexts, populations, and interventions. [3] When meta-analysis is inappropriate or not possible, data may be synthesised narratively; this method is relied on heavily by those conducting reviews addressing public health issues. For example, 74% of National Institute for Health and Care Excellence (NICE) public health appraisals included NS. [4]

Concerns have been raised that narrative synthesis of quantitative data (NS) lacks transparency and has substantial potential for bias. [5-7] Specifically, there is concern that conclusions of NS are based on subjective interpretation, [5,7] with a risk of over-emphasising selected results without clear justification. This lack of transparency limits assessment of the level and sources of bias in NS, [5] threatens the replicability of the method, and may ultimately threaten the validity and value of review findings based on NS. However, empirical evaluations of the reporting and adequacy of NS are lacking. This paper presents the findings of a systematic review that aimed to establish current practice, and the adequacy of reporting and conduct of NS of quantitative data, in public health systematic reviews.

METHODS
To assess reporting and conduct of NS, we identified a random sample of recent public health systematic reviews and systematically assessed the adequacy of reporting and conduct by benchmarking against available published guidance.

[…] inclusive. The Health Evidence database contains systematic reviews relevant to public health which meet each of the following criteria: address questions related to promotion, protection or prevention in public health or health; include participants from developed countries; examine an intervention/programme/service/policy; include evidence on outcomes; and describe a search strategy (see http://www.healthevidence.org/our-appraisal-tools.aspx). The Health Evidence database uses a validated search filter which has high sensitivity, specificity and precision for retrieving systematic reviews of public health interventions. [10] In addition to the database inclusion criteria, we specified that reviews had to be systematic and contain synthesis; we excluded expert reviews, overviews, empty reviews and reviews with no synthesis.

Using the Microsoft Excel© random number function, a 20% random sample was selected from the full Health Evidence database download. The Excel random number function was used to allocate a number to each database entry (the results of the Health Evidence database search) and the numbers were sorted from lowest to highest. The entries holding the lowest 20% of the random numbers were included in the sample. This sample of reviews was screened (by MC, HT, AS, SVK) to identify reviews using NS of quantitative data for their primary outcome. If the review did not state a primary outcome, we identified the "primary outcome" of interest from the review question(s). A further 30% sub-sample of reviews which used NS as the primary method of synthesis was randomly selected for more detailed data extraction and analysis.
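The two-stage random selection described above can be illustrated with a short Python sketch. This is not the authors' procedure, which used the Excel random number function; the function name `sample_fraction`, the seed values, and the placeholder review identifiers are all assumptions made for illustration. Rounding the sample size to the nearest whole review is also an assumption, chosen because it reproduces the reported sample sizes.

```python
import random

def sample_fraction(entries, fraction, seed=None):
    """Allocate a random number to each entry, sort ascending,
    and keep the entries holding the lowest `fraction` of numbers."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    keyed = [(rng.random(), entry) for entry in entries]
    keyed.sort(key=lambda pair: pair[0])
    k = round(len(entries) * fraction)  # assumed rounding rule
    return [entry for _, entry in keyed[:k]]

# Hypothetical identifiers standing in for the 2,372 database entries.
database = [f"review-{i}" for i in range(1, 2373)]
first_stage = sample_fraction(database, 0.20, seed=1)

# In the study, the second stage was drawn from the 251 reviews that
# screening identified as using NS; placeholders are used here instead.
ns_reviews = [f"ns-review-{i}" for i in range(1, 252)]
second_stage = sample_fraction(ns_reviews, 0.30, seed=2)

print(len(first_stage), len(second_stage))
```

Sorting on an independently drawn random key is equivalent to drawing a simple random sample without replacement, which is why the spreadsheet approach described in the text is a reasonable way to select the reviews.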

Data extraction
The data extraction form was designed to reflect key elements of good practice in the conduct and reporting of NS of quantitative data. Key sources on the conduct of NS of quantitative data [11-16] informed the design of the data extraction form (see Box 1). Three members of the research team (MC, HT & SVK) read the key sources independently and prepared a list of items or components that were common in the key sources. The lists were then collated to prepare items for inclusion in the draft data extraction form, which was then finalised in discussion with all authors (online Supporting Information file, Appendix Table S1). There was little variation in recommended practice for NS across the identified sources. The ESRC guidance provided the most comprehensive explanation and the other sources appeared to draw heavily on this guidance. [9] The data extraction form, therefore, largely reflects the core components recommended in the ESRC guidance. Five main aspects of NS were identified and covered by the data extraction exercise, namely:
• Reporting of NS methods
• Use of theory (i.e. articulation of how the intervention is expected to work)
• Management and investigation of heterogeneity across studies
• Transparency of data presentation and links to narrative
• Assessment of robustness of the synthesis (i.e. reflection of the synthesis methods used to assess the strength of the evidence from the included studies)

Two reviewers (MC and HT) independently piloted the data extraction form. All members of the project team conducted data extraction on a selection of the same five reviews until assessments were consistent across each member of the research team (MC, HT, SVK, AS). The data were entered directly into a Microsoft Excel© database. Health Evidence quality assessment ratings of the reviews were gathered after the data extraction exercise was complete.

Summarising the data
The extracted data were tabulated to reflect the five main aspects of NS (see above) and are described narratively, with frequencies and descriptive data. Text was extracted to illustrate the reporting of NS methods.

[…] and October 2015 were available from the McMaster Health Evidence database (see Figure 1). From the initial 20% (n=474/2372) random sample of reviews, 28 (6%) were excluded: 8 were not systematic reviews (expert review/overview), 2 were empty reviews (contained no studies), and the full text of a further 18 could not be retrieved. Of the 446 reviews included, 251 (56%) synthesised the data for the primary outcome narratively; of these, 215 (48%) used NS exclusively, and 36 (8%) used a combination of NS and meta-analysis for primary outcome data (i.e. some data were included in the meta-analysis, with other data reported and discussed in the narrative text). The remaining reviews (44%, n=195) used meta-analysis to synthesise the primary outcome data.
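The flow of reviews reported above can be checked with simple arithmetic. The Python sketch below is only a consistency check on the figures quoted in the text; the variable names are illustrative. It assumes that the 18 reviews whose full text could not be retrieved are counted among the 28 exclusions, a reading inferred from the totals (474 − 28 = 446) rather than stated outright in the source.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

sampled = 474
# Assumed breakdown of the 28 exclusions (see lead-in).
excluded = {
    "not a systematic review": 8,
    "empty review": 2,
    "full text not retrieved": 18,
}
included = sampled - sum(excluded.values())

ns_only = 215          # NS used exclusively
ns_plus_ma = 36        # NS combined with meta-analysis
ma_only = 195          # meta-analysis only
ns_total = ns_only + ns_plus_ma

print("included:", included)
print("NS reviews:", ns_total, f"({pct(ns_total, included)}%)")
print("meta-analysis only:", ma_only, f"({pct(ma_only, included)}%)")
```

The three synthesis categories sum to the 446 included reviews, and the rounded percentages match those reported.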

Included reviews
All of the included reviews were published in international peer-reviewed journals. For a list of the included reviews, see Appendix Table S2. A list of results of extracted items reported in the text of this paper is provided in Appendix Table S3. The McMaster Health Evidence database provides a quality assessment of each included review, based on a ten-item quality assessment tool that covers all aspects of the systematic review process. The assessment incorporates clarity of review question, appropriate search strategy, and risk of bias assessment, and two items assessing aspects of synthesis ('Was it appropriate to combine the findings of results across studies?', 'Were appropriate methods used for combining or comparing results across studies?') (https://www.healthevidence.org/our-appraisal-tools.aspx). We randomly selected and analysed the 75 reviews in our sample blind to the Health Evidence quality assessment scores and retrieved these scores after our data extraction exercise was complete. Of the reviews in our sample, 37% had a strong rating (score of 8 to 10/10), 60% moderate (score of 5 to 7/10), and 3% weak (score of 1 to 4/10). Therefore, we are confident that the majority of the sample reviews followed good practice; however, that assessment process did not fully examine the synthesis processes in the systematic reviews.

The following sections report on the detailed data extraction conducted on the 30% (n=75/251) random sample of the reviews that synthesised data narratively.
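The rating bands quoted above (strong 8 to 10, moderate 5 to 7, weak 1 to 4) can be written as a small lookup. The Python sketch below is illustrative only: the function name `rating_band` is hypothetical and is not part of the Health Evidence tool, and rejecting scores outside the 1 to 10 range quoted in the text is an assumption of this sketch.

```python
def rating_band(score):
    """Map a ten-item quality score to the band quoted in the text."""
    if not isinstance(score, int) or not 1 <= score <= 10:
        # Assumption: only the 1-10 range quoted in the text is accepted.
        raise ValueError("score must be an integer from 1 to 10")
    if score >= 8:
        return "strong"
    if score >= 5:
        return "moderate"
    return "weak"

# Hypothetical scores, not data from the study.
example_scores = [9, 6, 3, 8, 5]
print([rating_band(s) for s in example_scores])
```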

Reporting of narrative synthesis methods
While 75 reviews synthesised data narratively, i.e. using text only, a description of the methods used for NS was absent in 95% of the reviews (n=71). Where methods were reported, the description was typically sparse (see examples in Box 2). Few review authors used the term 'narrative synthesis' to describe their synthesis; 27% (n=20/75) described their synthesis as 'narrative' or 'qualitative', and justification for using NS was rarely provided (15%, n=3/20). In around half (51%, n=38/75) of the reviews using NS, the authors stated that they were unable to conduct a meta-analysis but provided no further details of how the data were synthesised […]

The lack of protocols for most reviews prevented recording whether investigation of heterogeneity was pre-specified. This study did not assess the appropriateness of the investigation of heterogeneity, which would require expertise in the topic of investigation for all the reviews that our project team did not have. Rather, we describe how investigation of heterogeneity was conducted. Only one review investigated heterogeneity in the direction of effect; specifically, the authors explored differences in intervention components (treatment regimens) across studies, and provided an explanation for the heterogeneity. Ten reviews provided hypothetical explanations for the variance in reported effect directions and three reviews did not offer any explanation. Hypothesised explanations for heterogeneity focussed on differences in the characteristics or outcome measures of interventions, or the risk of bias of included studies. In one review (2%) the authors linked their hypothesised explanation of heterogeneity in reported effects to a pre-specified theory, suggesting that intervention adherence influenced the outcome.

Transparency of data presentation and links to narrative
Tables presenting outcome data were provided in 85% (n=64) of reviews, either alongside the text or as an online appendix. While 54% (n=40) of the reviews made the full data extraction available, either in the article (43%, n=32) or online (11%, n=8), the remaining 47% (n=35) of reviews did not provide access to all the data incorporated into the synthesis. In 15% (n=11) of reviews, not all the included studies were referred to in the narrative, leading to uncertainty as to whether the data from these studies had been included.

Using information about the type, detail, and clarity (including grouping) of reporting of data in each review, we assessed transparency; 57% (n=43) of reviews were assessed as promoting transparent links between the data and the text. A summary table presenting key characteristics of included studies was included in 97% (n=73) of reviews, providing information about study design, intervention, population, and outcomes (Table 1, items 3.1, 3.2).

We also assessed the extent to which review conclusions were linked to the included data, based on how clearly the conclusions referred to the reported results. We judged this to be clear (i.e. the key findings in the conclusion clearly referred back to the text or visual evidence in the results) to a large extent or to some extent for most reviews (n=45 and n=25 respectively); however, in 7% (n=5) of reviews there was no clear link between the conclusions and the evidence referred to in the synthesis.

Assessment of the robustness of the synthesis
When considering the strengths and limitations of the evidence, review authors were more likely to reflect on the limitations of the primary studies included in the review (88%, n=66) than on limitations of the synthesis they had conducted (64%, n=48). Limitations referred to risk of bias in included studies, relevance and reporting of study and intervention details, and heterogeneity of outcome measurements […]

Narrative synthesis is more commonly used than meta-analysis for synthesising quantitative data in systematic reviews of public health interventions. Despite its popularity, our detailed assessment shows that reporting of NS methods is almost totally absent, and the transparency of how NS is conducted is variable and currently inadequate. In 95% of reviews relying on NS for their primary outcome, all from international peer-reviewed journals, the methods used were not described. While the majority of reviews did incorporate some core components of good practice (describing the rationale for the intervention, transparently relating tabulated data to the text in the results, and reflecting on the robustness of the synthesis), fewer than 30% of the reviews adopted each of these components. Our findings support previous criticism of NS as being opaque, particularly in relation to interpreting the evidence, and as being susceptible to selective reporting. This potential for bias is important and threatens the value of systematic reviews that use NS. In public health, where NS is commonly used, these are important issues that undermine the role of these key resources as tools to support evidence-informed decision making.

The findings of our work are based on a representative sample of reviews from the Health Evidence database, a comprehensive source of systematic reviews of public health interventions. [10] Limitations of our study include the lack of a gold standard with which to compare reporting of NS. We used single assessors for data extraction; however, this was only after good agreement in the data extraction was achieved between independent assessors. Our sample of reviews allows an overall assessment of current practice within public health reviews, but we are aware that the sample is too small to allow robust comparison of reporting and conduct in reviews from different disciplines or different health topics. Despite the focus on public health, the findings are likely to be relevant to the wider field of evidence synthesis, regardless of topic. Indeed, we suspect that the conduct of NS may be poorer in other topic areas where there is less familiarity with NS as a method. NS will continue to be a necessary method of synthesis due to the complex nature of many interventions and the need to support evidence-informed decision making. [19]

The limited reference to available guidance on NS and the near absence of reporting of NS methods suggest that there is a general lack of familiarity with NS as a method among review authors. Furthermore, the lack of justification for using NS beyond statements such as 'it was not possible to conduct meta-analysis' suggests that review authors may not consider NS to be a discrete method of synthesis. This is supported by our own informal discussions with experienced review authors who have expressed uneasiness around how to conduct and assess NS, yet acknowledge that NS is an important and essential method for reviews with high levels of heterogeneity and where diverse types of evidence are included.

Despite its frequent use, development of NS methods has been scant. This is in contrast to work to promote rigour in statistical synthesis or meta-analysis, [5] […] these types of reviews, not only as an alternative when meta-analysis is contra-indicated but as an important synthesis tool in its own right. It offers a method for exploring and understanding the underlying arguments and justification of claims made in the included studies of a review. [28] NS enables reviewers to incorporate diversity in study designs, participants, interventions or outcomes.

NS is likely to remain an important method for bringing together heterogeneous evidence. The work reported here shows that current practice in the conduct and, in particular, the reporting of NS is not consistent with the standards of transparency expected from rigorous and reliable systematic reviews. There is a need to provide support to those conducting NS and those attempting to assess the reliability of NS of quantitative data. NS is used in Cochrane reviews, perhaps more often than presumed. We estimated that at least 20% of recent Cochrane reviews used NS to synthesise outcome data. [30] We intend to contribute to the improved use of NS through the Improving the Conduct and reporting Of Narrative Synthesis of Quantitative data (ICONS-Quant) project, supported by the Cochrane Strategic Methods Fund, which aims to produce guidance and reporting guidelines for authors conducting NS of quantitative data (http://www.equator-network.org/library/reporting-guidelines-under-development/#74). Improved guidance has been linked to improved reporting of research, [31] without which it is difficult for decision-makers to make use of research findings in the real world.

Conclusion
Narrative synthesis is a valuable method for synthesising quantitative data where meta-analysis is not appropriate. While NS of quantitative data is widely used, it is poorly reported and transparency is often lacking, threatening the credibility and value of many systematic reviews. The poor reporting suggests a lack of familiarity with, and confidence about, how to implement best practice when conducting NS. Improved guidance on the conduct and reporting of NS of quantitative data is required to support authors and ensure reviews using NS can be reliably used by decision makers.