Corresponding author. CLARITY Research Group, Department of Clinical Epidemiology and Biostatistics, Room 2C12, 1200 Main Street West, Hamilton, Ontario, Canada L8N 3Z5. Tel.: +905-527-4322; fax: +905-523-8781.
Iberoamerican Cochrane Center-Servicio de Epidemiología Clínica y Salud Pública and CIBER de Epidemiología y Salud Pública (CIBERESP), Hospital de Sant Pau, Universidad Autónoma de Barcelona, Barcelona 08041, Spain
Center for Evidence-based Medicine and Health Outcomes Research, University of South Florida, Tampa, FL, USA
Department of Hematology, H. Lee Moffitt Cancer Center & Research Institute, 12901 Bruce B. Downs Boulevard, MDC02, Tampa, FL 33612, USA
Department of Health Outcomes and Behavior, H. Lee Moffitt Cancer Center & Research Institute, 12901 Bruce B. Downs Boulevard, MDC02, Tampa, FL 33612, USA
German Cochrane Center, Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, 79104 Freiburg, Germany
Division of Pediatric Hematology and Oncology, Department of Pediatric and Adolescent Medicine, University Medical Center Freiburg, 79106 Freiburg, Germany
In the GRADE approach, randomized trials start as high-quality evidence and observational studies as low-quality evidence, but both can be rated down if a body of evidence is associated with a high risk of publication bias. Even when individual studies included in best-evidence summaries have a low risk of bias, publication bias can result in substantial overestimates of effect. Authors should suspect publication bias when available evidence comes from a number of small studies, most of which have been commercially funded. A number of approaches based on examination of the pattern of data are available to help assess publication bias. The most popular of these is the funnel plot; all, however, have substantial limitations. Publication bias is likely frequent, and caution in the face of early results, particularly with small sample size and number of events, is warranted.
Empirical evidence shows that, in general, studies with statistically significant results are more likely to be published than studies without statistically significant results (“negative studies”).
Systematic reviews performed early, when only a few initial studies are available, will overestimate effects when “negative” studies face delayed publication. Early positive studies, particularly if small in size, are suspect.
Recent revelations suggest that withholding of “negative” results by industry sponsors is common. Authors of systematic reviews should suspect publication bias when studies are uniformly small, particularly when sponsored by the industry.
Empirical examination of patterns of results (e.g., funnel plots) may suggest publication bias but should be interpreted with caution.
In four previous articles in our series describing the GRADE system of rating the quality of evidence and grading the strength of recommendations, we have described the process of framing the question, introduced GRADE’s approach to rating the quality of evidence, and dealt with the possibility of rating down quality for study limitations (risk of bias). This fifth article deals with another of the five categories of reasons for rating down the quality of evidence: publication bias. Our exposition relies to some extent on prior work addressing issues related to publication bias [
]; we did not conduct a systematic review of the literature relating to publication bias.
Even if individual studies are perfectly designed and executed, syntheses of studies may provide biased estimates because systematic review authors or guideline developers fail to identify studies. In theory, the unidentified studies may yield systematically larger or smaller estimates of beneficial effects than those identified. In practice, there is more often a problem with “negative” studies, the omission of which leads to an upward bias in the estimate of effect. Failure to identify studies is typically a result of studies remaining unpublished or obscurely published (e.g., as abstracts or theses); thus, methodologists have labeled the phenomenon “publication bias.”
An informative systematic review assessed the extent to which publication of a cohort of clinical trials is influenced by the statistical significance, perceived importance, or direction of their results [
]. It found five studies that investigated these associations in a cohort of registered clinical trials. Trials with positive findings were more likely to be published than trials with negative or null findings (odds ratio: 3.90; 95% confidence interval [CI]: 2.68, 5.68). This corresponds to a risk ratio of 1.78 (95% CI: 1.58, 1.95), assuming that 41% of negative trials are published (the median among the included studies, range=11–85%). In absolute terms, this means that if 41% of negative trials are published, we would expect that 73% of positive trials would be published. Two studies assessed time to publication and showed that trials with positive findings tended to be published after 4–5 years, whereas those with negative findings were published after 6–8 years. Three studies found no statistically significant association between sample size and publication. One study found no statistically significant association between publication and funding mechanism, investigator rank, or sex.
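The conversion from the pooled odds ratio to a risk ratio and an absolute publication proportion can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes the standard conversion RR = OR / (1 - p0 + p0 × OR), where p0 is the assumed proportion of negative trials published; the function name is ours and is not taken from the review.

```python
def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Convert an odds ratio to a risk ratio for a given baseline risk.

    baseline_risk is the assumed proportion of events in the reference
    group (here, the proportion of negative trials that are published).
    """
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


# Values quoted in the text: OR 3.90, with 41% of negative trials published.
p_negative = 0.41
rr = odds_ratio_to_risk_ratio(3.90, p_negative)
p_positive = rr * p_negative  # expected proportion of positive trials published

print(f"risk ratio = {rr:.2f}")
print(f"positive trials published = {p_positive:.0%}")
```

Because the result depends on the assumed baseline, a different proportion of published negative trials (the included studies ranged from 11% to 85%) would give a different risk ratio for the same odds ratio.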
2. Publication bias vs. selective reporting bias
In some classification systems, reporting bias has two subcategories: selective outcome reporting, with which we have dealt in the previous article in the series, and publication bias. However, all the sources of bias that we have considered under study limitations, including selective outcome reporting, can be addressed in single studies. In contrast, when an entire study remains unreported and reporting is related to the size of the effect—publication bias—one can assess the likelihood of publication bias only by looking at a group of studies [
]. Currently, we follow the Cochrane approach and consider selective reporting bias as an issue in risk of bias (study limitations). This issue is currently under review by the Cochrane Collaboration, and both Cochrane and GRADE may revise this in the future.
3. Variations in publication bias
The results of a systematic review will be biased if the sample of studies included is unrepresentative—whether the studies not included are published or not. Thus, biased conclusions can result from an early review that omits studies with delayed publication—a phenomenon sometimes termed “lag bias” [
]. Either because authors do not submit studies with what they perceive as uninteresting results to prominent journals or because of repeated rejection at such journals, a study may end up published in an obscure journal not indexed in major databases and not identified in a less-than-comprehensive search. Authors from non-English speaking countries may submit their negative studies to local journals not published in English [
]; these will inevitably be missed by any review that restricts itself to English-language journals. Negative studies may be published in some form (theses, book chapters, compendia of meeting abstract submissions—sometimes referred to as “gray literature”) that tend to be omitted from systematic reviews without comprehensive searching [
With each of these variations of publication bias, there is a risk of overestimating the size of an effect. However, the importance of unpublished studies, non-English language publication, and gray literature is difficult to predict for individual systematic reviews.
One may have a mirror image phenomenon to the usual publication bias: a study may be published more than once, with different authors and changes in presentation that make the duplication difficult to identify, and potentially lead to double counting of results within systematic reviews [
]. Randomized trials reported only in abstract form in major cardiology journals showed smaller effects than trials fully published. Of those trials published, the earlier published studies showed larger effects than the later published studies. Studies with positive results were published in journals with higher impact factors than studies with negative conclusions. Systematic reviews proved vulnerable to these factors, included published studies more often than abstracts, and conveyed inflated estimates of treatment effect. Table 1 presents a number of ways that selective or nonpublication can bias the results of a best-evidence summary classified according to the phase of the publication process.
Table 1. Publication bias
Phases of research publication
Actions contributing to or resulting in bias
Preliminary and pilot studies
Small studies more likely to be “negative” (e.g., those with discarded or failed hypotheses) remain unpublished; companies classify some as proprietary information
Authors decide that reporting a “negative” study is uninteresting and do not invest the time and effort required for submission
Authors decide to submit the “negative” report to a nonindexed, non-English, or limited-circulation journal
Editor decides that the “negative” study does not warrant peer review and rejects manuscript
Peer reviewers conclude that the “negative” study does not contribute to the field and recommend rejecting the manuscript. Author gives up or moves to lower impact journal. Publication delayed
Author revision and resubmission
Author of rejected manuscript decides to forgo the submission of the “negative” study or to submit it again at a later time to another journal (see “journal selection,” above).
Journal delays the publication of the “negative” study
Proprietary interests lead to report getting submitted to, and accepted by, different journals
]. RCTs including large numbers of patients are less likely to remain unpublished or ignored and tend to provide more precise estimates of the treatment effect, whether positive or negative (i.e., showing or not showing a statistically significant difference between intervention and control groups). Discrepancies between results of meta-analyses of small studies and subsequent large trials may occur as often as 20% of the time [
]. Furthermore, they may publish in journals with limited readership studies that, by their significance, warrant publication in the highest profile medical journals. They may also succeed in obscuring results using strategies that are scientifically unsound. The following example illustrates all these phenomena.
Salmeterol Multicentre Asthma Research Trial (SMART) was a randomized trial that examined the impact of salmeterol or placebo on a composite outcome of respiratory-related deaths and life-threatening experiences. In September 2002, after a data monitoring committee review of 25,858 randomized patients showed a nearly significant increase in the primary outcome in the salmeterol group, the sponsor, GlaxoSmithKline (GSK), terminated the study. Deviating from the original protocol, GSK submitted to the Food and Drug Administration (FDA) an analysis that included events in the 6 months after trial termination, an analysis that produced a diminution of the dangers associated with salmeterol. The FDA eventually obtained the correct analysis [
In another more recent example, Schering-Plough delayed, for almost 2 years, publication of a study of more than 700 patients that investigated a combination drug, ezetimibe and simvastatin vs. simvastatin alone, for improving lipid profiles and preventing atherosclerosis [
]. The inclination to rate down for publication bias should increase if most of those small studies are industry sponsored or likely to be industry sponsored (or if the investigators share another conflict of interest) [
]. Of the 38 studies viewed as positive by the FDA, 37 were published. Of the 36 studies viewed as negative by the FDA, 14 were published. Publication bias of this magnitude can seriously bias effect estimates.
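To make the size of this imbalance concrete, the counts above can be turned into publication proportions. This is a small illustrative calculation using only the four counts quoted in the text; the helper name is ours.

```python
def publication_proportion(published, total):
    """Fraction of studies in a group that reached publication."""
    return published / total


# Counts quoted in the text (studies as classified by the FDA).
positive = publication_proportion(37, 38)
negative = publication_proportion(14, 36)

print(f"positive studies published: {positive:.1%}")
print(f"negative studies published: {negative:.1%}")
print(f"ratio of proportions: {positive / negative:.2f}")
```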
Additional criteria for suspicion of publication bias include a relatively recent RCT or set of RCTs addressing a novel therapy and systematic review authors’ failure to conduct a comprehensive search (including a search for unpublished studies).
7. Using study results to estimate the likelihood of publication bias
Another criterion for publication bias is the pattern of study results. Suspicion may increase if visual inspection demonstrates an asymmetrical (Fig. 1b) rather than a symmetrical (Fig. 1a) funnel plot or if statistical tests of asymmetry are positive [
Furthermore, systematic review and guideline authors should bear in mind that even if they find convincing evidence of asymmetry, publication bias is not the only explanation. For instance, if smaller studies suffer from greater study limitations, they may yield biased overestimates of effects. Another explanation would be that, because of a more restrictive (and thus responsive) population, or a more careful administration of the intervention, the effect may actually be larger in the small studies.
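One commonly used statistical test of funnel plot asymmetry is an Egger-type regression, in which each study's standardized effect (effect divided by its standard error) is regressed on its precision (one divided by its standard error); an intercept far from zero suggests asymmetry. The text does not specify which test is meant, so the following is only a sketch of that one approach. The data are hypothetical and the function name is ours.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger-type regression of standardized effect on precision.

    Returns (intercept, standard error of the intercept), estimated by
    ordinary least squares. Requires at least three studies whose
    standard errors are not all identical.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1.0 / se for se in std_errors]                      # precision
    y = [eff / se for eff, se in zip(effects, std_errors)]   # standardized effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept


# Hypothetical log odds ratios and standard errors (not from any real review):
# the smaller studies (larger standard errors) show larger effects.
log_or = [0.10, 0.15, 0.30, 0.45, 0.70, 0.90]
se = [0.05, 0.08, 0.15, 0.22, 0.35, 0.45]

b0, se_b0 = egger_intercept(log_or, se)
print(f"intercept = {b0:.2f}, SE = {se_b0:.2f}, t = {b0 / se_b0:.2f} on {len(se) - 2} df")
```

The t statistic would be compared with a t distribution on n - 2 degrees of freedom. As with visual inspection, a nonzero intercept indicates small-study effects in general, not publication bias specifically, and such tests have low power when few studies are available.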
A second set of tests, referred to as “trim and fill,” tries to impute missing information and address its impact. Such tests begin by removing small “positive” studies that do not have a “negative” study counterpart. This leaves a symmetric funnel plot that allows calculation of a putative true effect. The investigators then replace the “positive” studies they have removed and add hypothetical studies that mirror these “positive” studies to create a symmetrical funnel plot that retains the new pooled effect estimate [
]. The same alternative explanations to asymmetry that we have noted for funnel plots apply here, and the imputation of new missing studies represents a daring assumption that would leave many uncomfortable.
Another set of tests estimates whether there are differential chances of publication based on the level of statistical significance [
]. These tests are well established in the educational and psychology literature but, probably because of their computational difficulty and complex assumptions, are uncommonly used in the medical sciences.
Finally, a set of tests examines whether evidence changes over time. Recursive cumulative meta-analysis [
] performs a meta-analysis at the end of each year for trials ordered chronologically and notes changes in the summary effect. Continuously diminishing effects strongly suggest time lag bias. Another test examines whether the number of statistically significant results is larger than what would be expected under plausible assumptions [
In summary, each of the approaches to using available data to provide insight into the likelihood of publication bias may be useful but has limitations. Concordant results of using more than one approach may strengthen inferences regarding publication bias.
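The recursive cumulative meta-analysis described above can be sketched as follows. This is a minimal illustration that assumes fixed-effect inverse-variance pooling (the text does not specify the pooling model) and uses hypothetical data; the function name and input layout are ours.

```python
import math


def cumulative_meta_analysis(studies):
    """Pool studies cumulatively by publication year.

    studies: iterable of (year, effect, std_error) tuples.
    Returns [(year, pooled_effect, pooled_se), ...] with one entry per
    year, pooling every study published up to and including that year
    using fixed-effect inverse-variance weights (an assumption).
    """
    ordered = sorted(studies, key=lambda s: s[0])
    results = []
    sum_w = 0.0
    sum_wy = 0.0
    for i, (year, effect, se) in enumerate(ordered):
        w = 1.0 / se ** 2
        sum_w += w
        sum_wy += w * effect
        # Emit a summary only once all studies from this year are included.
        is_last_of_year = i + 1 == len(ordered) or ordered[i + 1][0] != year
        if is_last_of_year:
            results.append((year, sum_wy / sum_w, math.sqrt(1.0 / sum_w)))
    return results


# Hypothetical trials (year, log risk ratio, standard error): early small
# trials with large effects followed by later, larger trials.
trials = [
    (2001, -0.80, 0.40),
    (2002, -0.60, 0.35),
    (2004, -0.30, 0.20),
    (2004, -0.25, 0.25),
    (2007, -0.10, 0.10),
]

for year, pooled, pooled_se in cumulative_meta_analysis(trials):
    print(f"{year}: pooled effect = {pooled:.2f} (SE {pooled_se:.2f})")
```

A summary effect that steadily moves toward the null as later trials are added is the pattern that the text associates with time lag bias.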
More compelling than any of these theoretical exercises is authors’ success in obtaining the results of some unpublished studies and demonstrating that the published and unpublished data show different results. In these circumstances, the possibility of publication bias looms large. For instance, a systematic review found that including unpublished studies of the use of quinine for the treatment of leg cramps decreased the estimated effect size by a factor of two [
]. Unfortunately, obtaining the unpublished studies is not easy (although reliance on FDA submissions [or submissions to other regulatory agencies], as demonstrated in a number of examples we cited, can be very effective). On the other hand, reassurance may come from a systematic review that has succeeded in gaining industry cooperation and states that all trials have been revealed [
Prospective registration of all RCTs at inception and before their results become available enables review authors (and those using systematic reviews) to know when relevant trials have been conducted so that they can ask the responsible investigators for the relevant study data [
]. Consequently, searching clinical trial registers is becoming increasingly valuable and should be considered by review authors and those using systematic reviews when assessing the risk of publication bias. There is currently no initiative for registration of observational studies, leaving them, for the foreseeable future, open to publication bias.
8. Publication bias in observational studies
The risk of publication bias is probably larger for observational studies than for RCTs [
], particularly small observational studies and studies conducted on data collected automatically (e.g., in the electronic medical record or in a diabetes registry) or data collected for a previous study. In these instances, it is difficult for the reviewer to know if the observational studies that appear in the literature represent all or a fraction of the studies conducted, and whether the analyses in them represent all or a fraction of those conducted. In these instances, reviewers may consider the risk of publication bias as substantial [
]. All trials, which ranged in size from 40 to 234 patients—with most around 100—were industry sponsored. Furthermore, the funnel plot suggests the possibility of publication bias (Fig. 2).
10. Acknowledging the difficulties in assessing the likelihood of publication bias
Unfortunately, it is very difficult to be confident that publication bias is absent, and almost equally difficult to know where to place the threshold and rate down for its likely presence. Recognizing these challenges, the terms GRADE suggests using in GRADE evidence profiles for publication bias are “undetected” and “strongly suspected.” Acknowledging the uncertainty, GRADE suggests rating down a maximum of one level (rather than two) for suspicion of publication bias. Nevertheless, the examples cited herein suggest that publication bias is likely frequent, particularly in industry-funded studies. This suggests the wisdom of caution in the face of early results, particularly with small sample size and number of events.
The GRADE system has been developed by the GRADE Working Group. The named authors drafted and revised this article. A complete list of contributors to this series can be found on the journal's Web site at www.elsevier.com.