Journal of Clinical Epidemiology
Volume 62, Issue 1, Pages 47-53.e3, January 2009

Publication bias was not a good reason to discourage trials with low power

  • George F. Borm

      Affiliations

    • Department of Epidemiology and Biostatistics, Radboud University Nijmegen Medical Centre, the Netherlands
    • Corresponding author. Radboud University Nijmegen Medical Centre, Department of Epidemiology and Biostatistics 133, P.O. Box 9101, NL-6500 HB Nijmegen, The Netherlands. Tel.: +31-243617667; fax: +31-243613505.
  • Martin den Heijer

      Affiliations

    • Department of Epidemiology and Biostatistics, Radboud University Nijmegen Medical Centre, the Netherlands
    • Department of Endocrinology, Radboud University Nijmegen Medical Centre, the Netherlands
  • Gerhard A. Zielhuis

      Affiliations

    • Department of Epidemiology and Biostatistics, Radboud University Nijmegen Medical Centre, the Netherlands

Accepted 29 February 2008; published online 14 July 2008.


Abstract 

Objective

The objective was to investigate whether it is justified to discourage trials with less than 80% power. Trials with low power are unlikely to produce conclusive results, but their findings can be used by pooling them in a meta-analysis. However, such an analysis may be biased, because trials with low power are likely to have a nonsignificant result and are less likely to be published than trials with a statistically significant outcome.

Study Design and Setting

We simulated several series of studies with varying degrees of publication bias and then calculated the “real” one-sided type I error and the bias of meta-analyses with a “nominal” error rate (significance level) of 2.5%.

Results

In single trials, in which heterogeneity was set at zero, low, and high, the error rates were 2.3%, 4.7%, and 16.5%, respectively. In multiple trials with 80%–90% power and a publication rate of 90% when the results were nonsignificant, the error rates could be as high as 5.1%. When the power was 50% and the publication rate of non-significant results was 60%, the error rates did not exceed 5.3%, whereas the bias was at most 15% of the difference used in the power calculation.

Conclusion

The impact of publication bias does not warrant the exclusion of trials with 50% power.

Keywords: Meta-analysis, Publication bias, Ethics, Clinical trials, Type I error, Statistical power, Heterogeneity


1. Introduction 

There are conflicting opinions about whether it is ethically justified to perform trials with low power. Some authors argued that all trials provide useful treatment estimates and confidence intervals, irrespective of their power, especially if the results are pooled in a meta-analysis [1]. Furthermore, the opportunity to perform small trials makes it easier to initiate and complete studies, whereas subsequent meta-analyses provide information similar to that from one large study [1]. Other authors objected to this approach and stipulated that trials should have at least 80% or 90% power [2], [3]. They argued that small studies make meta-analyses unreliable. An important reason for this is publication bias: trials with nonsignificant results are less likely to be published, which might lead to overly optimistic results of the meta-analysis [4], [5]. As publication bias applies especially to trials with low power, it is advocated to only carry out trials if they have sufficient power [2], [3], [6]. For example, ethics committees tend to require that trials have at least 80% power, so small trials are likely to be disapproved. It is unclear whether this minimal power requirement can be reduced to, for example, 50%. This would permit trials that are approximately half the size of trials with 80% power. A minimum of 30% power might even be considered, which would permit trials of approximately a quarter of the size.

Various authors investigated the degree of publication bias [7], [8], [9]. They estimated the percentage of trials that remained unpublished and how this percentage depended on various parameters, such as the power and the outcome of the studies. Dickersin et al. combined all the available papers on the importance of publication bias and identified factors that influenced the chance that a study would be published [7]. They found a strong relationship between the publication of a trial and the statistical significance of its results. Approximately 55% of the published trials had a statistically significant outcome, whereas only 15% of the unpublished trials had a significant result. They estimated that 10%–40% of the trials were never published. This means that, when underreporting is 10%, 97% of the studies with a significant outcome will be published, vs. 83% of the studies with a nonsignificant outcome. When underreporting is 40%, these percentages would be 85% and 44%, respectively. The authors also reported that trial size had an additional, but much weaker, effect on the likelihood of publication [7].
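The publication rates quoted above follow from simple bookkeeping on the two reported proportions. The sketch below reproduces that arithmetic. The function name and argument names are illustrative and do not come from Dickersin et al., and the sketch assumes that the 55% and 15% figures apply unchanged at both levels of underreporting.

```python
def publication_rates(unpublished, sig_among_published=0.55,
                      sig_among_unpublished=0.15):
    """Return (SS rate, NS rate): the fraction of significant and of
    nonsignificant trials that end up published.

    unpublished is the fraction of all trials that is never published.
    """
    published = 1.0 - unpublished
    # Split all trials into four cells: (published or not) x (significant or not).
    sig_pub = published * sig_among_published
    sig_unpub = unpublished * sig_among_unpublished
    ns_pub = published * (1.0 - sig_among_published)
    ns_unpub = unpublished * (1.0 - sig_among_unpublished)
    ss_rate = sig_pub / (sig_pub + sig_unpub)
    ns_rate = ns_pub / (ns_pub + ns_unpub)
    return ss_rate, ns_rate


low = publication_rates(0.10)   # 10% of trials unpublished
high = publication_rates(0.40)  # 40% of trials unpublished
```

Rounded to whole percentages, `low` and `high` correspond to the 97%/83% and 85%/44% pairs given in the text.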

Meta-analyses may have limitations, but the important question is whether these limitations bias the outcome to a significant degree. Several authors compared the results of a single large trial to the results of a meta-analysis on smaller trials that evaluated the same treatment [10], [11], [12], [13], [14], [15], [16], [17]. Comparisons of whether significant results in the meta-analysis corresponded with significant results in the large trial and vice versa showed discrepancies in 10%–25% of the comparisons. However, there was usually directional agreement in the treatment estimates and the confidence intervals overlapped. Furthermore, differences in treatment modalities and patient groups also formed explanations for the discrepancies. Even when there are no explanations for the differences between the results of the large trial and the meta-analysis, it does not necessarily mean that the outcome of the meta-analysis is incorrect. The results of large trials may be challenged and refuted over time, so it would be imprudent to let the outcome of a single large trial overrule the outcome of the meta-analysis [11], [12], [18].

1.1. Research questions 

Publication bias has been shown to occur and to have an influence on the results of meta-analyses, in actual situations and in simulation studies. Song et al., for example, showed that publication bias may have a large effect when only 76 or 155 trials out of 500 very small trials are published [5]. To our knowledge, no information is available about the potential effect of publication bias in less extreme situations, that is, when all the trials have at least 50% power or when they have at least 30% power. We performed a simulation study with the aim of answering the following questions:

To what extent does publication bias lead to overestimation of the difference between the treatments?

How much does publication bias inflate the type I error rate? If a meta-analysis is performed on trials that evaluated nonefficacious treatments and the significance level is set at 5%, then a statistically significant outcome would only be permissible in 5% of the meta-analyses. Due to selective publication, the percentage of statistically significant outcomes, that is, the “true” type I error rate, can be higher than the “nominal” rate of 5% that was used in the statistical test in the meta-analysis.

The results of these simulations provided the basis for a discussion about relaxation of the power requirement for trials, for example, from a minimum requirement of 80% power to a minimum of 50% power.

Statistical details can be found in Appendix A and Appendix B on the journal's web site at www.elsevier.com.


2. One-sided vs. two-sided error rates 

Usually, two-sided tests are carried out in a meta-analysis. Under the null hypothesis of no difference between the treatments in the studies included in the meta-analysis, the two-sided type I error is the sum of two equal parts: the probability of erroneously concluding that there is a (false) positive difference between the treatments (FP error rate) and the probability of erroneously concluding that there is a (false) negative difference (FN error rate). However, when publication bias affects the type I error, it does so in an asymmetrical way. As it “favors” positive results, it increases the FP error rate, but it may decrease the FN error rate. For example, it is possible that the overall two-sided error rate is 5%, but that the FP error rate is 4%, whereas the FN error rate is 1%. As the FP error rate is more relevant than the FN error rate in the evaluation of the impact of publication bias, we used one-sided error rates and one-sided tests throughout the simulation.


3. Methods 

Our simulation study investigated more than 1,000 combinations of power, number of trials, and publication rates. In each combination, we simulated 6,400 series of trials. At the end of each series, we performed a meta-analysis and calculated the probability of erroneously concluding that there is a positive difference between the treatments (FP error rate) and the bias. Appendix B provides details of the simulation program.

3.1. Power and number of trials 

The power of the simulated trials varied between 30% and 90%. A series could consist of trials with equal power, or a mixture of different powers. In the latter situation, the power of the trials varied between two extremes and followed a beta distribution. The number of trials ranged from 1 to 30.

3.2. Publication bias 

The probabilities that trials with statistically significant (SS) results would lead to publication were set at 85%, 95%, or 100% (SS publication rates). In studies with nonsignificant (NS) results and 80% or 90% power, the probability of publication was varied between 80% and 100% (NS publication rates). In studies with 30% or 50% power, NS publication rates were varied between 50% and 70%. When the series of trials contained a mixture of different powers, the trials had NS and SS publication rates that were proportional to their power.

3.3. Heterogeneity 

Series of trials were generated without heterogeneity as well as with low, moderate, and high heterogeneity. This was achieved by setting the percentage of total variance that was due to heterogeneity (I2) at 0, 0.25, 0.5, and 0.75 (Appendix A1, 2). We then used the random effects method developed by Sidik and Jonkman to carry out the meta-analyses (Appendix A3). As a validation of this method, we investigated whether the FP error rate was equal to the nominal one-sided type I error rate when there was no publication bias.

3.4. Assumed treatment difference in the power calculation 

Each trial compared the means of two equal-sized treatment groups, and the power was based on a treatment difference of 0.2 standard deviations. The choice of 0.2 standard deviations was fairly arbitrary, but realistic. For example, a trial with equal sized groups and 90% power requires 525 patients per group, whereas a trial with 30% power requires 105 patients per group. Although other differences would lead to larger or smaller trials, the bias and type I error rates of the meta-analyses would remain the same (Appendix A4–6).
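The group sizes above can be checked with the usual normal-approximation formula for comparing two means, n = 2(z_{1-α} + z_π)² / d² per group, with d the difference in standard deviation units. The sketch below assumes that formula and a one-sided α of 0.025; the function name is illustrative. It gives roughly 525 patients per group for 90% power and roughly 103 for 30% power, slightly below the 105 quoted, which presumably reflects details of the authors' calculation that are not recoverable from the text.

```python
from statistics import NormalDist


def patients_per_group(power, effect_sd=0.2, alpha=0.025):
    """Approximate per-group sample size for a two-group comparison of means.

    effect_sd is the assumed difference in standard deviation units;
    alpha is the one-sided significance level (an assumption of this sketch).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)
    z_power = z.inv_cdf(power)
    return 2.0 * (z_alpha + z_power) ** 2 / effect_sd ** 2


sizes = {p: patients_per_group(p) for p in (0.30, 0.50, 0.80, 0.90)}
# Size of a low-power trial relative to a trial with 80% power.
ratio_50 = sizes[0.50] / sizes[0.80]
ratio_30 = sizes[0.30] / sizes[0.80]
```

Under these assumptions `ratio_50` is close to one half and `ratio_30` close to one quarter, in line with the trial sizes mentioned in the Introduction.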

3.5. FP error rate and bias 

After simulating the results of all the trials in a series and determining which trials would be published, a meta-analysis was carried out on these published trials with a one-sided error rate of 0.025. We then evaluated the FP error rate and the bias.

To evaluate the FP error rate, the true treatment difference δ was set at zero (Appendix A5). In this case, the bias was simply the average of the results of all the meta-analyses (bias under the null hypothesis H0, Appendix A6).

To evaluate the bias associated with an efficacious treatment, we set δ at 0.2 standard deviations and calculated the (positive) bias by subtracting δ from the average outcome of the series (bias under the alternative hypothesis H1, Appendix A6). In addition, we evaluated the bias when the power calculation was based on an overoptimistic estimate of the efficacy. In this case, the true difference δ between the treatments was set at 0.05 or 0.1 standard deviation, whereas the difference used in the power calculation remained 0.2 standard deviations.

We standardized the bias by expressing it as a percentage of the difference used in the power calculation (Appendix A6).


4. Results 

Tables 1 and 2 present the FP error rates and the bias of the meta-analyses on the published trials. FP error rates that exceeded 3.5% are shown in italics, whereas error rates that exceeded 5% are shown in bold.

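Table 1. FP error rates and bias in series of trials with 80% or 90% power

Column labels give the SS-NS publication rates (%). FP error rates are in %; bias is in % of the difference used in the power calculation.

| Power | I2 | Number of trials | FP error rate, 100-100 | FP error rate, 100-90 | FP error rate, 95-80 | Bias under H0, 100-90 | Bias under H0, 95-80 | Bias under H1, 100-90 | Bias under H1, 95-80 |
|---|---|---|---|---|---|---|---|---|---|
| 90% | 0 | 1 | 2.3 | 2.6 | 2.8 | 0 | 0 | 0 | 1 |
| 90% | 0 | 2 | 2.7 | 2.7 | 2.6 | 0 | 1 | 0 | 1 |
| 90% | 0.25 | 1 | *4.7* | **5.2** | **5.6** | 0 | 1 | 1 | 2 |
| 90% | 0.25 | 2 | 2.5 | 2.9 | 3.2 | 0 | 1 | 1 | 2 |
| 90% | 0.50 | 1 | **8.5** | **9.3** | **10.0** | 1 | 1 | 2 | 3 |
| 90% | 0.50 | 2 | 2.4 | *3.8* | *4.9* | 1 | 1 | 2 | 3 |
| 90% | 0.75 | 1 | **16.5** | **18.0** | **19.0** | 2 | 3 | 4 | 5 |
| 90% | 0.75 | 2 | 2.6 | **5.1** | **7.9** | 2 | 3 | 3 | 5 |
| 80% | 0 | 2 | 2.8 | 2.8 | 2.8 | 0 | 1 | 1 | 2 |
| 80% | 0 | 3 | 2.4 | 2.3 | 2.4 | 0 | 0 | 0 | 1 |
| 80% | 0.25 | 2 | 2.5 | 2.8 | 3.2 | 1 | 1 | 2 | 2 |
| 80% | 0.25 | 3 | 2.3 | 2.3 | 2.6 | 0 | 1 | 1 | 2 |
| 80% | 0.50 | 2 | 2.5 | *3.6* | *4.8* | 1 | 1 | 2 | 4 |
| 80% | 0.50 | 3 | 2.3 | 2.5 | 3.2 | 1 | 1 | 1 | 2 |
| 80% | 0.75 | 2 | 2.6 | *4.7* | **7.5** | 2 | 3 | 4 | 6 |
| 80% | 0.75 | 3 | 2.3 | 2.8 | *4.0* | 2 | 3 | 3 | 4 |

FP error rates that exceeded 3.5% are shown in italics, whereas error rates that exceeded 5% are shown in bold.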

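Table 2. FP error rates and bias in series of trials with 30% or 50% power

Column labels give the SS-NS publication rates (%). FP error rates are in %; bias is in % of the difference used in the power calculation.

| Power | I2 | Number of trials | FP error rate, 100-100 | FP error rate, 85-70 | FP error rate, 85-60 | FP error rate, 85-50 | Bias under H0, 85-70 | Bias under H0, 85-60 | Bias under H0, 85-50 | Bias under H1, 85-70 | Bias under H1, 85-60 | Bias under H1, 85-50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50% | 0 | 5 | 2.7 | 2.6 | 2.7 | 2.9 | 1 | 2 | 3 | 4 | 7 | 11 |
| 50% | 0 | 10 | 2.5 | 2.8 | 2.7 | 2.8 | 1 | 2 | 2 | 4 | 7 | 11 |
| 50% | 0 | 20 | 2.6 | 2.8 | 2.9 | 2.7 | 1 | 2 | 2 | 4 | 7 | 11 |
| 50% | 0.25 | 5 | 2.6 | 2.6 | 2.9 | 3.3 | 1 | 3 | 4 | 5 | 8 | 13 |
| 50% | 0.25 | 10 | 2.6 | 2.8 | 2.7 | 2.9 | 1 | 2 | 4 | 5 | 8 | 12 |
| 50% | 0.25 | 20 | 2.7 | 3.0 | 3.0 | *3.6* | 1 | 3 | 4 | 5 | 8 | 12 |
| 50% | 0.50 | 5 | 2.3 | 2.4 | 3.2 | *4.8* | 2 | 5 | 8 | 7 | 11 | 16 |
| 50% | 0.50 | 10 | 2.5 | 2.7 | 2.9 | 3.5 | 2 | 4 | 7 | 6 | 10 | 15 |
| 50% | 0.50 | 20 | 2.3 | 3.2 | 3.5 | *4.3* | 2 | 5 | 7 | 6 | 10 | 15 |
| 50% | 0.75 | 5 | 2.5 | 3.2 | *4.3* | **6.2** | 5 | 10 | 16 | 10 | 15 | 23 |
| 50% | 0.75 | 10 | 2.3 | 3.3 | *3.9* | **5.1** | 5 | 9 | 16 | 8 | 14 | 21 |
| 50% | 0.75 | 20 | 2.5 | *3.8* | **5.3** | **7.0** | 5 | 10 | 16 | 8 | 14 | 21 |
| 30% | 0 | 5 | 2.9 | 2.9 | 2.6 | 3.0 | 1 | 2 | 3 | 5 | 8 | 14 |
| 30% | 0 | 10 | 2.6 | 2.7 | 2.6 | 2.6 | 1 | 2 | 3 | 5 | 8 | 14 |
| 30% | 0 | 20 | 2.5 | 2.9 | 3.1 | 3.1 | 1 | 2 | 3 | 5 | 9 | 14 |
| 30% | 0 | 30 | 2.5 | 2.8 | 2.9 | 3.1 | 1 | 2 | 3 | 5 | 9 | 14 |
| 30% | 0.25 | 5 | 2.6 | 2.8 | 2.8 | 3.3 | 2 | 3 | 5 | 7 | 11 | 17 |
| 30% | 0.25 | 10 | 2.7 | 2.8 | 2.9 | 2.9 | 2 | 3 | 5 | 6 | 10 | 17 |
| 30% | 0.25 | 20 | 2.7 | 2.9 | 3.1 | 3.4 | 2 | 4 | 5 | 6 | 11 | 17 |
| 30% | 0.25 | 30 | 2.5 | 2.6 | 3.2 | *3.9* | 2 | 3 | 5 | 6 | 11 | 17 |
| 30% | 0.50 | 5 | 2.5 | 2.5 | 3.1 | *4.5* | 3 | 6 | 10 | 9 | 14 | 22 |
| 30% | 0.50 | 10 | 2.7 | 3.2 | 3.3 | 3.5 | 3 | 6 | 10 | 8 | 13 | 21 |
| 30% | 0.50 | 20 | 2.6 | 3.3 | *3.8* | *4.3* | 4 | 6 | 10 | 7 | 13 | 21 |
| 30% | 0.50 | 30 | 2.5 | 3.1 | *4.1* | **5.3** | 3 | 6 | 10 | 7 | 13 | 21 |
| 30% | 0.75 | 5 | 2.5 | 3.1 | *4.3* | **6.0** | 7 | 14 | 21 | 13 | 20 | 33 |
| 30% | 0.75 | 10 | 2.7 | 3.5 | *4.1* | **5.2** | 7 | 13 | 22 | 11 | 19 | 30 |
| 30% | 0.75 | 20 | 2.5 | *4.2* | **5.1** | **6.5** | 8 | 14 | 22 | 10 | 19 | 30 |
| 30% | 0.75 | 30 | 2.6 | *4.2* | **5.9** | **8.6** | 7 | 13 | 22 | 10 | 19 | 29 |

FP error rates that exceeded 3.5% are shown in italics, whereas error rates that exceeded 5% are shown in bold.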

4.1. Performance of the Sidik and Jonkman method 

When the NS and SS publication rates were 100%, the FP error rate of the Sidik and Jonkman method was approximately 2.5%, even when the number of trials was low. These results are presented in the columns “FP error rate/SS and NS publication rates 100-100.” The Sidik and Jonkman approach seemed appropriate in our study. (Error rates in the rows that correspond with a single trial are too high, but they are irrelevant to the evaluation of the performance of the Sidik and Jonkman method.)

4.2. Trials with 80% or 90% power (Table 1)

In a single trial, low heterogeneity (I2=0.25) led to an FP error rate of 4.7%, even in the absence of publication bias. When I2=0.75, the FP error rate was 16.5%.

In series of two trials with 90% power, 0% heterogeneity and NS publication rates of 80% and 90%, FP error rates were 2.7% and 2.6%, respectively. When heterogeneity was high, these rates were 7.9% and 5.1%, respectively. In two trials with a power of 80%, the error rates were similar, but in series of three trials with a power of 80%, they were lower: When there was no heterogeneity, the rates did not exceed 2.5%. High heterogeneity led to FP error rates of 4.0% when the NS publication rate was 80% and to 2.8% when it was 90%.

In trials with a power of 80% or 90%, the bias was less than 7% of the difference used in the power calculation.

4.3. Trials with 50% power (Table 2)

When the SS publication rate was 85% and the NS publication rates were 60% or 70%, the FP error rates varied from 2.6% to 2.9% with no heterogeneity and from 3.2% to 5.3% with high heterogeneity. When the NS publication rate was 50%, these error rates were between 2.7% and 2.9% in the absence of heterogeneity and between 5.1% and 7.0% when heterogeneity was high.

NS publication rates of 50%, 60% and 70% led to maximal bias of 23%, 15%, and 10%, respectively.

4.4. Trials with 30% power (Table 2)

An SS publication rate of 85% and NS publication rates of 60% or 70% produced FP error rates that varied from 2.6% to 3.1% with no heterogeneity and from 3.1% to 5.9% with high heterogeneity. When the NS publication rate was 50%, these error rates were 2.7%–2.9% and 5.2%–6.8%, respectively.

The maximal bias values when NS publication rates were set at 50%, 60%, and 70% were 33%, 20%, and 13%, respectively.

4.5. Series of trials with overestimated efficacy or series of trials with different powers 

The results were similar when the true efficacy was lower than the estimate used in the power calculations (data not shown).

When the trials in a series had different powers, the error rates lay between those obtained from series that all had the same maximum/minimum power (data not shown).


5. Discussion and conclusion 

In our simulation study, we investigated the impact of publication bias on the results of meta-analyses, when the trials had at least 50% power and when they had at least 30% power. Meta-analyses on the trials with at least 50% power had similar results to those of trials with 80% or 90% power. Publication bias had a slightly stronger impact on the results of meta-analyses on trials with 30% power, but the validity of the meta-analyses was still satisfactory. We based this conclusion on the following arguments.

5.1. Estimates of the degree of publication bias 

In the literature, the lowest estimated publication rates were 85% when the outcome was significant and 44% when the outcome was nonsignificant. The highest estimates were 97% and 83%, respectively (see Section 1). In our simulations, the levels of publication bias were within the same range.

Although we presented the results of an NS publication rate of 50%, we believe that in current trial practice, this rate is no longer realistic when studies have a power of 30% or more. A trial with 30% power is still relatively large, which increases its chance of publication, especially in view of the increasing pressure to publish. Also, the trial registration procedure that is currently required for publication in major journals and the availability of the Internet make it easier to find published and unpublished studies [19].

We therefore restrict the discussion below to situations in which the NS publication rate is at least 90% when trials have 80% or 90% power and at least 60% when trials have 30% or 50% power. Figure 1 presents the main findings and Tables 1 and 2 show the details. The number of trials in a series and their power are plotted on the horizontal axis of Fig. 1: a single trial with 90% power, two trials with 90% power, etc. In each series, the vertical axis shows the FP error rate at various levels of publication bias. Lines connect the results from series with the same publication bias.

Fig. 1. FP error rates in single trials and series of trials with low heterogeneity (bold lines) or high heterogeneity (thin lines). SS and NS publication rates were 100%/100%, 100%/90%, 85%/70%, and 85%/60%.

5.2. Trials with 50% power compared to trials with 80% or 90% power: the FP error rate 

In a single trial, heterogeneity led to a strong increase in the FP error rates (Table 1, Appendix A7). Even in the absence of publication bias, the error rates ranged from 4.7% when heterogeneity was low to 16.5% when heterogeneity was high.
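A rough check on the single-trial figures is possible under a normal approximation. The sketch assumes that a single trial is tested against its within-trial variance σ² only, whereas its result actually varies with the total variance σ²/(1 − I2), taking I2 as the share of the total variance that is due to heterogeneity. The one-sided FP rate is then 1 − Φ(z_{1-α}·sqrt(1 − I2)). This is an approximation for illustration, not the authors' simulation, which used t-tests; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def single_trial_fp_rate(i2, alpha=0.025):
    """Approximate one-sided FP rate of a single trial when a fraction i2
    of the total variance is between-trial heterogeneity that the test ignores."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)
    # The test statistic has variance 1 / (1 - i2) instead of 1.
    return 1.0 - z.cdf(z_alpha * sqrt(1.0 - i2))


rates = {i2: single_trial_fp_rate(i2) for i2 in (0.0, 0.25, 0.5, 0.75)}
```

Under these assumptions the rates come out at about 4.5%, 8.3%, and 16.4% for I2 of 0.25, 0.5, and 0.75, close to the 4.7%, 8.5%, and 16.5% shown for a single trial with 90% power and full publication in Table 1.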

When a series contained more trials, inflation of the error rates was less dramatic. In trials with at least 80% power, the FP rates were 2.8%, 2.9%, 3.8%, and 5.1% or less when I2 was 0, 0.25, 0.5 and 0.75, respectively. When the power was 50%, the error rates were 2.9%, 3.0%, 3.5%, and 5.3%, respectively. Although the error rates were increased, the level of increase was only marginally higher than that in the trials with at least 80% power. Therefore, the error rate does not offer any convincing reason to reject trials with 50% power.

5.3. Trials with 50% power vs. trials with 80% or 90% power: bias 

When the power of the trials was at least 50% and the NS publication rate was at least 60%, the bias never exceeded 15% of the difference used for the power calculation. The difference used in the power calculation usually corresponds with the margin of clinical relevance. Therefore, a maximum bias of 15% does not seem important.

To put the bias further into perspective, we compare it to the precision of a trial. When the power of a trial is 80%, the distance between the lower (or upper) limit of the 95% confidence interval and the point estimate at the center is 71% of the difference on which the power is based (Appendix A8). When the power is 90%, the distance is 60%. Clearly, even when the bias is 15%, the uncertainty about the treatment difference exceeds the bias by approximately a factor of four.
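The relation between precision and power can be illustrated with a normal approximation. If the standard error of a trial equals the planning difference divided by (z_{1-α} + z_π), the half-width of the 95% confidence interval relative to the planning difference is z_{0.975} / (z_{0.975} + z_π). This form is an assumption of the sketch and may differ in detail from the derivation in Appendix A8; it gives about 70% and 60% for 80% and 90% power, close to the 71% and 60% quoted.

```python
from statistics import NormalDist


def relative_half_width(power, alpha=0.025):
    """Half-width of the two-sided (1 - 2*alpha) confidence interval,
    as a fraction of the difference used in the power calculation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)
    return z_alpha / (z_alpha + z.inv_cdf(power))


half_width_80 = relative_half_width(0.80)
half_width_90 = relative_half_width(0.90)
# How many times larger the half-width is than a bias of 15%.
factor_90 = half_width_90 / 0.15
```

With a bias of 15%, `factor_90` is about four, which matches the comparison made in the text.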

Heterogeneity also helps to put the bias into perspective. When heterogeneity is present, the “true” treatment effects δi of the individual trials vary around an overall “true” effect δ (Appendix A1). The difference between the overall effect δ and the 97.5 percentiles of the distribution of the δi is a measure of the uncertainty about the expected treatment results. When I2=0.25, I2=0.50, and I2=0.75, these differences are 42%, 72%, and 124% of the “true” treatment effect that corresponds with 80% power, respectively (Appendix A4, equation 14). In trials with 90% power, the differences are 36%, 62%, and 106% of the “true” treatment effect, respectively. When we compare these differences to the maximal bias of 8%, 11%, and 15%, the possible bias seems minor against the uncertainty about the treatment effect.

In our opinion, the bias is irrelevant in comparison with the relevant clinical difference and the uncertainties about the “true” treatment effect. Bias forms insufficient reason to reject trials with 50% power.

5.4. Trials with 30% power 

When heterogeneity was low, the FP error rates in the series of trials with 30% power were up to 3.2%, whereas high heterogeneity led to rates of up to 5.9%, which was only slightly higher than in trials with 80% or 90% power. The bias did not exceed 22% of the difference used in the power calculations.

5.5. Large numbers of trials 

Figure 1 suggests that the FP error rate in series of trials with 30% or 50% power increased as the number of trials in the series increased. This was indeed the case. For example, when heterogeneity was low and the NS publication rates were 60% and 70%, the FP error rates in a series of 50 trials with 50% power were 3.7% and 3.1%, respectively. However, trials with 50% power still require half the number of patients that would be required for trials with 80% power, so it is unlikely that very large numbers of 50% power trials will be performed. In addition, 50 trials with 50% power correspond with 25 trials with 80% power. Such a large number of trials are unnecessary and ethically unjustifiable. No new trials should be started when trial registration databases indicate that a sufficient number already have been performed or are ongoing.

When trials are required to have at least 50% power, the number of trials will remain limited, so error rates will not be excessive.

5.6. Heterogeneity 

Heterogeneity led to increased FP error rates in single trials with high power, even when the publication rate was 100% (Table 1, Appendix A7). Therefore, heterogeneity may be a reason to carry out several small trials, rather than a single large one, as small trials offer the opportunity to estimate the level of heterogeneity and this provides an indication of the generalizability of the trial results [20]. When heterogeneity is present, the results of the treatment will vary more than would be suggested by a single trial, so it may be premature to draw conclusions based on a single trial [18], [20].

The role of heterogeneity may be fairly important, as Higgins et al. found I2>0 in 50% of meta-analyses [21]. In addition, heterogeneity was moderate or high in approximately a quarter of the meta-analyses.

5.7. Quality of the trials 

The observation that smaller trials are sometimes of poorer quality is an important reason to discourage them [1]. However, the quality of trials with 50% power will not be much lower than that of trials with 80% power, as the latter are only twice as large. Even a trial with 30% power is still a quarter of the size of a trial with 80% power. In addition, the problem of low quality trials may become less serious in the future due to increasing regulations and codes for proper procedures and trial conduct (GCP, CONSORT, trial registration). We agree with Schulz et al. that measures to improve quality are more worthwhile than putting emphasis on high power [1].

5.8. Simulation studies 

Simulation studies are a simplification of reality. For example, we focused on publication bias, but other factors also have an impact on the outcome of a meta-analysis, such as the quality of the trials. Despite this simplification, we consider our results to be reliable and valid for trials in general, but we cannot exclude the possibility that certain special cases require separate investigation.

5.9. Conclusion 

The question is not whether publication bias exists, but what its likely impact is. Our results showed that the results of meta-analyses on trials with 50% power were in broad agreement with the “true” difference, even in the presence of substantial publication bias. In trials with 30% power, this was also the case, but possibly to a somewhat lesser extent. Our results agreed with earlier findings that the outcomes of meta-analyses on small trials were generally similar to those from large trials with the same focus (Section 1).

Ethics committees should therefore not reject trials with only 50% power. Instead, they should judge the appropriateness of a planned study on the basis of its quality and the available evidence from published and ongoing studies. This will prevent unnecessary investment of patients and resources.


Appendix A. Statistical details 

1. The random effects model for meta-analysis 

For N studies, let the random variable yi be the effect size estimate from the ith study. The random effect model can then be defined as follows:

$$y_i = \delta_i + e_i \tag{1}$$
where δi = δ + di, ei and di independent, $e_i \sim N(0, \sigma_i^2)$ and $d_i \sim N(0, \tau^2)$. The parameter τ represents the heterogeneity.

2. Estimates of the heterogeneity 

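Let wi be the inverse of the sample variance of yi and $\bar{y}_w = \sum_i w_i y_i / \sum_i w_i$ the weighted mean. The DerSimonian and Laird estimate of the variance τ2 is

$$\hat{\tau}^2 = \max\left\{0,\ \frac{Q - (N-1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}\right\} \tag{2}$$
where Q is the heterogeneity statistic [1]
$$Q = \sum_{i=1}^{N} w_i \,(y_i - \bar{y}_w)^2 \tag{3}$$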

Although $\hat{\tau}^2$ or Q can be used as measures of the heterogeneity, Higgins and Thompson propose

$$I^2 = \max\left\{0,\ \frac{Q - (N-1)}{Q}\right\} \tag{4}$$

I2 has the advantage that it can be interpreted as the percentage of total variance that is due to heterogeneity [2]: For equal σi (σi = σ),

(5)
where
(6)
is an estimator of the within-study variance of the effect size.
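The heterogeneity measures named in this section can be computed as in the following sketch. It uses the standard textbook forms of Q, the DerSimonian and Laird estimator, and I2; these are assumed to correspond to the authors' equations, and the function and variable names are illustrative.

```python
def heterogeneity(y, var):
    """Return (Q, DerSimonian-Laird tau^2, I^2) for effect estimates y
    with within-study sample variances var."""
    w = [1.0 / v for v in var]                      # inverse-variance weights
    sum_w = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                   # truncated at zero
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # truncated at zero
    return q, tau2, i2


# Three hypothetical trials with equal within-study variance 0.01.
q, tau2, i2 = heterogeneity([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
```

In this made-up example Q = 8 on 2 degrees of freedom, so the estimated between-trial variance is 0.03 and I2 is 0.75, that is, three quarters of the total variance 0.03 + 0.01 is attributed to heterogeneity.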

3. The analysis method of Sidik and Jonkman 

The commonly used method for meta-analysis is the random effects approach according to DerSimonian and Laird [1]. However, as this approach is known to have poor performance when the number of trials is low, we used the method developed by Sidik and Jonkman instead [3]. The main difference between the methods is how they estimate the variances. Although the former method is based on the DerSimonian and Laird estimator (equation 2), the Sidik and Jonkman method estimates the variance by:

(7)

In addition, the DerSimonian and Laird method uses a standard normal critical value, whereas the Sidik and Jonkman method uses a critical value from the t-distribution.

4. The results of the simulations only depended on the heterogeneity and the power of the trials 

Based on the relationship between the heterogeneity estimators I2 and $\hat{\tau}^2$ (equation 5), we defined the heterogeneity parameter as

$$I^2 = \frac{\tau^2}{\tau^2 + \sigma^2} \tag{8}$$

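In the trials with a normally distributed outcome variable, a standard deviation of σx, groups of equal size and an assumed effect size of δπ, the sample size per group required for power π was approximately [4]

$$n \approx \frac{2\sigma_x^2\,(z_{1-\alpha} + z_{\pi})^2}{\delta_\pi^2} \tag{9}$$
where $z_p$ denotes the p-quantile of the standard normal distribution and α is the one-sided significance level.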

In trials with equal groups, and if we define

(10)
it follows that
(11)

Then, by equation 1:

(12)
where and
(13)
is the optimism of the power calculation.

By equations (8), (11),

(14)
so (equation 12)
(15)
with (equation 14)
(16)

Equations (15), (16) show that the distribution of yi/δπ only depended on the power and the heterogeneity. As the power depended on the standard deviation, treatment difference, and size of the trial (equation 11), varying the power and the heterogeneity in the simulations also covered variations in effect size, trial size, and standard deviation.

5. The error rate of the meta-analysis 

We investigated the “true” error rates of the hypothesis H0: δ = 0 (Appendix A1). As a statistical test on the outcome yi is equivalent to a test on yi/δπ, equations (15), (16) show that the error rates found in the simulations did not depend on the choice of the effect sizes used in the power calculation, the trial sizes, or the standard deviations. Only the power, the heterogeneity, and the optimism of the power calculation mattered.

6. The bias of the meta-analysis 

The bias was defined as

$$\text{bias} = E(\hat{\delta}) - \delta \tag{17}$$
where $\hat{\delta}$ denotes the estimate from the meta-analysis.

In the simulations, we evaluated two distinct situations: δ = 0 (bias under H0) and δ = 0.05σx, 0.1σx, or 0.2σx (bias under H1). In the former situation, the formula for bias reduced to $E(\hat{\delta})$.

By expressing the bias as a percentage of the effect size used in the power calculation

$$\frac{E(\hat{\delta}) - \delta}{\delta_\pi} \times 100\% \tag{18}$$
the bias becomes independent of the effect size used in the power calculation, the trial size, or the standard deviation (equations (15), (16)).

7. A single study 

In the article, we did not differentiate between the observed heterogeneity as estimated by $\hat{\tau}^2$ (or I2) and the heterogeneity parameter τ2 (or I2). In the simulations, the difference mattered, because we generated trials with a certain heterogeneity parameter and then estimated the heterogeneity as a step in the meta-analysis.

However, when only one trial is available, the heterogeneity cannot be estimated. This does not mean that there is no heterogeneity (τ = 0), but estimating it requires at least two trials. The consequence is that the overall null-hypothesis δ = 0 cannot be tested. The analysis of a single study only tests the null hypothesis δi = 0.

This shows the limitations of a single trial, because we are primarily interested in the overall null hypothesis δ = 0.

8. The relationship between the expected precision and the power of a trial 

The expected distance between the point estimate of the effect size and the upper or lower limit of the two-sided confidence interval is (Appendix A1).

By equations (11), (10), we obtain

(19)

So

(20)


Appendix B. The simulation program 

1. The simulation parameters 


-The number of iterations Niter, that is, the number of trial series that were generated for each combination of the parameters below.

-The number of trials N in each series.

-The overall effect size δ, the heterogeneity I2 and the standard deviation σx of the outcome variable of the trials (Appendix A1–4).

-The effect size δπ used in the power calculation.

-The one-sided significance level α used to analyze each trial.

-The one-sided significance level αm used in the meta-analysis.

-The power π when all the trials had equal power.

-When the trials had various different powers: the minimum power π−, the maximum power π+ and the parameters aπ and bπ of the beta distribution that was used to generate the power of each individual trial.

-The SS and NS publication rates when these were fixed.

-When the SS and NS publication rates depended on the power: the minimum and maximum SS and NS publication rates SS−, SS+, NS−, and NS+.
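The parameters listed above could be collected in one structure, for example as follows. This is a minimal Python sketch: all field names are hypothetical, the default values are illustrative rather than taken from the article, and the power-dependent publication rates are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    n_iter: int              # Niter: number of trial series per parameter combination
    n_trials: int            # N: number of trials in each series
    delta: float             # overall effect size
    i2: float                # heterogeneity I2, as a fraction in [0, 1)
    sigma_x: float           # standard deviation of the outcome variable
    delta_pi: float          # effect size used in the power calculation
    alpha: float = 0.025     # one-sided level for each trial (illustrative default)
    alpha_m: float = 0.025   # one-sided level for the meta-analysis (illustrative default)
    power_min: float = 0.8   # minimum power; equal to power_max when power is fixed
    power_max: float = 0.8   # maximum power
    a_pi: float = 1.0        # first beta parameter for the power distribution
    b_pi: float = 1.0        # second beta parameter for the power distribution
    ss_rate: float = 1.0     # fixed publication rate of significant trials
    ns_rate: float = 1.0     # fixed publication rate of nonsignificant trials

    def __post_init__(self):
        # Basic sanity checks on ranges; they only read the fields.
        if not 0.0 <= self.i2 < 1.0:
            raise ValueError("i2 must lie in [0, 1)")
        if not 0.0 < self.power_min <= self.power_max < 1.0:
            raise ValueError("need 0 < power_min <= power_max < 1")
        for rate in (self.ss_rate, self.ns_rate):
            if not 0.0 <= rate <= 1.0:
                raise ValueError("publication rates must lie in [0, 1]")
```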

2. The simulation 

1. Generate N trials; in each trial i:

a.If the power is not fixed, generate the power π = π− + (π+ − π−) B(aπ, bπ), where B(aπ, bπ) is a random draw from the beta distribution with parameters aπ and bπ.

b.Determine the sample size (Appendix A, equation 9).

c.Use equation 8 and the fact that to calculate the variance τ2.

d.Generate the “true” trial effect size δi = N (δ, τ2), where N (δ, τ2) is a random draw from the normal distribution with mean δ and variance τ2.

e.Generate the trial result di, a random draw from the normal distribution with mean δi and variance 2σx²/n.

f.Generate the variance of the trial result: si² = (2σx²/n) C2(2n−2)/(2n−2), where C2(2n−2) is a random draw from the χ² distribution with 2n−2 degrees of freedom.

g.Perform a one-sided t-test on the trial result: the trial is significant if di/si exceeds the (1−α) quantile of the t distribution with 2n−2 degrees of freedom.

h.If the NS and SS rates depend on the power of the trial, determine the SS and NS rates: SS = SS− + (SS+ − SS−)(π+ − π)/(π+ − π−) and NS = NS− + (NS+ − NS−)(π+ − π)/(π+ − π−).

i.If the trial is significant, determine whether the trial will be “published”: publication = Bernoulli(SS), a random draw from the Bernoulli distribution with parameter SS. If the trial is not significant: publication = Bernoulli(NS).

2. Carry out a meta-analysis on the “published” trials, with a one-sided significance level αm
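Step 1 above (trial generation and the publication filter) can be sketched in Python as follows. This is not the authors' program. Several ingredients are assumptions because the corresponding equations are not reproduced here: the per-group sample size n = 2σx²(z1−α + zπ)²/δπ², the relation τ² = I²v/(1 − I²) with within-trial variance v = 2σx²/n, and a series approximation to the t quantile. Publication rates are taken as fixed (step h is omitted), and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def t_quantile(p, df):
    """Approximate t quantile: normal quantile plus two correction terms in 1/df."""
    z = Z(p)
    return z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)


def simulate_series(n_trials, delta, i2, sigma_x, delta_pi, alpha,
                    power_min, power_max, a_pi, b_pi, ss_rate, ns_rate, rng):
    """Return the 'published' trials as (estimate, estimated variance, significant)."""
    published = []
    for _ in range(n_trials):
        # a. power of this trial, scaled beta draw between power_min and power_max
        power = power_min + (power_max - power_min) * rng.betavariate(a_pi, b_pi)
        # b. per-group sample size (assumed standard two-group formula)
        n = max(2, math.ceil(2 * sigma_x**2 * (Z(1 - alpha) + Z(power))**2 / delta_pi**2))
        # c. within-trial variance and between-trial variance (assumed I2 relation)
        v = 2 * sigma_x**2 / n
        tau2 = i2 * v / (1 - i2)
        # d. "true" effect size of this trial
        delta_i = rng.gauss(delta, math.sqrt(tau2))
        # e. observed trial result
        d_i = rng.gauss(delta_i, math.sqrt(v))
        # f. estimated variance; a chi-square(df) draw is Gamma(shape=df/2, scale=2)
        df = 2 * n - 2
        s2_i = v * rng.gammavariate(df / 2, 2) / df
        # g. one-sided t-test
        significant = d_i / math.sqrt(s2_i) > t_quantile(1 - alpha, df)
        # i. Bernoulli publication decision with a fixed SS or NS rate
        rate = ss_rate if significant else ns_rate
        if rng.random() < rate:
            published.append((d_i, s2_i, significant))
    return published
```

Passing a seeded `random.Random` instance as `rng` makes a series reproducible, for example `simulate_series(5, 0.0, 0.25, 1.0, 0.5, 0.025, 0.5, 0.5, 1.0, 1.0, 1.0, 0.6, random.Random(1))`.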

The above steps 1 and 2 were iterated Niter times. Then the percentage ssmeta of meta-analyses that were statistically significant and the mean dmeta of the results of the meta-analyses were calculated. The mean dmeta was standardized with respect to the difference used in the power calculations: drel=dmeta/δπ.

When δ = 0 (i.e., under the null hypothesis), ssmeta is the FP error rate and drel is the bias (Appendix A5, 6).

When δ > 0 (the alternative hypothesis), drel−δ/δπ is the bias (Appendix A6).
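The summary quantities described above can be computed from the per-iteration meta-analysis results as in the following Python sketch. It is not the authors' program, and the function name and input layout are hypothetical. It assumes every iteration supplied actually produced a pooled estimate; how to treat iterations in which no trial was published is not specified here.

```python
def summarize(meta_results, delta, delta_pi):
    """Summarize a list of (pooled_estimate, is_significant) pairs, one per iteration.

    Returns (ss_meta, d_rel, bias):
      ss_meta -- percentage of meta-analyses that were statistically significant
      d_rel   -- mean pooled estimate divided by delta_pi
      bias    -- d_rel - delta / delta_pi (equal to d_rel when delta == 0)
    """
    n = len(meta_results)
    if n == 0:
        raise ValueError("no meta-analysis results to summarize")
    ss_meta = 100.0 * sum(1 for _, sig in meta_results if sig) / n
    d_meta = sum(est for est, _ in meta_results) / n
    d_rel = d_meta / delta_pi
    bias = d_rel - delta / delta_pi
    return ss_meta, d_rel, bias
```

With δ = 0, `ss_meta` is the false-positive error rate and `bias` coincides with `d_rel`.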

The simulations were programmed in SAS, version 8.2.

References

[1]Higgins JP, Thompson SG. Quantifying heterogeneity in meta-analysis. Stat Med 2002;21:1539–58.

[2]DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;7:177–88.

[3]Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Stat Med 2002;21:3153–9.

[4]Armitage P, Berry G. Statistical methods in medical research. 3rd ed. Oxford: Blackwell Science; 1994.


References 

  1. Schulz KF, Grimes DA. Sample size calculations in randomised trials: mandatory and mystical. Lancet. 2005;365(9467):1348–1353
  2. Janosky JE. The ethics of underpowered clinical trials. JAMA. 2002;288:2118
  3. Halpern SD, Karlawish JH, Berlin JA. The continuing unethical conduct of underpowered clinical trials. JAMA. 2002;288:358–362
  4. Rothstein HR, Sutton AJ, Borenstein M, editors. Publication bias in meta-analysis. Chichester: Wiley; 2005
  5. Song GF, Eastwood AJ, Gilbody S, Duley L, Sutton AJ. Publication and related biases. Health Technol Assess. 2000;4:1–106
  6. Egger M, Smith GD, Sterne JA. Uses and abuses of meta-analysis. Clin Med. 2001;1:478–484
  7. Dickersin K. How important is publication bias? A synthesis of available data. AIDS Educ Prev. 1997;9(1 Suppl):15–21
  8. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ. 1997;315(7109):640–645
  9. Thornton A, Lee P. Publication bias in meta-analysis: its causes and consequences. J Clin Epidemiol. 2000;53:207–216
  10. Contopoulos-Ioannidis DG, Gilbody SM, Trikalinos TA, Churchill R, Wahlbeck K, Ioannidis JP. Comparison of large versus smaller randomized trials for mental health-related interventions. Am J Psychiatry. 2005;162:578–584
  11. Ioannidis JP, Cappelleri JC, Lau J. Issues in comparisons between meta-analyses and large trials. JAMA. 1998;279:1089–1093
  12. Ioannidis JP, Cappelleri JC, Lau J. Meta-analyses and large randomized, controlled trials. N Engl J Med. 1998;338:59–2
  13. LeLorier J, Gregoire G. Comparing results from meta-analyses vs large trials. JAMA. 1998;280:518–519
  14. LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med. 1997;337:536–542
  15. Cappelleri JC, Ioannidis JP, Schmid CH, de Ferranti SD, Aubert M, Chalmers TC, et al. Large trials vs meta-analysis of smaller trials: how do their results compare? JAMA. 1996;276:1332–1338
  16. Villar J, Carroli G, Belizan JM. Predictive ability of meta-analyses of randomised controlled trials. Lancet. 1995;345(8952):772–776
  17. Villar J, Piaggio G, Carroli G, Donner A. Factors affecting the comparability of meta-analyses and largest trials results in perinatology. J Clin Epidemiol. 1997;50:997–1002
  18. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294:218–228
  19. Zarin DA, Tse T, Ide NC. Trial Registration at ClinicalTrials.gov between May and October 2005. N Engl J Med. 2005;353:2779–2787
  20. Shrier I, Platt RW, Steele RJ. Mega-trials vs. meta-analysis: precision vs. heterogeneity? Contemp Clin Trials. 2007;28:324–328
  21. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–560

PII: S0895-4356(08)00087-5

doi:10.1016/j.jclinepi.2008.02.017
