Abstract
Objectives
Study design and setting
Results
Conclusion
Key words
Key findings
- The multi-step approach that can be used to manage IPD for analysis from multiple studies involves the following stages:
  - Processing
  - Replication
  - Imputation
  - Merging
  - Evaluation
What this adds to what is known?
- PRIME-IPD provides a formalized step-by-step approach to verify and prepare individual participant data from multiple studies for meta-analysis, thus adding to available guidance on evidence synthesis.
What is the implication and what should change now?
- The synthesis of IPD from multiple trials provides a powerful approach to control for confounding and investigate effect modification at the individual level. However, a principled and systematic way to build the analytic dataset, with requisite checks for data quality, is needed to ensure these benefits are realized.
- Further testing of this framework to assess feasibility and applicability to other reviews may refine this model.
1. Introduction
Cochrane Methods Comparing Multiple Interventions: The Cochrane Collaboration. 2021 Available from: July 16, 2020, https://methods.cochrane.org/cmi/.
Cochrane Methods IPD Meta-analysis Group: The Cochrane Collaboration. 2021 Available from: July 16, 2020, https://methods.cochrane.org/ipdma/.
2. Methods
3. Results
1. Processing of the datasets
2. Replication of published data tables
3. Imputation of missing data
4. Merging datasets
5. Evaluation of data heterogeneity
PRIME step | Items |
---|---|
Processing | • Convert data into a single format for statistical program of choice • Compare the total number of participants in the acquired datasets to those reported in published studies • Verify the presence of the variables of interest in the acquired dataset • Standardize variable names across datasets • Identify and standardize the measurement scales used to report the variables of interest • Identify and standardize coding for missing values • Identify and correct any implausible values that may result from data conversion |
Replication | • Recalculate reported descriptive and summary statistics using the acquired datasets • Calculate the standardized difference to quantitatively assess the difference between the replicated and published results • If the standardized difference is > 10%, investigate and address potential causes |
Imputation | • Assess the appropriateness of conducting imputation of missing data using missing data theory • If multiple imputation is conducted, carefully consider the number of imputations to be run |
Merging | • Ensure in processing step that variable order and codes are correct • Merge the imputed datasets into a single, pooled dataset, taking into consideration the number of imputed datasets, if appropriate |
Evaluation | • Assess continuous variables for normality by residual analysis either visually or by statistical tests • If required, calculate new variables for standardized comparison of effects |
3.1 Processing of datasets
1. Convert each acquired dataset to a preferred standardized format (e.g., SAS, Stata). Choose the format that makes data manipulation easiest; it may or may not be the format used for the eventual analyses.
2. Compare the total number of observations in the received datasets to those reported in the published studies (or global trials registers if publications are not available). In the event of a mismatch, contact the authors to determine the reason for the discrepancy.
3. Verify that the variables of interest are available in the acquired datasets by referring to the accompanying data dictionaries. In their absence, contact the primary authors of the studies for the information required.
4. Create a master list that maps each dataset's variable names to the variable names of choice. Rename all variables of interest across the datasets so that they share common names.
5. For continuous variables, identify the scales of measurement and any datasets whose values need to be converted to the preferred standard using the appropriate conversion formula(e). Determine whether the categories of the categorical variables need to be regrouped or separated into dummy variables.
6. Identify any missing values in the datasets and how they are coded (e.g., blank cells, symbols). Confirm that blank cells are true missing values and not conversion errors by comparing the percentage of missing values per variable in the acquired and converted datasets, then standardize the coding across datasets. Similar considerations may apply to data coded as not applicable.
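The renaming, unit-conversion, and missing-value steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code (they report working in SAS): the study labels, variable names, missing-value codes, and the g/dL to g/L conversion are all hypothetical.

```python
# Hypothetical master list: each study's variable names mapped to common names.
VARIABLE_MAP = {
    "study_a": {"hb_gdl": "hemoglobin_g_l", "wt": "weight_kg"},
    "study_b": {"HGB": "hemoglobin_g_l", "weight": "weight_kg"},
}
# Hypothetical codes that each study uses to mark a missing value.
MISSING_CODES = {"study_a": {"", "."}, "study_b": {"", "NA", "9999"}}
# Hypothetical unit conversion: study_a reports hemoglobin in g/dL, target is g/L.
UNIT_FACTORS = {("study_a", "hemoglobin_g_l"): 10.0}


def harmonize(study, rows):
    """Rename variables, recode missing values to None, and convert units."""
    cleaned = []
    for row in rows:
        out = {"study": study}
        for old_name, new_name in VARIABLE_MAP[study].items():
            value = row.get(old_name)
            raw = "" if value is None else str(value).strip()
            if raw in MISSING_CODES[study]:
                out[new_name] = None
            else:
                factor = UNIT_FACTORS.get((study, new_name), 1.0)
                out[new_name] = float(raw) * factor
        cleaned.append(out)
    return cleaned


def count_matches_publication(rows, published_n):
    """Compare the number of records with the published sample size."""
    return len(rows) == published_n


def percent_missing(rows, variable):
    """Percentage of records with a missing value for one variable."""
    if not rows:
        return 0.0
    return 100.0 * sum(r[variable] is None for r in rows) / len(rows)


raw_a = [{"hb_gdl": "12.5", "wt": "."}, {"hb_gdl": "11.0", "wt": "18.2"}]
clean_a = harmonize("study_a", raw_a)
print(clean_a[0])
print(percent_missing(clean_a, "weight_kg"))
```

Running `percent_missing` on both the acquired and the converted data, variable by variable, is one way to carry out the comparison described in step 6.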
3.2 Replication of published data tables
1. Calculate descriptive statistics from the processed datasets (for example, the percentage of females enrolled, age of participants, and pre-existing health conditions) and compare them to the published results.
2. Calculate baseline and endline summary statistics for the outcomes of interest from the processed datasets, using the same analytic methods reported in the published article, and compare them to the published results.
3. Calculate the standardized difference between the descriptive and summary statistics of the published studies and the replicated results. We used the absolute standardized difference criterion of 10%, originally proposed to assess baseline imbalance, as an indicator of discrepancy between published and replicated results, based on previously proposed thresholds [27,28,29,30]. The standardized difference can be calculated as follows:
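The formula itself is not reproduced in this text. A common definition, and an assumption about what was used here, is the one given in the cited work by Austin: the difference in means (or proportions) divided by the square root of the average of the two variances. The Python sketch below applies that definition to a published and a replicated statistic and flags values above the 10% criterion; the function and parameter names are illustrative.

```python
import math


def std_diff_continuous(mean_published, sd_published, mean_replicated, sd_replicated):
    """Standardized difference for a continuous variable (assumed definition)."""
    pooled_sd = math.sqrt((sd_published ** 2 + sd_replicated ** 2) / 2)
    if pooled_sd == 0:
        # No variability at all: identical means agree, anything else is flagged.
        return 0.0 if mean_published == mean_replicated else math.inf
    return (mean_replicated - mean_published) / pooled_sd


def std_diff_binary(p_published, p_replicated):
    """Standardized difference for a proportion (assumed definition)."""
    pooled_sd = math.sqrt(
        (p_published * (1 - p_published) + p_replicated * (1 - p_replicated)) / 2
    )
    if pooled_sd == 0:
        return 0.0 if p_published == p_replicated else math.inf
    return (p_replicated - p_published) / pooled_sd


def flag_discrepancy(std_diff, threshold=0.10):
    """True when the absolute standardized difference exceeds the 10% criterion."""
    return abs(std_diff) > threshold


d = std_diff_continuous(10.0, 2.0, 10.5, 2.0)
print(d, flag_discrepancy(d))
```

A flagged value is a prompt to investigate (for example, different exclusions or mislabeled arms), not proof of an error in either source.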
3.3 Imputation of missing data
3.4 Merging datasets
3.5 Evaluation of data heterogeneity
1. Test the distribution of continuous variables by residual analysis, either visually (by preparing bar charts) or by parametric statistical tests [36]. Comparisons can be made between study arms to appraise the randomization of participants in each group and identify differences between study groups.
2. Create new variables needed for analysis (e.g., “dummy variables” for categorical variables). This step is needed if any variables must be calculated from existing variables in the merged dataset (e.g., body mass index may be calculated using existing data on the height, weight, age and sex of participants).
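As a rough illustration of step 2, the Python sketch below derives dummy variables and a plain body mass index from existing columns. The variable names and units (kilograms, centimeters) are assumptions. Age- and sex-standardized measures such as BMI-for-age require external growth reference tables and are not implemented here.

```python
def add_dummy_variables(rows, variable, categories):
    """Add one 0/1 indicator per category of a categorical variable."""
    for row in rows:
        for category in categories:
            row[f"{variable}_{category}"] = int(row[variable] == category)
    return rows


def body_mass_index(weight_kg, height_cm):
    """BMI in kg/m^2; returns None when an input is missing or implausible."""
    if weight_kg is None or height_cm is None or height_cm <= 0:
        return None
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical merged records.
records = [
    {"sex": "female", "weight_kg": 20.0, "height_cm": 100.0},
    {"sex": "male", "weight_kg": 18.0, "height_cm": None},
]
add_dummy_variables(records, "sex", ["female", "male"])
for record in records:
    record["bmi"] = body_mass_index(record["weight_kg"], record["height_cm"])
print(records)
```

In a regression model one indicator per variable is usually dropped as the reference category; the sketch keeps all of them so that the choice can be made at analysis time.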
4. PRIME application
PRIME step | Problem | Application |
---|---|---|
Processing | Incomplete and missing data dictionaries | We requested data using a list of analysis variables, which identified the variables we needed, and reviewed the dataset files along with the data dictionaries. Correspondence with authors was helpful in preparing datasets that lacked dictionaries. |
Processing | Identification of missing variables of interest | We documented the choice of outcome measures for studies that collected data at multiple time points. We identified four of 11 studies that did not report the primary outcomes of interest in their published manuscripts but had collected these data and provided them in their datasets. |
Processing | Use of different measurement methods | We evaluated how the helminth egg counts were measured. Authors used various methods, differing in the number of egg samples taken and how they were collected. We selected the most common method and standardized it across all included studies. |
Processing | Identification of conversion errors | We identified implausible values that required conversion before analysis, such as zeros coded as 0.99 and 9999. |
Replication | Inexact number of participants in the datasets compared to reported | The authors provided full datasets, including children who were excluded from the analysis due to missing baseline measures (e.g., missing stool samples). Replication allowed us to verify that these children were excluded from the analyses in the published papers. |
Replication | Incorrect treatment labels | By means of replication, we found that the labels in the dataset from the authors did not match the labels in the published paper. Correspondence with the authors allowed us to correct these labels and replicate the analyses. |
Replication | Uncorrected variables in the provided datasets | Hemoglobin concentrations need to be corrected when measured in individuals living in areas 1000 m above sea level, since lower oxygen levels result in higher hemoglobin concentrations in the blood. Hemoglobin was not adjusted in the datasets of two studies carried out in areas 1000 m above sea level, so the hemoglobin values obtained on replication were larger than those reported [37]. |
Imputation | Studies with missing data | For each study included in the IPD analysis, we calculated the percentage of missing data for each variable of interest. We then assessed the distribution of the missing variables to determine whether imputation was appropriate. We imputed the eligible studies that had less than 50% missing data and assumed data were missing at random, creating five imputed datasets per study. We used complete case analysis for studies with more than 50% missing data, as part of sensitivity analyses only. |
Merging | Correctly combining multiple datasets | A separate variable was created to identify each observation's original study and imputation number (ranging from one to five). We sorted the datasets by that identifier and used the MERGE command in SAS (9.4) to combine the imputed datasets into a new dataset. |
Evaluation | New variable calculation | Growth standards have varied over the years. We used WHO anthropometric software to calculate BMI for age, weight for age and other growth standards in relatively older studies so that they could be combined with the other studies. The anthropometric calculator in the software also operates similarly to SAS by tagging implausible weight and height values. |
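The imputation and merging rows above can be sketched in Python as follows. This is an illustrative stand-in, not the authors' SAS code, and it rests on several assumptions: the 50% rule is applied per variable of interest, exactly 50% missing is treated as ineligible, and "combining" is read as appending rows tagged with study and imputation number. The imputation itself is out of scope.

```python
MAX_PERCENT_MISSING = 50.0  # threshold reported in the table above


def percent_missing(rows, variable):
    """Percentage of records with a missing (None) value for one variable."""
    if not rows:
        return 0.0
    return 100.0 * sum(r.get(variable) is None for r in rows) / len(rows)


def eligible_for_imputation(rows, variables):
    """Assumption: every variable of interest must be under the threshold."""
    return all(percent_missing(rows, v) < MAX_PERCENT_MISSING for v in variables)


def stack_imputed(imputed_by_study):
    """Append imputed datasets, tagging each row with study and imputation number.

    imputed_by_study maps a study label to its list of imputed datasets,
    each dataset being a list of dict records.
    """
    pooled = []
    for study, datasets in imputed_by_study.items():
        for imputation, rows in enumerate(datasets, start=1):
            for row in rows:
                pooled.append({"study": study, "imputation": imputation, **row})
    # Sort by the identifiers so each imputation forms one contiguous block.
    pooled.sort(key=lambda r: (r["imputation"], r["study"]))
    return pooled


# Hypothetical example: two studies, two imputed datasets each.
example = {
    "study_b": [[{"hb": 118.0}], [{"hb": 121.0}]],
    "study_a": [[{"hb": 110.0}], [{"hb": 112.0}]],
}
pooled = stack_imputed(example)
print([(r["imputation"], r["study"]) for r in pooled])
```

Keeping the imputation number on every row lets each imputed block be analyzed separately and the estimates combined afterwards.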
5. Discussion
Vivli [cited 2020 02/06]. 2021 Available from: July 16, 2020, https://vivli.org/about/overview-2/.
OpenTrials [cited 2020 02/06]. 2021 Available from: July 16, 2020, https://opentrials.net/.
6. Conclusion
Acknowledgments
Author statement
Funding
Appendix. Supplementary materials
References
- The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg. 2011; 128 (PubMed PMID: 21701348; PubMed Central PMCID: PMCPMC3124652): 305-310. https://doi.org/10.1097/PRS.0b013e318219c171
- Preferred reporting items for systematic review and meta-analyses of individual participant data: the PRISMA-IPD statement. JAMA. 2015; 313 (Epub 2015/04/29; PubMed PMID: 25919529): 1657-1665. https://doi.org/10.1001/jama.2015.3656
- Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ. 2010; 340 (Epub 2010/02/09; PubMed PMID: 20139215): c221. https://doi.org/10.1136/bmj.c221
- Selective cutoff reporting in studies of diagnostic test accuracy: a comparison of conventional and individual-patient-data meta-analyses of the patient health questionnaire-9 depression screening tool. Am J Epidemiol. 2017; 185 (PubMed PMID: 28419203; PubMed Central PMCID: PMCPMC5430941): 954-964. https://doi.org/10.1093/aje/kww191
- Uptake of systematic reviews and meta-analyses based on individual participant data in clinical practice guidelines: descriptive study. BMJ. 2015; 350 (h1088. Epub 2015/03/10; PubMed PMID: 25747860; PubMed Central PMCID: PMCPMC4353308). https://doi.org/10.1136/bmj.h1088
- To IPD or not to IPD? Advantages and disadvantages of systematic reviews using individual patient data. Eval Health Prof. 2002; 25 (PubMed PMID: 11868447): 76-97. https://doi.org/10.1177/0163278702025001006
- Overcoming obstacles in obtaining individual participant data for meta-analysis. Res Synth Methods. 2016; 7 (Epub 2016/05/28; PubMed PMID: 27228953): 333-341. https://doi.org/10.1002/jrsm.1208
- The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychol Methods. 2009; 14 (PubMed PMID: 19485627): 165-176. https://doi.org/10.1037/a0015565
- If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology. PLoS ONE. 2013; 8 (Epub 2013/07/23; PubMed PMID: 23935830; PubMed Central PMCID: PMCPMC3720779): e67332. https://doi.org/10.1371/journal.pone.0067332
- Availability of clinical trial data from industry-sponsored cardiovascular trials. J Am Heart Assoc. 2016; 5 (Epub 2016/04/20; PubMed PMID: 27098969; PubMed Central PMCID: PMCPMC4859296): e003307. https://doi.org/10.1161/JAHA.116.003307
- Exploring changes over time and characteristics associated with data retrieval across individual participant data meta-analyses: systematic review. BMJ. 2017; 357 (j1390. Epub 2017/04/07; PubMed PMID: 28381561; PubMed Central PMCID: PMCPMC5733815). https://doi.org/10.1136/bmj.j1390
- Individual patient data meta-analyses. Best Pract Res Clin Obstet Gynaecol. 2005; 19 (Epub 2004/12/13; PubMed PMID: 15749065): 47-55. https://doi.org/10.1016/j.bpobgyn.2004.10.011
- Practical methodology of meta-analyses (overviews) using updated individual patient data. Cochrane working group. Stat Med. 1995; 14 (PubMed PMID: 8552887): 2057-2079. https://doi.org/10.1002/sim.4780141902
- Individual participant data meta-analysis of prognostic factor studies: state of the art? BMC Med Res Methodol. 2012; 12 (Epub 2012/04/24; PubMed PMID: 22530717; PubMed Central PMCID: PMCPMC3413577): 56. https://doi.org/10.1186/1471-2288-12-56
- Resource implications of preparing individual participant data from a clinical trial to share with external researchers. Trials. 2017; 18 (Epub 2017/07/17; PubMed PMID: 28712359; PubMed Central PMCID: PMCPMC5512949): 319. https://doi.org/10.1186/s13063-017-2067-4
- Handbook for systematic reviews of interventions. Version 5.1.0. The Cochrane Collaboration, 2011 (ed)
- Get real in individual participant data (IPD) meta-analysis: a review of the methodology. Res Synth Methods. 2015; 6 (Epub 2015/08/20; PubMed PMID: 26287812; PubMed Central PMCID: PMCPMC5042043): 293-309. https://doi.org/10.1002/jrsm.1160
Cochrane Methods Comparing Multiple Interventions: The Cochrane Collaboration. 2021 Available from: July 16, 2020, https://methods.cochrane.org/cmi/.
Cochrane Methods IPD Meta-analysis Group: The Cochrane Collaboration. 2021 Available from: July 16, 2020, https://methods.cochrane.org/ipdma/.
- Deworming children for soil-transmitted helminths in low and middle-income countries: systematic review and individual participant data network meta-analysis. J Development Effectiveness. 2019; 11: 288-306. https://doi.org/10.1080/19439342.2019.1691627
- Mass deworming for improving health and cognition of children in endemic helminth areas: a systematic review and individual participant data network meta-analysis. Campbell Systematic Reviews. 2019; 15: e1058. https://doi.org/10.1002/cl2.1058
- Reproducibility. Science. 2014; 343 (PubMed PMID: 24436391): 229. https://doi.org/10.1126/science.1250475
- Replications in psychology research: how often do they really occur? Perspect Psychol Sci. 2012; 7 (PubMed PMID: 26168110): 537-542. https://doi.org/10.1177/1745691612460688
- The Value of Direct Replication. Perspect Psychol Sci. 2014; 9 (PubMed PMID: 26173243): 76-80. https://doi.org/10.1177/1745691613514755
- Investigating variation in replicability. Soc Psychol. 2014; 45: 142-152. https://doi.org/10.1027/1864-9335/a000178
- Making sense of replications. Elife. 2017; 6 (Epub 2017/01/19; PubMed PMID: 28100398; PubMed Central PMCID: PMCPMC5245957). https://doi.org/10.7554/eLife.23383
- Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Communications in Statistics - Simulation and Computation. 2009; 38: 1228-1234. https://doi.org/10.1080/03610910902859574
- Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement. J Thorac Cardiovasc Surg. 2007; 134 (PubMed PMID: 17976439): 1128-1135. https://doi.org/10.1016/j.jtcvs.2007.07.021
- A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Stat Med. 2008; 27 (PubMed PMID: 18038446): 2037-2049. https://doi.org/10.1002/sim.3150
- Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol. 2001; 54 (PubMed PMID: 11297888): 387-398
- Principled missing data methods for researchers. Springerplus. 2013; 2 (Epub 2013/05/14; PubMed PMID: 23853744; PubMed Central PMCID: PMCPMC3701793): 222. https://doi.org/10.1186/2193-1801-2-222
- Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009; 338 (Epub 2009/06/29; PubMed PMID: 19564179; PubMed Central PMCID: PMCPMC2714692): b2393. https://doi.org/10.1136/bmj.b2393
- Multiple imputation in health-care databases: an overview and some applications. Stat Med. 1991; 10 (PubMed PMID: 2057657): 585-598
- Multiple imputation for multivariate missing-data problems: a data analyst's perspective. Multivariate Behav Res. 1998; 33 (PubMed PMID: 26753828): 545-571. https://doi.org/10.1207/s15327906mbr3304_5
- Stata 16 base reference manual. Stata Press, College Station, TX; 2019
- Normality tests for statistical analysis: a guide for non-statisticians. Int J Endocrinol Metab. 2012; 10 (Epub 2012/04/20; PubMed PMID: 23843808; PubMed Central PMCID: PMCPMC3693611): 486-489. https://doi.org/10.5812/ijem.3505
- Altitude correction for hemoglobin. Eur J Clin Nutr. 1994; 48 (PubMed PMID: 8001519): 625-632
- Individual Participant Data (IPD) Meta-analyses of Randomised Controlled Trials: guidance on Their Use. PLoS Med. 2015; 12 (Epub 2015/07/22; PubMed PMID: 26196287; PubMed Central PMCID: PMCPMC4510878): e1001855. https://doi.org/10.1371/journal.pmed.1001855
- Reanalyses of randomized clinical trial data. JAMA. 2014; 312 (PubMed PMID: 25203082): 1024-1032. https://doi.org/10.1001/jama.2014.9646
- Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in. BMJ. 2018; 360 (k400. Epub 2018/02/13; PubMed PMID: 29440066; PubMed Central PMCID: PMCPMC5809812). https://doi.org/10.1136/bmj.k400
- Challenges Associated With Using Large Data Sets for Quality Assessment and Research in Clinical Settings. Policy Polit Nurs Pract. 2015; 16 (Epub 2015/09/08; PubMed PMID: 26351216; PubMed Central PMCID: PMCPMC4679583): 117-124. https://doi.org/10.1177/1527154415603358
- Medical big data: promise and challenges. Kidney Res Clin Pract. 2017; 36 (Epub 2017/03/31; PubMed PMID: 28392994; PubMed Central PMCID: PMCPMC5331970): 3-11. https://doi.org/10.23876/j.krcp.2017.36.1.3
- External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016; 353 (i3140. Epub 2016/06/22; PubMed PMID: 27334381; PubMed Central PMCID: PMCPMC4916924). https://doi.org/10.1136/bmj.i3140
- Strategies for obtaining unpublished drug trial data: a qualitative interview study. Syst Rev. 2013; 2: 31. https://doi.org/10.1186/2046-4053-2-31
Vivli [cited 2020 02/06]. 2021 Available from: July 16, 2020, https://vivli.org/about/overview-2/.
OpenTrials [cited 2020 02/06]. 2021 Available from: July 16, 2020, https://opentrials.net/.
- Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. Trials. 2010; 11: 9. https://doi.org/10.1186/1745-6215-11-9
- Sharing clinical trial data, maximizing benefits, minimizing risk. National Academies Press (US), Washington, DC; 2015
- Sharing raw data from clinical trials: what progress since we first asked "Whose data set is it anyway?". Trials. 2016; 17 (Epub 2016/05/04; PubMed PMID: 27142986; PubMed Central PMCID: PMCPMC4855346): 227. https://doi.org/10.1186/s13063-016-1369-2
- Preparing for responsible sharing of clinical trial data. N Engl J Med. 2013; 369 (Epub 2013/10/21; PubMed PMID: 24144394): 1651-1658. https://doi.org/10.1056/NEJMhle1309073
- Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open. 2017; 7 (Epub 2017/12/14; PubMed PMID: 29247106; PubMed Central PMCID: PMCPMC5736032): e018647. https://doi.org/10.1136/bmjopen-2017-018647
- Evaluation of repositories for sharing individual-participant data from clinical studies. Trials. 2019; 20 (Epub 2019/03/15; PubMed PMID: 30876434; PubMed Central PMCID: PMCPMC6420770): 169. https://doi.org/10.1186/s13063-019-3253-3
Article info
Publication history
Footnotes
Acknowledgments: We would like to thank Celia Holland for being part of our Advisory board. We would also like to thank Bishop Beasley, Nilanthi de Silva, Rebecca Stoltzfus and James Tielsch for contributing data for the network meta-analysis.
Conflict of interest statement: All authors have completed the ICMJE uniform disclosure form and declare: no support from any organization for the submitted work; Ms. Gaffey and Dr. Welch report grants from Bill and Melinda Gates Foundation, during the conduct of the study.
Identification
Copyright
User license
Creative Commons Attribution (CC BY 4.0)