Abstract
Objectives
Study Design and Setting
Results
Conclusion
Keywords
Key findings
- A complete case logistic regression will give a biased estimate of the exposure odds ratio if the probability of being a complete case depends on a continuous outcome but a binary version of this outcome is used in the analysis; this bias is likely to be small unless the association between the continuous outcome and the chance of being a complete case is strong. If there is an interaction between the exposure and outcome in terms of the probability of being a complete case, there could be substantial bias in the estimate of the log odds ratio.
- If an interaction is present, including one or more auxiliary variables that are good predictors of the missing binary outcome in multiple imputation (MI) models will lead to relatively large bias reductions if these variables have high sensitivity and specificity in relation to the binary outcome; if not, the bias reductions will be small.
What this adds to what was known?
- It is known that a complete case logistic regression will give an unbiased estimate of the exposure odds ratio if the probability of being a complete case depends on the outcome and exposure independently. We show that this does not hold when the probability of being a complete case depends on an underlying continuous outcome and a binary form of this is used for analysis.
What is the implication and what should change now?
- If one or more good predictors of the missing outcome are available, we would recommend using MI over a complete case analysis because, in practice, it would be difficult to rule out an interaction.
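The known result referred to in the highlights can be illustrated with a standard selection-bias identity. This is a sketch in notation assumed here, not the authors' own expression for the bias, which is not reproduced in this text. Let X be a binary exposure, Y* the binary outcome used in the analysis, R the complete case indicator, and π(x, y) = P(R = 1 | X = x, Y* = y).

```latex
\[
\mathrm{OR}_{\mathrm{CC}}
  = \frac{P(Y^{*}=1 \mid X=1, R=1)\,P(Y^{*}=0 \mid X=0, R=1)}
         {P(Y^{*}=0 \mid X=1, R=1)\,P(Y^{*}=1 \mid X=0, R=1)}
  = \mathrm{OR} \times \frac{\pi(1,1)\,\pi(0,0)}{\pi(1,0)\,\pi(0,1)}
\]
\[
\pi(x,y) = g(x)\,h(y)
  \;\Longrightarrow\;
  \frac{\pi(1,1)\,\pi(0,0)}{\pi(1,0)\,\pi(0,1)} = 1
  \;\Longrightarrow\;
  \mathrm{OR}_{\mathrm{CC}} = \mathrm{OR}
\]
\[
\text{If } P(R=1 \mid X, Y) = p(Y) \text{ for a continuous } Y \text{ and } Y^{*} = 1[Y > c]:
  \qquad \pi(x,y) = E\!\left[\,p(Y) \mid X = x,\; Y^{*} = y\,\right]
\]
```

In the last line, π(x, y) will generally vary with x even though p(Y) does not, because the distribution of Y within each category of Y* shifts with the exposure. The cross-product term then need not equal 1, which is consistent with the bias described in the highlights.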
1. Introduction
2. Methods
2.1 Linkage to general practitioner data
2.2 Analysis of ALSPAC data
2.3 Simulation study
2.3.1 Simulated datasets
2.3.2 Generating the missing data
- (i) The probability of the outcome being observed was only associated with the continuous outcome
- (ii) The log probability of the outcome being observed depended linearly on the exposure, continuous outcome, and their interaction (note that, henceforth, where we refer to an interaction, this is what we mean)
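The two mechanisms above can be sketched in a small simulation. This is a minimal illustration, not the authors' simulation code: every numeric value (sample size, effect sizes, dichotomisation threshold, coefficients of the log-probability model) and the function names are assumptions chosen for illustration. For simplicity, mechanism (i) is also generated on the log-probability scale, and log probabilities are capped at 0 so that probabilities do not exceed 1.

```python
import numpy as np


def log_odds_ratio(x, y):
    """Log odds ratio from the 2x2 table of binary exposure x and binary outcome y.

    With a single binary exposure this equals the logistic regression slope.
    """
    a = float(np.sum((x == 1) & (y == 1)))
    b = float(np.sum((x == 1) & (y == 0)))
    c = float(np.sum((x == 0) & (y == 1)))
    d = float(np.sum((x == 0) & (y == 0)))
    return float(np.log(a * d / (b * c)))


def complete_case_bias(interaction, n=200_000, seed=2021):
    """Complete case log OR minus full-data log OR under an assumed mechanism."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)           # binary exposure (assumed prevalence)
    y_cont = 0.5 * x + rng.normal(size=n)      # continuous outcome (assumed effect)
    y_bin = (y_cont > 1.0).astype(int)         # binary version used in the analysis

    if interaction:
        # Mechanism (ii): log P(observed) linear in exposure, outcome, interaction.
        log_p = np.log(0.4) - 0.2 * x - 0.3 * y_cont - 0.5 * x * y_cont
    else:
        # Mechanism (i): P(observed) depends on the continuous outcome only.
        log_p = np.log(0.4) - 0.3 * y_cont

    p_obs = np.exp(np.minimum(log_p, 0.0))     # cap so that probabilities are <= 1
    observed = rng.random(n) < p_obs

    full = log_odds_ratio(x, y_bin)
    complete_case = log_odds_ratio(x[observed], y_bin[observed])
    return complete_case - full


bias_no_interaction = complete_case_bias(interaction=False)
bias_interaction = complete_case_bias(interaction=True)
print(f"bias in log OR, mechanism (i):  {bias_no_interaction:+.3f}")
print(f"bias in log OR, mechanism (ii): {bias_interaction:+.3f}")
```

Under these assumed values, the complete case bias should be small for mechanism (i) and considerably larger in magnitude for mechanism (ii); the exact figures depend on the chosen parameters and the random seed.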
2.3.3 Scenarios investigated
Factor 4: % Missing linked data | Factor 3: Interaction between outcome & exposure with respect to probability of being observed | Factor 2: Sensitivity of GP depression |
---|---|---|
0% | No | 25% |
0% | No | 75% |
0% | Yes | 25% |
0% | Yes | 75% |
25% | Yes | 75% |
2.3.4 Statistical analysis
3. Results
3.1 Bias in complete case analysis

3.2 Simulation study


3.3 Analysis of ALSPAC data
Complete data on: covariates | Complete data on: maternal smoking status in pregnancy | Complete data on: depression status (CIS-R) | Linked GP data: Yes | Linked GP data: No | Total |
---|---|---|---|---|---|
Yes | Yes | Yes | 2,201 | 517 | 2,718 |
Yes | Yes | No | 2,923 | 1,386 | 4,309 |
Yes | No | Yes | 180 | 40 | 220 |
Yes | No | No | 280 | 135 | 415 |
No | Yes | Yes | 830 | 185 | 1,015 |
No | Yes | No | 2,196 | 989 | 3,185 |
No | No | Yes | 478 | 106 | 584 |
No | No | No | 1,472 | 648 | 2,120 |
Total | | | 10,560 | 4,006 | 14,566 |
3.3.1 Association between ALSPAC-measured and GP-recorded depression
GP measure | Present? | CIS-R diagnosis of depression: No | CIS-R diagnosis of depression: Yes |
---|---|---|---|
Current diagnosis or symptoms or treatment | No | 3,012 (97.7%) | 199 |
Current diagnosis or symptoms or treatment | Yes | 72 | 71 (26.3%) |
Future diagnosis or symptoms or treatment | No | 2,500 (79.6%) | 126 |
Future diagnosis or symptoms or treatment | Yes | 640 | 156 (55.3%) |
Historical diagnosis or symptoms or treatment | No | 3,233 (96.2%) | 217 |
Historical diagnosis or symptoms or treatment | Yes | 127 | 64 (22.8%) |
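The percentages in the table above appear to be the specificity (in the CIS-R "No" column) and sensitivity (in the CIS-R "Yes" column) of each GP measure, taking the CIS-R diagnosis as the reference standard. The short sketch below recomputes them from the table counts; the function and variable names are illustrative, not from the article.

```python
def sensitivity_specificity(tn, fn, fp, tp):
    """Sensitivity and specificity of a GP measure against the CIS-R diagnosis.

    tn: GP measure absent,  CIS-R negative
    fn: GP measure absent,  CIS-R positive
    fp: GP measure present, CIS-R negative
    tp: GP measure present, CIS-R positive
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Counts copied from the table above.
gp_tables = {
    "current": dict(tn=3012, fn=199, fp=72, tp=71),
    "future": dict(tn=2500, fn=126, fp=640, tp=156),
    "historical": dict(tn=3233, fn=217, fp=127, tp=64),
}

results = {}
for name, counts in gp_tables.items():
    sens, spec = sensitivity_specificity(**counts)
    results[name] = (round(100 * sens, 1), round(100 * spec, 1))
    print(f"{name}: sensitivity {results[name][0]}%, specificity {results[name][1]}%")
```

The recomputed values match the percentages reported in the table (for example, 26.3% sensitivity and 97.7% specificity for the current GP measure).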
3.3.2 Predictors of observed ALSPAC-measured depression data
Variable | Present? | Odds ratio (OR) (95% CI) | P-value |
---|---|---|---|
Historical diagnosis or symptoms or treatment | Yes | 0.88 (0.68, 1.15) | 0.4 |
Current diagnosis or symptoms or treatment | Yes | 0.81 (0.59, 1.11) | 0.2 |
Future diagnosis or symptoms or treatment | Yes | 0.76 (0.66, 0.88) | <0.001 |
3.3.3 Relationship between maternal smoking in pregnancy and offspring depression
Analysis approach | Crude OR (95% CI) | Adjusted OR (95% CI) | Gain in precision (adjusted log OR) |
---|---|---|---|
Complete case (n = 2,718) | 1.72 (1.20, 2.46) | 1.36 (0.92, 2.02) | n/a |
MI (n = 14,566) | 1.86 (1.44, 2.40) | 1.46 (1.06, 2.01) | 24% |
4. Discussion
Acknowledgments
Footnotes
Author statement: Rosie Cornish: conceptualisation, methodology, formal analysis, writing–original draft, reviewing and editing. Jonathan Bartlett: methodology, software, investigation, writing–review and editing. John Macleod: supervision, writing–review and editing. Kate Tilling: conceptualisation, methodology, supervision, writing–review and editing.
Declaration of interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethics approval and consent to participate: Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees (NHS Haydock REC: 10/H1010/70). All procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. Informed consent for the use of questionnaire and clinic data was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Study participants who complete questionnaires consent to the use of their data by approved researchers. Up until age 18, overarching informed parental consent was used to indicate that parents were happy for their child (the study participant) to take part in ALSPAC. Consent for data collection and use was implied via the written completion and return of questionnaires. Study participants have the right to withdraw their consent for specific elements of the study, or from the study as a whole, at any time. At age 18, study children were sent ‘fair processing’ materials describing ALSPAC's intended use of their health and administrative records and were given clear means to consent or object via a written form. Data were not extracted for participants who objected or who were not sent fair processing materials.
Consent for publication: Not applicable.
Availability of data and materials: Due to ALSPAC data access permissions, the authors do not have the authority to share the study data analyzed in this study, but any researcher can apply to use ALSPAC data, including the variables used in this investigation. Information about access to ALSPAC data is given on their website (http://www.bristol.ac.uk/alspac/researchers/access/). The code used to generate the simulated datasets is available from the corresponding author on reasonable request.
Competing interests: The authors declare that they have no competing interests.
Funding: This work was supported by the Medical Research Council (MR/L012081). The UK Medical Research Council and the Wellcome Trust (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. Data collection is funded from a range of sources. KT and RC work in the MRC Integrative Epidemiology Unit, which receives funding from the UK Medical Research Council and the University of Bristol (MC_UU_00011/3). JB was supported by a UK Medical Research Council grant (MR/T023953/1). JM is partly funded by the National Institute for Health Research Applied Research Collaboration West (NIHR ARC West) at University Hospitals Bristol and Weston NHS Foundation Trust, UK.
Authors’ contributions: RC and KT conceived and designed the study, with input from JB. JB derived the expression used to calculate the bias in the complete case estimate of the log OR. RC ran the simulations and conducted the analyses. RC, KT, and JB interpreted the results. RC wrote the first draft of the manuscript with substantial contributions from KT and JB. RC, KT, JB, and JM revised and edited the manuscript. All authors read and approved the final manuscript.
User license
Creative Commons Attribution (CC BY 4.0)