Original Article | Volume 154, P65-74, February 2023

Imputing missing laboratory results may return erroneous values because they are not missing at random

  • Carl van Walraven
    Corresponding author. Ottawa Hospital Research Institute, ASB1-003, 1053 Carling Ave, Ottawa, ON K1Y 4E9, Canada. Tel.: +1 613 761 4903; fax: +1 613 761 5492.
    Professor of Medicine and Epidemiology & Community Medicine, University of Ottawa, Ontario, Canada

    Senior Scientist, Ottawa Hospital Research Institute, Ontario, Canada

    Senior Scientist, ICES, Ontario, Canada

    Department of Medicine, University of Ottawa, Ontario, Canada

    Department of Epidemiology & Community Medicine, University of Ottawa, Ottawa Hospital Research Institute, ICES (formerly Institute for Clinical Evaluative Sciences), Ontario, Canada
  • Christopher McCudden
    Associate Professor, Department of Pathology & Lab. Medicine, University of Ottawa, Ontario, Canada

    Clinical Biochemist, Division of Biochemistry, The Ottawa Hospital, Ontario, Canada
  • Peter C. Austin
    Senior Scientist, ICES, Ontario, Canada

    Professor, Institute of Health Policy, Management and Evaluation, University of Toronto, Ontario, Canada
Published: December 14, 2022


      Background and Objectives

      Regression models that incorporate laboratory tests treat the results of unordered tests as missing, and these missing values are often imputed. Imputation typically assumes that data are “missing at random” (MAR; that is, a test's order status is unrelated to its result after accounting for other variables). This study examined the validity of this assumption.
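
      The practical consequence of the MAR assumption can be illustrated with a small simulation. The sketch below is not the study's method: the test, its distribution, the cutoff of 110, and the ordering probabilities are all hypothetical values chosen for illustration. It generates a "true" result for every patient, decides whether the test is ordered, and then measures the error made when each unordered result is imputed with the mean of the ordered results. With no covariates in the sketch, ordering that ignores the result is the simplest special case of MAR.

```python
import random
import statistics

def simulate(depends_on_result, n=20000, seed=0):
    """Simulate one hypothetical test; return (ordered, unordered) result lists."""
    rng = random.Random(seed)
    ordered, unordered = [], []
    for _ in range(n):
        result = rng.gauss(100.0, 15.0)  # true value, known here for every patient
        if depends_on_result:
            # Not MAR (assumed mechanism): high results are ordered far more often
            p_order = 0.8 if result > 110.0 else 0.2
        else:
            # Ordering ignores the result entirely
            p_order = 0.5
        (ordered if rng.random() < p_order else unordered).append(result)
    return ordered, unordered

def imputation_error(depends_on_result):
    """Error from imputing every unordered result with the mean of ordered results."""
    ordered, unordered = simulate(depends_on_result)
    return statistics.mean(ordered) - statistics.mean(unordered)

error_random = imputation_error(False)
error_not_random = imputation_error(True)
print(f"ordering unrelated to result: error = {error_random:.2f}")
print(f"ordering depends on result:   error = {error_not_random:.2f}")
```

      When ordering depends on the result itself, the imputed values are systematically too high in this toy setup, and no adjustment for other variables could remove that error because none of them drives the ordering decision.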


      Methods

      We included 14 biochemistry tests. Every test was measured in every patient, whether or not it had been ordered. For each test separately, multiple linear regression estimated the independent association between test result and order status after adjusting for patient age, sex, comorbidities, and patient location. Testing likelihood models were created for all tests using hospital-wide data.
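
      The per-test regression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are simulated, only age and sex are used as adjustment variables (the study also adjusted for comorbidities and patient location), and the helper names `solve` and `ols` as well as every coefficient value are hypothetical. The quantity of interest is the coefficient on the order-status indicator, which would be close to zero if results were unrelated to ordering after adjustment.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated data for ONE hypothetical test; all numbers are assumptions.
rng = random.Random(1)
rows, results = [], []
for _ in range(5000):
    age = rng.uniform(20.0, 90.0)
    female = 1.0 if rng.random() < 0.5 else 0.0
    # Older patients are more likely to have the test ordered
    ordered = 1.0 if rng.random() < 0.3 + 0.005 * (age - 20.0) else 0.0
    # True adjusted difference between ordered and unordered results is 5.0
    result = 50.0 + 0.2 * age + 3.0 * female + 5.0 * ordered + rng.gauss(0.0, 4.0)
    rows.append([1.0, ordered, age, female])  # intercept, order status, covariates
    results.append(result)

coefs = ols(rows, results)
ordered_coef = coefs[1]  # adjusted association between order status and result
print(f"adjusted difference (ordered vs unordered): {ordered_coef:.2f}")
```

      Repeating this fit once per test, as a stratified analysis would, gives one adjusted order-status coefficient per test. A real analysis would use a statistics package to obtain standard errors and significance tests, which this sketch omits.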


      Results

      Four hundred thirty-four patients were included (mean age 60.7 years [standard deviation 19.1]; 50.5% female). In 9 of 14 tests (64.2%), test results were significantly associated with order status after adjustment. Results were significantly more abnormal when the test was ordered for 6 tests and significantly more normal for 3 tests. Test abnormality increased as testing likelihood decreased.


      Conclusion

      These data suggest that laboratory data are often not MAR. The direction and extent of differences in missing laboratory test values vary between tests. Overall, the abnormality of ordered tests increased as testing likelihood decreased. These results suggest that imputing missing laboratory data may return biased values.



