Highlights
- Data quality controls can be time-consuming to carry out.
- Analysts often need to trade off false positives and false negatives of error detection methods.
- Multivariable, robust error detection methods had comparable performance for gross errors.
- The DDC algorithm outperformed the other methods for more complex error patterns.
- The DDC algorithm has the potential to improve error detection processes for observational data.
Abstract
Objective
We evaluated the error detection performance of the DetectDeviatingCells (DDC) algorithm, which flags data anomalies at observation (casewise) and variable (cellwise) level in continuous variables. We compared its performance to other approaches in a simulated dataset.
Study design and setting
We simulated height and weight data for hypothetical individuals aged 2 to 20 years and changed a proportion of the height values according to predetermined error patterns. We then applied the DDC algorithm and other error detection approaches (descriptive statistics, plots, fixed-threshold rules, classic and robust Mahalanobis distance) and compared their error detection performance using sensitivity, specificity, likelihood ratios, predictive values and ROC curves.
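As a rough illustration of the performance measures listed above, the following sketch computes them from two boolean lists: which values were truly altered and which were flagged by a detection method. The function name `detection_metrics` and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
def detection_metrics(is_error, is_flagged):
    """Summarise error detection performance from paired boolean lists.

    is_error[i]   -- True if value i was deliberately altered (ground truth)
    is_flagged[i] -- True if the detection method flagged value i
    """
    pairs = list(zip(is_error, is_flagged))
    tp = sum(1 for e, f in pairs if e and f)          # errors that were flagged
    fn = sum(1 for e, f in pairs if e and not f)      # errors that were missed
    fp = sum(1 for e, f in pairs if not e and f)      # correct values flagged
    tn = sum(1 for e, f in pairs if not e and not f)  # correct values left alone

    def ratio(num, den):
        # Return None when a measure is undefined (missing input or zero denominator).
        if num is None or den is None or den == 0:
            return None
        return num / den

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "lr_positive": ratio(sens, None if spec is None else 1 - spec),
        "lr_negative": ratio(None if sens is None else 1 - sens, spec),
    }


# Hypothetical toy data: 4 altered values out of 10, 3 of them detected,
# plus 1 unaltered value flagged by mistake.
truth = [True] * 4 + [False] * 6
flags = [True, True, True, False, True, False, False, False, False, False]
metrics = detection_metrics(truth, flags)
```

Returning `None` for undefined measures keeps the sketch usable in scenarios where a method flags nothing or where no errors were introduced.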
Results
At our chosen thresholds, error detection specificity was excellent across all scenarios for all methods and sensitivity was higher for multivariable and robust methods. The DDC algorithm performance was similar to other robust multivariable methods. Analysis of ROC curves suggested that all methods had comparable performance for gross errors (e.g. wrong measurement unit), but the DDC algorithm outperformed the others for more complex error patterns (e.g. transcription errors that are still plausible, although extreme).
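The ROC comparison above considers performance across all possible thresholds rather than a single chosen one. A minimal sketch of that idea, assuming each method assigns every value an anomaly score where higher means more suspicious: the area under the ROC curve equals the probability that a randomly chosen altered value scores higher than a randomly chosen unaltered one, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical and do not come from the paper.

```python
def roc_auc(scores, is_error):
    """Area under the ROC curve via pairwise comparison of scores.

    scores[i]   -- anomaly score for value i (higher = more suspicious)
    is_error[i] -- True if value i was deliberately altered
    """
    altered = [s for s, e in zip(scores, is_error) if e]
    unaltered = [s for s, e in zip(scores, is_error) if not e]
    if not altered or not unaltered:
        raise ValueError("need at least one altered and one unaltered value")

    wins = 0.0
    for a in altered:
        for u in unaltered:
            if a > u:
                wins += 1.0   # altered value ranked as more suspicious
            elif a == u:
                wins += 0.5   # ties count as half
    return wins / (len(altered) * len(unaltered))


labels = [True, True, False, False, False]

# Gross errors: altered values score far above every unaltered value.
auc_gross = roc_auc([9.0, 8.0, 1.0, 2.0, 3.0], labels)

# Subtler errors: one altered value scores inside the unaltered range.
auc_subtle = roc_auc([2.5, 8.0, 1.0, 2.0, 3.0], labels)
```

The pairwise loop is quadratic in the number of values, which is acceptable for a small illustration; a rank-based calculation would be preferable for large simulated datasets.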
Conclusions
The DDC algorithm has the potential to improve error detection processes for observational data.
Article info
Publication history
Accepted: February 13, 2023
Received in revised form: February 3, 2023
Received: October 25, 2022
Publication stage
In Press Journal Pre-Proof
Copyright
© 2023 The Author(s). Published by Elsevier Inc.
User license
Creative Commons Attribution (CC BY 4.0)
Elsevier's open access license policy
Permitted
- Read, print & download
- Redistribute or republish the final article
- Text & data mine
- Translate the article
- Reuse portions or extracts from the article in other works
- Sell or re-use for commercial purposes