Abstract
Objectives
Study Design and Setting
Results
Conclusion
Keywords
Key findings
- Across two different datasets, classification models incorporating continuous clinical features combined with early insulin requirement or (where available) interview-reported diabetes type consistently achieved high accuracy (≥85%).
- When identifying a type 1 diabetes (T1D) cohort with minimal misclassification, young age at diagnosis (<20 years) or models with high thresholds had very high predictive value but modest sensitivity.
What this adds to what is known?
- The best approaches for classifying diabetes type in research datasets without measured classification biomarkers were previously unclear. This work allows researchers to identify the optimum classification approach for their dataset and research question.
Implications
- The optimal method for identifying diabetes subtypes in observational data will depend on the available data and the research question. Researchers can select the optimum approach using an online tool developed from the study findings (Classifying Diabetes for Research: Method Selector (newcastlerse.github.io)).
1. Introduction
1.1 Robustly classifying diabetes type in research datasets without measured classification biomarkers is challenging
1.2 The comparative performance of approaches to classify insulin-treated diabetes in epidemiological studies is unknown
1.3 Aim
2. Method
2.1 Study design and participants
2.1.1 UK Biobank
2.1.2 DARE cohort
2.2 Assessment of population-level approaches for classifying diabetes type in insulin-treated individuals
Reference name (approach number) | Clinical information required | Cutoff for whole cohort (defining T1D, remainder T2D) [reference] | Cutoff for defining T1D only [reference] | Cutoff for defining T2D only [reference] |
---|---|---|---|---|
Age (1) | Age at diagnosis | <35 yr [10] | ≤20 yr [16] | ≥40 yr [10] |
BMI (2) | Current BMI | ≤25 kg/m² [14] | ≤23 kg/m² [10] | ≥28 kg/m² [10] |
Clinical model (3) | Current BMI, age at diagnosis | Model probability ≥12% [19] | Model probability ≥80% | Model probability <5% |
Lipid model (4) | Current BMI, age at diagnosis, sex, HDL, triglyceride, and total cholesterol | Model probability ≥12% [19, 20] | Model probability ≥80% | Model probability <5% |
ICD codes (5) | ICD-10 or ICD-9 code, OHA, age at diagnosis, and DKA episode history | Algorithm T1D [18] | N/A | N/A |
UKBB algorithm (6) | Age at diagnosis, time to insulin, non-metformin OHA, interview report of T1D, and ethnicity | Possible and probable T1D [15] | Probable T1D [13] | Probable T2D [13] |
Interview reported (7) | Interview-reported diabetes type | Interview-reported T1D [32] | N/A | N/A |
Diagnosis codes algorithm (8) | Diabetes diagnosis codes, non-metformin OHA, prescription for glucagon, and prescription for urine acetone strip | Ratio of T1D to T2D diagnosis codes >0.5 with either glucagon, non-metformin OHA prescription, or prescription of urine acetone strip alone [16] | N/A | N/A |
Diagnosis code + age (9) | Diabetes diagnosis codes and age at diagnosis | Any diagnosis code of T1D or age at diagnosis <22 yr [17] | N/A | N/A |
Majority diagnosis codes (10) | Diabetes diagnosis codes | Ratio of T1D to T2D diagnosis codes >0.5 [22] | N/A | N/A |
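As a rough illustration of how the simpler cutoffs in the table above could be applied, the Python sketch below encodes the age, BMI, and model-probability thresholds for approaches 1 to 3. The function names and the "unclassified" label are assumptions made for illustration, and combining the separate "T1D only" and "T2D only" cutoffs into one three-way rule is also an assumption. The model probability is taken as an input because the model coefficients are not given here.

```python
# Thresholds are copied from the table above; all names are hypothetical.

def whole_cohort_by_age(age_at_diagnosis: float) -> str:
    """Approach 1, whole cohort: diagnosed <35 yr is called T1D, remainder T2D."""
    return "T1D" if age_at_diagnosis < 35 else "T2D"


def whole_cohort_by_bmi(bmi: float) -> str:
    """Approach 2, whole cohort: BMI <=25 kg/m2 is called T1D, remainder T2D."""
    return "T1D" if bmi <= 25 else "T2D"


def whole_cohort_by_model(probability: float) -> str:
    """Approaches 3 and 4, whole cohort: model probability >=12% is called T1D."""
    return "T1D" if probability >= 0.12 else "T2D"


def strict_by_age(age_at_diagnosis: float) -> str:
    """Approach 1, strict cutoffs: <=20 yr for T1D only, >=40 yr for T2D only.

    Assumption: anyone between the two cutoffs is left unclassified.
    """
    if age_at_diagnosis <= 20:
        return "T1D"
    if age_at_diagnosis >= 40:
        return "T2D"
    return "unclassified"


def strict_by_model(probability: float) -> str:
    """Approaches 3 and 4, strict cutoffs: >=80% for T1D only, <5% for T2D only."""
    if probability >= 0.80:
        return "T1D"
    if probability < 0.05:
        return "T2D"
    return "unclassified"


if __name__ == "__main__":
    # Hypothetical participant: diagnosed at 28 yr, BMI 24, model probability 0.40.
    print(whole_cohort_by_age(28), whole_cohort_by_bmi(24), strict_by_model(0.40))
```

The whole-cohort rules assign every participant to a type, whereas the strict rules trade coverage for fewer misclassified cases by leaving intermediate cases out.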
2.3 Biological definitions of diabetes type against which approaches were evaluated
2.3.1 UK Biobank
2.3.2 Diabetes Alliance for Research in England
2.4 Statistical analysis
2.4.1 UK Biobank
Where is the number of cases called as having T1D and is the number of cases called as having T2D by each approach.
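The equation that this sentence describes is not reproduced in the text. The accuracy values reported in the results tables are numerically consistent with a PPV-weighted average over the cases called T1D and the cases called T2D, so the sketch below assumes that form. The function and argument names are hypothetical, and the formula is an inference from the reported figures rather than a confirmed statement of the authors' method.

```python
def weighted_accuracy(n_t1d: int, ppv_t1d: float, n_t2d: int, ppv_t2d: float) -> float:
    """Overall accuracy as a PPV-weighted average (assumed form).

    n_t1d, n_t2d: number of cases called T1D / T2D by an approach.
    ppv_t1d, ppv_t2d: estimated proportion of each called group that is correct.
    """
    total = n_t1d + n_t2d
    if total == 0:
        raise ValueError("at least one case must be classified")
    return (ppv_t1d * n_t1d + ppv_t2d * n_t2d) / total


if __name__ == "__main__":
    # First row of the UK Biobank results table:
    # 1,169 called T1D (PPV 87%) and 2,365 called T2D (PPV 88%).
    accuracy = weighted_accuracy(1169, 0.87, 2365, 0.88)
    print(f"{accuracy:.0%}")  # prints 88%
```

Under this assumption the function reproduces the 88% accuracy reported for that row.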
2.5 Determining accuracy in UK Biobank and DARE
3. Results
3.1 Performance of approaches to classify all insulin-treated White European participants with diabetes in UK Biobank
Approach | Called T1D (n) | PPV (T1D) | Sensitivity (T1D) | Called T2D (n) | PPV (T2D) | Sensitivity (T2D) | Accuracy |
---|---|---|---|---|---|---|---|
Lipid model probability ≥12% and insulin within a year of diagnosis | 1,169 | 87% (84-90) | 79% (77-81) | 2,365 | 88% (86-91) | 93% (92-94) | 88% |
Clinical model probability ≥12% and insulin within a year of diagnosis | 1,047 | 89% (86-92) | 72% (70-75) | 2,487 | 86% (83-88) | 95% (94-96) | 87% |
Interview-reported diabetes type (n = 519 available) and insulin within a year of diagnosis | 224 | 85% (77-92) | 86% (81-91) | 295 | 89% (82-97) | 89% (85-92) | 87% |
Interview-reported diabetes type (n = 519 available) | 253 | 80% (73-87) | 92% (88-95) | 266 | 93% (86-101) | 83% (79-87) | 87% |
UKBB probable & possible T1D and insulin within a year of diagnosis | 988 | 90% (87-93) | 69% (66-71) | 2,546 | 84% (82-87) | 96% (95-96) | 86% |
ICD algorithm and insulin within a year of diagnosis | 1,025 | 89% (86-92) | 71% (68-73) | 2,509 | 85% (82-87) | 95% (94-96) | 86% |
UKBB probable & possible T1D and insulin within a year of diagnosis (no interview report) | 918 | 93% (89-96) | 66% (63-68) | 2,616 | 83% (81-85) | 97% (96-98) | 85% |
ICD algorithm | 1,184 | 82% (79-85) | 75% (73-78) | 2,350 | 86% (84-89) | 91% (90-92) | 85% |
Age diabetes diagnosed <35 yr and insulin within a year of diagnosis | 867 | 93% (89-96) | 62% (59-65) | 2,667 | 82% (79-84) | 97% (96-98) | 84% |
Lipid model probability ≥12% | 1,501 | 74% (71-77) | 86% (84-88) | 2,033 | 91% (89-94) | 83% (81-84) | 84% |
UKBB probable & possible T1D (no interview report) | 1,142 | 80% (77-83) | 70% (68-73) | 2,392 | 84% (81-87) | 90% (88-91) | 83% |
UKBB probable & possible T1D | 1,231 | 78% (75-81) | 74% (72-77) | 2,303 | 86% (83-88) | 88% (87-89) | 83% |
Clinical model probability ≥12% | 1,325 | 76% (73-79) | 78% (76-80) | 2,209 | 87% (84-90) | 86% (84-87) | 83% |
Age diabetes diagnosed <35 yr | 1,065 | 80% (77-84) | 66% (64-69) | 2,469 | 82% (80-85) | 91% (89-92) | 82% |
BMI ≤25 kg/m² and insulin within a year of diagnosis | 511 | 80% (75-85) | 32% (29-34) | 3,023 | 71% (68-73) | 95% (95-96) | 72% |
BMI ≤25 kg/m² | 658 | 70% (65-74) | 35% (33-38) | 2,876 | 71% (68-73) | 91% (90-92) | 71% |
3.2 Performance of approaches to classifying all insulin-treated participants with diabetes in DARE
Approach | Called T1D (n) | PPV (T1D) | Sensitivity (T1D) | Called T2D (n) | PPV (T2D) | Sensitivity (T2D) | Accuracy |
---|---|---|---|---|---|---|---|
Interview-reported diabetes type and insulin within a year of diagnosis | 310 | 89% (85-92) | 82% (78-86) | 474 | 88% (85-91) | 92% (90-95) | 88% |
Interview-reported diabetes type | 335 | 86% (83-90) | 87% (83-90) | 449 | 90% (87-93) | 90% (87-93) | 88% |
UKBB probable & possible T1D and insulin within a year of diagnosis (including interview report) | 325 | 86% (82-90) | 84% (80-88) | 459 | 88% (85-91) | 90% (87-93) | 87% |
Clinical model probability ≥12% and insulin within a year of diagnosis | 278 | 90% (86-93) | 75% (70-79) | 506 | 83% (80-86) | 94% (91-96) | 85% |
UKBB probable & possible T1D and insulin within a year of diagnosis (no interview report) | 257 | 90% (87-94) | 69% (65-74) | 527 | 81% (77-84) | 94% (92-97) | 84% |
UKBB probable & possible T1D (including interview report) | 392 | 76% (72-80) | 89% (86-92) | 392 | 91% (88-93) | 79% (75-83) | 83% |
Age diabetes diagnosed <35 yr and insulin within a year of diagnosis | 242 | 90% (86-94) | 65% (60-70) | 542 | 79% (75-82) | 95% (93-97) | 82% |
Clinical model probability ≥12% | 346 | 78% (74-82) | 81% (77-85) | 438 | 85% (82-89) | 83% (80-87) | 82% |
UKBB probable & possible T1D (no interview report) | 300 | 81% (77-85) | 73% (68-78) | 484 | 81% (78-85) | 87% (84-90) | 81% |
Age diabetes diagnosed <35 yr | 280 | 80% (76-85) | 67% (62-72) | 504 | 78% (75-82) | 88% (85-91) | 79% |
BMI ≤25 kg/m² and insulin within a year of diagnosis | 140 | 87% (82-93) | 37% (31-42) | 644 | 67% (63-71) | 96% (94-98) | 71% |
BMI ≤25 kg/m² | 187 | 72% (66-79) | 40% (35-46) | 597 | 67% (63-70) | 88% (85-91) | 68% |
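Where each participant has both an approach-assigned type and a reference-standard type, the PPV, sensitivity, and accuracy columns reported in these tables follow from a 2×2 cross-tabulation. The sketch below uses the standard definitions of these measures. The function name, the dictionary keys, and the counts in the example are hypothetical and are not taken from the study data.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 metrics, treating T1D as the 'positive' class.

    tp: true T1D called T1D      fp: true T2D called T1D
    fn: true T1D called T2D      tn: true T2D called T2D
    """
    return {
        "ppv_t1d": tp / (tp + fp),          # correct among those called T1D
        "sensitivity_t1d": tp / (tp + fn),  # true T1D that were found
        "ppv_t2d": tn / (tn + fn),          # correct among those called T2D
        "sensitivity_t2d": tn / (tn + fp),  # true T2D that were found
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    metrics = classification_metrics(tp=90, fp=10, fn=30, tn=170)
    for name, value in metrics.items():
        print(f"{name}: {value:.1%}")
```

A stricter cutoff typically reduces `fp` at the cost of a larger `fn`, which is the pattern of high PPV with modest sensitivity described in the key findings.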
3.3 Performance of approaches to optimally identify type 1 and type 2 diabetes among insulin-treated participants with diabetes
Approach | UK Biobank: PPV of cases called T1D | UK Biobank: sensitivity for identifying T1D | DARE: PPV of cases called T1D | DARE: sensitivity for identifying T1D |
---|---|---|---|---|
Age diabetes diagnosed ≤20 yr and insulin within a year of diagnosis | 100% (99-100) | 33% (30-35) | 96% (93-100) | 40% (32-49) |
Clinical model probability ≥80% and insulin within a year of diagnosis | 99% (98-100) | 37% (34-39) | 96% (93-99) | 47% (39-54) |
Lipid model probability ≥80% and insulin within a year of diagnosis | 97% (95-98) | 40% (38-43) | n/a | n/a |
Lipid model probability ≥80% | 92% (90-94) | 42% (39-45) | n/a | n/a |
Age diabetes diagnosed ≤20 yr | 92% (90-95) | 34% (31-36) | 96% (92-99) | 40% (32-49) |
Clinical model probability ≥20% and insulin within a year of diagnosis | 91% (89-92) | 67% (65-70) | 91% (88-95) | 70% (65-76) |
Clinical model probability ≥80% | 91% (88-93) | 37% (35-40) | 93% (90-97) | 47% (39-55) |
UKBB probable T1D and insulin within a year of diagnosis | 90% (88-92) | 69% (66-71) | 86% (82-90) | 84% (80-88) |
Lipid model probability ≥20% and insulin within a year of diagnosis | 89% (87-91) | 75% (73-77) | n/a | n/a |
UKBB probable T1D | 89% (87-91) | 70% (67-72) | 84% (80-88) | 88% (84-91) |
BMI ≤23 kg/m² and insulin within a year of diagnosis | 82% (78-87) | 16% (14-18) | 90% (83-97) | 19% (10-28) |
Interview-reported T1D | 80% (75-85) | 92% (88-95) | 86% (83-90) | 87% (83-90) |
Lipid model probability ≥20% | 80% (78-82) | 81% (79-84) | n/a | n/a |
Clinical model probability ≥20% | 80% (78-83) | 71% (69-74) | 84% (80-88) | 74% (69-79) |
BMI ≤23 kg/m² | 75% (70-80) | 17% (15-19) | 79% (70-87) | 20% (11-28) |
3.4 Performance of approaches to classifying all insulin-treated participants with diabetes in UK Biobank
3.5 Development of algorithm for optimal approach selection
4. Discussion
5. Conclusion
Acknowledgments
Appendix A. Supplementary Data
- Supplementary Materials
References
- Scottish Diabetes Survey 2019. Scottish Diabetes Survey, 2019. Available at: https://www.diabetesinscotland.org.uk/wp-content/uploads/2020/10/Diabetes-Scottish-Diabetes-Survey-2019.pdf (In press). Date accessed: June 1, 2022
- 2. Classification and diagnosis of diabetes. Diabetes Care. 2017; 40: S11-S24
- Frequency and phenotype of type 1 diabetes in the first six decades of life: a cross-sectional, genetically stratified survival analysis from UK Biobank. Lancet Diabetes Endocrinol. 2018; 6: 122-129
- Global epidemiology of type 1 diabetes in young adults and adults: a systematic review. BMC Public Health. 2015; 15: 255
- Incidence of type 1 diabetes in age groups above 15 years: facts, hypothesis and prospects for future epidemiologic research. Acta Diabetol. 2016; 53: 339-347
- Adult-onset type 1 diabetes: current understanding and challenges. Diabetes Care. 2021; 44: 2449-2456
- Impact of routine clinic measurement of serum C-peptide in people with a clinician-diagnosis of type 1 diabetes. Diabet Med. 2020; 38: e14449
- Type 1 diabetes defined by severe insulin deficiency occurs after 30 years of age and is commonly treated as type 2 diabetes. Diabetologia. 2019; 62: 1167-1172
- Misdiagnosis and diabetic ketoacidosis at diagnosis of type 1 diabetes: patient and caregiver perspectives. Clin Diabetes. 2019; 37: 276-281
- Practical Classification Guidelines for Diabetes in patients treated with insulin: a cross-sectional study of the accuracy of diabetes diagnosis. Br J Gen Pract. 2016; 66: E315-E322
- Incorrect and incomplete coding and classification of diabetes: a systematic review. Diabet Med. 2010; 27: 491-497
- Predicting diabetes mellitus with machine learning techniques. Front Genet. 2018; 9: 515
- The clinical utility of C-peptide measurement in the care of patients with diabetes. Diabet Med. 2013; 30: 803-817
- The incidence of adult-onset type 1 diabetes: a systematic review from 32 countries and regions. Diabetes Care. 2022; 45: 994-1006
- Algorithms for the capture and adjudication of prevalent and incident diabetes in UK biobank. PLoS One. 2016; 11: e0162388
- Automated detection and classification of type 1 versus type 2 diabetes using electronic health record data. Diabetes Care. 2013; 36: 914-921
- Developing a case definition for type 1 diabetes mellitus in a primary care electronic medical record database: an exploratory study. CMAJ Open. 2019; 7: E246-E251
- Identifying type 1 and type 2 diabetic cases using administrative data: a tree-structured model. J Diabetes Sci Technol. 2011; 5: 486-493
- Development and validation of multivariable clinical diagnostic models to identify type 1 diabetes requiring rapid insulin therapy in adults aged 18-50 years. BMJ Open. 2019; 9: e031586
- Logistic regression has similar performance to optimised machine learning algorithms in a clinical setting: application to the discrimination between type 1 and type 2 diabetes in young adults. Diagn Progn Res. 2020; 4: 6
- NHS Diabetes Coding, classification and diagnosis of diabetes. A review of the coding, classification and diagnosis of diabetes in primary care in England with recommendations for improvement. Available at: https://orchid.phc.ox.ac.uk/wp-content/uploads/2017/02/nhs_diabetes_and_rcgp_cod_final_report.pdf Date: 2011. Date accessed: June 1, 2022
- Validation of an algorithm for identifying type 1 diabetes in adults based on electronic health record data. Pharmacoepidemiol Drug Saf. 2018; 27: 1053-1059
- An algorithm for identification and classification of individuals with type 1 and type 2 diabetes mellitus in a large primary care database. Clin Epidemiol. 2016; 8: 373-380
- Validation of a type 1 diabetes algorithm using electronic medical records and administrative healthcare data to study the population incidence and prevalence of type 1 diabetes in Ontario, Canada. BMJ Open Diabetes Res Care. 2020; 8: e001224
- Use of administrative and electronic health record data for development of automated algorithms for childhood diabetes case ascertainment and type classification: the SEARCH for Diabetes in Youth Study. Pediatr Diabetes. 2014; 15: 573-584
- Histological validation of a type 1 diabetes clinical diagnostic model for classification of diabetes. Diabet Med. 2020; 37: 2160-2168
- Estimating disease prevalence in large datasets using genetic risk scores. Nat Commun. 2021; 12: 6441
- UK biobank data: come and get it. Sci Transl Med. 2014; 6: 224ed4
- A type 1 diabetes genetic risk score can aid discrimination between type 1 and type 2 diabetes in young adults. Diabetes Care. 2015; 39: 337-344
- Type 1 diabetes genetic risk score: a novel tool to discriminate monogenic and type 1 diabetes. Diabetes. 2016; 65: 2094-2099
- Height, body mass index, and socioeconomic status: mendelian randomisation study in UK Biobank. BMJ. 2016; 352: i582
- Identifying optimal survey-based algorithms to distinguish diabetes type among adults with diabetes. J Clin Transl Endocrinol. 2020; 21: 100231
- Relative contribution of type 1 and type 2 diabetes loci to the genetic etiology of adult-onset, non-insulin-requiring autoimmune diabetes. BMC Med. 2017; 15: 88
- Overview of the type I diabetes genetics consortium. Genes Immun. 2009; 10 Suppl 1: S1-S4
- IgA nephropathy genetic risk score to estimate the prevalence of IgA nephropathy in UK biobank. Kidney Int Rep. 2020; 5: 1643-1650
- The management of type 1 diabetes in adults. A consensus report by the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetologia. 2021; 64: 2609-2652
- Type 1 diabetes in adults: diagnosis and management 2022. Available at: https://www.nice.org.uk/guidance/ng17/chapter/rationale-and-impact#diagnosis Date accessed: June 2, 2022
- Diagnosing type 1 diabetes in adults: guidance from the UK T1D immunotherapy consortium. Diabet Med. 2022; 39: e14862
- Standards of care for management of adults with type 1 diabetes 2017 2017. Available at: https://abcd.care/sites/abcd.care/files/resources/Standards_of_Care_T1DM_ABCD_FINAL.pdf Date accessed: April 14, 2020
- Can clinical features be used to differentiate type 1 from type 2 diabetes? A systematic review of the literature. BMJ Open. 2015; 5: e009088
- Genetic effects on age-dependent onset and islet cell autoantibody markers in type 1 diabetes. Diabetes. 2002; 51: 1346-1355
- Genetic analysis of adult-onset autoimmune diabetes. Diabetes. 2011; 60: 2645-2653
- Application of a genetic risk score to racially diverse type 1 diabetes populations demonstrates the need for diversity in risk-modeling. Sci Rep. 2018; 8: 4529
- The relationship between islet autoantibody status and the genetic risk of type 1 diabetes in adult-onset type 1 diabetes. Diabetologia. 2022; 66: 310-320
- Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population. Am J Epidemiol. 2017; 186: 1026-1034
- New models for large prospective studies: is there a better way? Am J Epidemiol. 2012; 175: 859-866
- Latent Autoimmune Diabetes of Adults (LADA) is likely to represent a mixed population of autoimmune (Type 1) and nonautoimmune (Type 2) diabetes. Diabetes Care. 2021; 44: 1243-1251
Footnotes
Funding: The DARE study was funded by the Wellcome Trust and supported by the National Institute for Health and Care Research (NIHR) Exeter Clinical Research Facility. NJT is funded by a Wellcome Trust-funded GW4 PhD. AM is supported by an NIHR Academic Clinical Fellowship. MNW is supported by the Wellcome Trust Institutional Support Fund (WT097835MF). SAS is supported by a Diabetes UK PhD studentship (17/0005757). JMD is supported by an Independent Fellowship funded by Research England's Expanding Excellence in England (E3) fund. KGY is supported by Research England's Expanding Excellence in England (E3) fund. ATH is supported by the NIHR Exeter Clinical Research Facility, a Wellcome Senior Investigator award, and an NIHR Senior Investigator award. AGJ was supported by an NIHR Clinician Scientist award (CS-2015-15-018). The views given in this article do not necessarily represent those of the NIHR, the National Health Service, or the Department of Health.
Conflict of interest: AGJ contributed to the development of the two classification models assessed in this work. Other authors declare that there are no relationships or activities that might bias, or be perceived to bias, their work.
Ethics Approval: Ethics for the DARE study was granted by the Devon & Torbay Research Ethics Committee, ref: 2002/7/118.
Availability of data and materials: UKBB data are available through a procedure described at https://www.ukBiobank.ac.uk/using-the-resource/. DARE data are available through application to the Peninsula Research Bank https://exetercrfnihr.org/about/exeter-10000-prb/
Authors Contributions: NJT, AM, and AGJ designed the study. SAS, KGY, and MNW acquired the data and SAS and MNW generated the T1DGRS. NJT, JD, AM, and AGJ analyzed the data. NJT wrote the first draft of the report. All authors reviewed the draft, contributed to the revision of the report and gave final approval for publication. AGJ and NJT are the guarantors of this work.
User license
Creative Commons Attribution (CC BY 4.0)