Original Article | Volume 112, P20-27, August 2019

Expert panel diagnosis demonstrated high reproducibility as reference standard in infectious diseases



      Objectives

      When a diagnostic test accuracy study lacks a gold standard, expert diagnosis is frequently used as the reference standard. However, interobserver and intraobserver agreement is imperfect. The aim of this study was to quantify the reproducibility of a panel diagnosis for pediatric infectious diseases.

      Study Design and Setting

      Pediatricians from six countries adjudicated a diagnosis (i.e., bacterial infection, viral infection, or indeterminate) for febrile children. A panel diagnosis was reached when the majority of panel members came to the same diagnosis; all other cases were left inconclusive. We evaluated intraobserver and intrapanel agreement at time intervals of 6 weeks and 3 years. We calculated the proportion of inconclusive diagnoses for three-, five-, and seven-expert panels.
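The majority rule described above can be illustrated with a minimal Python sketch. It assumes a strict majority (more than half of the panel) is required and that any case without one is labeled "inconclusive". The function names and label strings are hypothetical and are not taken from the study.

```python
from collections import Counter


def panel_diagnosis(votes):
    """Return the label chosen by a strict majority of experts.

    Assumption: a case with no strict majority is 'inconclusive'.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    # most_common(1) yields the single most frequent label and its count
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "inconclusive"


def inconclusive_proportion(cases):
    """Proportion of cases for which the panel reaches no majority."""
    if not cases:
        raise ValueError("at least one case is required")
    outcomes = [panel_diagnosis(votes) for votes in cases]
    return outcomes.count("inconclusive") / len(outcomes)


# Hypothetical three-expert panel rating two cases
cases = [
    ["bacterial", "bacterial", "viral"],      # majority: bacterial
    ["bacterial", "viral", "indeterminate"],  # no majority
]
print([panel_diagnosis(v) for v in cases])
print(inconclusive_proportion(cases))
```

Under this rule, running the same function on five or seven votes per case gives the inconclusive proportion for the larger panels.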


      Results

      For both time intervals (i.e., 6 weeks and 3 years), intrapanel agreement was higher (kappa 0.88, 95% CI: 0.81-0.94 and 0.80, 95% CI: NA) than intraobserver agreement (kappa 0.77, 95% CI: 0.71-0.83 and 0.65, 95% CI: 0.52-0.78). After expanding the three-expert panel to five or seven experts, the proportion of inconclusive diagnoses (11%) remained the same.
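For readers unfamiliar with the kappa statistic, the sketch below computes an unweighted Cohen's kappa between two rating rounds of the same cases, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The abstract does not say which kappa variant was used, so the unweighted form is an assumption here, and the function name and sample data are hypothetical.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa between two rating rounds of the same cases."""
    if len(first) != len(second) or not first:
        raise ValueError("both rounds must rate the same non-empty set of cases")
    n = len(first)
    # Observed agreement: share of cases given the same label in both rounds
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: one rater labeling four cases at two time points
round_1 = ["bacterial", "bacterial", "viral", "viral"]
round_2 = ["bacterial", "bacterial", "viral", "bacterial"]
print(cohens_kappa(round_1, round_2))
```

Passing one expert's labels from two time points gives an intraobserver kappa. Passing the panel's majority labels from two time points gives an intrapanel kappa.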


      Conclusion

      A panel consisting of three experts provides more reproducible diagnoses than an individual expert in children with lower respiratory tract infection or fever without source. Increasing the size of a panel beyond three experts has no major advantage for diagnosis reproducibility.

