Original Article | Volume 61, Issue 3, P268-276, March 2008

Item response theory detected differential item functioning between healthy and ill children in quality-of-life measures

Published: September 14, 2007



      Objective

      To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality-of-life measure in children and adolescents.

      Study Design and Setting

      This illustration uses data from 5,429 children who completed the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition.
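The IRT likelihood-ratio approach compares a model in which an item's parameters are constrained equal across the two groups against an augmented model that frees them; twice the log-likelihood difference (G²) is referred to a chi-square distribution with degrees of freedom equal to the number of freed parameters. A minimal sketch of that final comparison (the log-likelihood values, and the df of 5 for one discrimination plus four graded-response thresholds, are illustrative placeholders, not figures from the study):

```python
from scipy.stats import chi2

def irt_lr_dif_test(loglik_constrained, loglik_free, n_freed_params):
    """Likelihood-ratio DIF test: compare a model with the studied item's
    parameters constrained equal across groups against one that frees them.

    Returns the G^2 statistic and its chi-square p-value."""
    g2 = 2.0 * (loglik_free - loglik_constrained)
    p = chi2.sf(g2, df=n_freed_params)
    return g2, p

# Illustrative values only: freeing a graded-response item's
# discrimination and 4 thresholds gives df = 5.
g2, p = irt_lr_dif_test(-10250.0, -10241.0, 5)
print(f"G2 = {g2:.1f}, df = 5, p = {p:.4f}")
```

In practice the two log-likelihoods come from fitting the IRT model twice (e.g., by marginal maximum likelihood); only the final comparison step is shown here.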


      Results

      DIF was detected for a majority of items but canceled out at the total test score level because the DIF ran in opposing directions. Post hoc analysis indicated that this pattern may reflect multidimensionality. We discuss issues in detecting and handling DIF.
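Cancellation of this kind can be illustrated with a contrived example: if the group shift in difficulty for one item is mirrored by an opposite shift in another, the item-level differences in expected scores are real but offset when summed over items. A sketch under a two-parameter logistic (2PL) model, with hypothetical difficulties deliberately paired so the DIF offsets exactly (these are not the PedsQL™ items or parameters):

```python
import numpy as np

def expected_score_2pl(theta, a, b):
    """Expected item score (probability of endorsement) under a 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 61)   # grid of latent trait values
a = 1.5                          # common discrimination

# Hypothetical difficulties: each pair of items is shifted in opposite
# directions for the focal group, so item-level DIF is real but offsetting.
ref_b   = [-0.3,  0.3, 0.8, 1.4]
focal_b = [ 0.3, -0.3, 1.4, 0.8]

item_dif = [expected_score_2pl(theta, a, br) - expected_score_2pl(theta, a, bf)
            for br, bf in zip(ref_b, focal_b)]
test_dif = sum(item_dif)  # item-level differences cancel at the test level

# Item-level DIF is sizable, yet the test-level difference is exactly zero
# for this contrived pairing.
print(np.abs(item_dif[0]).max(), np.abs(test_dif).max())
```

Real data rarely cancel this cleanly, but the example shows why inspecting only total scores can mask substantial item-level DIF.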


      Conclusion

      This article describes how to perform DIF analyses when validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into how information gained through such analyses can be used to evaluate an existing scale.



