Journal of Clinical Epidemiology | Original Article | Volume 79, P140-149, November 2016

Measurement model choice influenced randomized controlled trial results

  • Rosalie Gorter
    Corresponding author. PO Box 7057, 1007 MB Amsterdam, The Netherlands. Tel.: +31 (0)204446038.
    Department of Epidemiology & Biostatistics, VU University Medical Centre, Amsterdam, The Netherlands

    EMGO+ Institute for Health and Care Research, Amsterdam, The Netherlands
  • Jean-Paul Fox
    Department of Research Methodology, Measurement, and Data Analysis, Faculty of Behavioural, Management & Social Sciences, University of Twente, Enschede, The Netherlands
  • Adri Apeldoorn
    Department of Epidemiology & Biostatistics, VU University Medical Centre, Amsterdam, The Netherlands

    Rehabilitation Department, Medical Centre Alkmaar, Alkmaar, The Netherlands
  • Jos Twisk
    Department of Epidemiology & Biostatistics, VU University Medical Centre, Amsterdam, The Netherlands

      Objectives

      In randomized controlled trials (RCTs), outcome variables are often patient-reported outcomes measured with questionnaires. Ideally, all available item information is used to construct scores, which requires an item response theory (IRT) measurement model. In practice, however, the classical test theory measurement model (sum scores) is mostly used, so differences between response patterns that lead to the same sum score are ignored. Because IRT differentiates more finely between scores, it enables more precise estimation of individual trajectories over time and of group effects. The objective of this study was to show the advantages of using IRT scores instead of sum scores when analyzing RCTs.
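      The claim that different response patterns can share a sum score yet carry different information can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) IRT model for dichotomous items with hypothetical item parameters and computes expected a posteriori (EAP) scores on a simple quadrature grid. The questionnaire in the study is polytomous, and this is not presented as the authors' model or estimation method.

```python
import math

# Hypothetical 2PL item parameters (discrimination a, difficulty b) for a
# five-item questionnaire; illustrative values, not taken from the study.
ITEMS = [(0.5, -1.0), (1.0, -0.5), (1.5, 0.0), (2.0, 0.5), (2.5, 1.0)]


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_score(responses, items=ITEMS, n_points=161, lo=-4.0, hi=4.0):
    """EAP estimate and posterior SD of theta for one 0/1 response pattern,
    using a standard normal prior and an equally spaced quadrature grid."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior density
        for x, (a, b) in zip(responses, items):
            p = p_endorse(theta, a, b)
            w *= p if x == 1 else 1.0 - p  # multiply in each item's likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


# Two response patterns with the same sum score of 2.
pattern_a = [1, 1, 0, 0, 0]  # endorses the two least discriminating items
pattern_b = [0, 0, 0, 1, 1]  # endorses the two most discriminating items
for name, pattern in (("A", pattern_a), ("B", pattern_b)):
    theta, sd = eap_score(pattern)
    print(f"pattern {name}: sum = {sum(pattern)}, EAP = {theta:.2f}, posterior SD = {sd:.2f}")
```

      Both patterns have a sum score of 2, yet pattern B receives a higher EAP estimate: under the 2PL, the likelihood depends on theta only through the discrimination-weighted sum of endorsed items (4.5 for pattern B versus 1.5 for pattern A), not through the raw sum.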

      Study Design and Setting

      Two studies are presented: a real-life RCT and a simulation study. In both, IRT scores and sum scores are used to measure the construct and subsequently serve as outcomes for the effect calculation.
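      The design rests on one idea: turn the same item responses into scores under each measurement model and estimate the same treatment effect from each. A minimal simulation sketch of that idea follows. It assumes a two-arm, two-wave trial, a 2PL model with hypothetical item parameters, a latent treatment effect of 0.5 SD on change, and an effect size defined as the difference in mean change divided by the pooled baseline SD. It compares the effect on the true latent scale with the effect on the sum-score scale; the IRT-based estimation used in the study is not reproduced, and these settings are not those of the authors' simulation.

```python
import math
import random
import statistics

# Hypothetical 2PL item parameters (discrimination a, difficulty b) for a
# 10-item questionnaire; difficulties are spread from -2.0 to 1.6.
ITEMS = [(1.5, -2.0 + 0.4 * j) for j in range(10)]


def simulate_responses(theta, rng, items=ITEMS):
    """Draw 0/1 item responses for one person at latent level theta (2PL)."""
    responses = []
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        responses.append(1 if rng.random() < p else 0)
    return responses


def effect_size(scores):
    """Difference in mean change (treatment minus control) divided by the
    pooled baseline SD. `scores` maps arm (0 = control, 1 = treatment) to a
    list of (baseline, follow-up) pairs."""
    baseline_sd = statistics.stdev([s0 for arm in (0, 1) for s0, _ in scores[arm]])
    change = {arm: statistics.mean([s1 - s0 for s0, s1 in scores[arm]]) for arm in (0, 1)}
    return (change[1] - change[0]) / baseline_sd


def simulate_trial(n_per_arm=2000, effect=0.5, seed=1):
    """Simulate a two-arm, two-wave trial on a latent scale and return the
    standardized treatment effect for (true latent scores, sum scores)."""
    rng = random.Random(seed)
    latent = {0: [], 1: []}
    sums = {0: [], 1: []}
    for arm in (0, 1):
        for _ in range(n_per_arm):
            theta0 = rng.gauss(0.0, 1.0)
            # Treatment shifts the latent change by `effect`; noise SD is 0.5.
            theta1 = theta0 + effect * arm + rng.gauss(0.0, 0.5)
            latent[arm].append((theta0, theta1))
            sums[arm].append((sum(simulate_responses(theta0, rng)),
                              sum(simulate_responses(theta1, rng))))
    return effect_size(latent), effect_size(sums)


latent_es, sum_es = simulate_trial()
print(f"latent scale: {latent_es:.2f} SD, sum-score scale: {sum_es:.2f} SD")
```

      Because the true latent scores are known in a simulation, the first number serves as the benchmark against which any scoring method can be judged; an IRT score estimated from the same item responses could be added as a third outcome.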

      Results

      The bias in RCT results depends on the measurement model used to construct the scores. When sum scores were used, the estimated trend was biased by around one standard deviation, whereas IRT scores showed negligible bias.
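      A bias "of around one standard deviation" can be made concrete: bias is the average difference between the estimated and the true trend across simulated data sets, divided by an SD so that it is comparable across scoring methods. The sketch below assumes that definition and leaves the choice of scaling SD to the caller, because the abstract does not state exactly which SD was used.

```python
import statistics


def standardized_bias(estimates, true_value, sd):
    """Average bias of an estimator over simulation replications, in SD units.

    estimates  -- estimated trends, one per simulated data set
    true_value -- the trend used to generate the data
    sd         -- scaling SD (an assumption; e.g. the outcome SD at baseline)
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (statistics.mean(estimates) - true_value) / sd


# Hypothetical numbers for illustration only: mean estimate 2.0, true value 1.0.
print(standardized_bias([1.5, 2.5], true_value=1.0, sd=2.0))  # 0.5
```

      A value near 0 indicates an approximately unbiased scoring method; a value near 1 means the estimated trend is off by about one SD of the scaling variable.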

      Conclusion

      Using IRT to estimate construct measurements leads to accurate statistical inferences from an RCT, whereas using sum scores leads to incorrect RCT results.

