Volume 56, Issue 1, Pages 68-74 (January 2003)


Rasch scoring of outcomes of total hip replacement

Ray Fitzpatrick (a, corresponding author), Josephine M Norquist (a), Jill Dawson (b), Crispin Jenkinson (c, d)

Received 5 December 2001; received in revised form 17 May 2002; accepted 2 September 2002.

Abstract 

We examined whether there are advantages in terms of outcome assessment of using Rasch methods of scoring the 12-item Oxford Hip Score questionnaire over conventionally summed scores. Data were collected on patients receiving total hip replacement surgery. Three patient groups were created according to surgery type: primary, revision, and re-revision; two groups were created according to satisfaction with surgery: very satisfied and dissatisfied. Analyses were performed to test the relative precision (RP) of Rasch scoring versus conventionally summed scores in discriminating the groups experiencing different types of surgery and level of satisfaction. At the 1-year follow-up, RP ratios favored the Rasch scoring method in both tests of discrimination. Considerable gains in precision were achieved with Rasch scoring methods when groups were compared in a cross-sectional way. Alternative approaches to scoring questionnaires should be investigated to better assess comparisons over time.

Article Outline

Abstract

1. Introduction

2. Methods

2.1. Sample and data collection

2.2. Analysis plan

3. Results

3.1. Rasch analysis

3.2. RP—clinical discrimination

3.3. RP—sensitivity to change

4. Discussion

References

Copyright

1. Introduction 

A number of advantages over classical or psychometric approaches to scoring items in health status and related instruments have been claimed for methods based on item response theory (IRT). These include identification of uni-dimensional constructs, additivity of items, and interval-level measurement [1,2]. With IRT methods, items are ordered in terms of difficulty, so overlapping items can be identified and eliminated and items can be chosen to measure the full spectrum of a construct [3,4].

One of the main applications of health status measures is to assess outcomes of health care interventions. For this purpose, measures are required that have maximal signal-to-noise ratios in within-subject changes [5]. Significant effort has been invested in developing methods to assess the responsiveness of measures in order to optimize this use of health status measures [6,7]. It has been argued that scoring instruments according to IRT methods may improve responsiveness [8]. Limited empirical evidence addresses this suggestion; one problem is the lack of independent evidence regarding the extent of underlying change that alternative scoring methods are expected to measure.

Outcome assessment in total hip replacement (THR) surgery is a useful setting in which to assess the advantages of IRT methods of scoring health status measures because improvements to health-related quality of life produced by surgery are beyond dispute. By whatever index of responsiveness, THR is associated with greater improvements in domains such as pain and physical function than are observed for many medical interventions [9,10]. On the other hand, there are enormous variations in surgical practice in THR and very little high-quality evidence that relates surgical practice to outcome [11]. Thus, in the field of orthopedic surgery, selecting outcome measures on the basis of indices of responsiveness such as overall effect size or equivalent statistics alone may be insufficient because these indices are normally large. Measures need to be tested against the requirement to detect differences in the usually substantial change scores experienced by groups undergoing different surgical practices.

One area of THR in which outcome measurement can be tested is in relation to whether patients are undergoing the procedure for the first time (“primary”) or are having their surgery redone (“revision”) because of failure of the primary THR. Because of inherent complexities of revision surgery, results are not expected to be as successful as for primary surgery, and less favorable results may occur for individuals having to undergo second or further revisions [12]. A second criterion that can be used to assess IRT scoring of outcomes is the patient's overall judgment of surgery in terms of satisfaction. Understanding of patient satisfaction with THR is important clinically and for public policy. Changes in pain and function are important determinants of patient satisfaction with THR [13,14]. There are, therefore, two distinct aspects of THR whereby we may test the advantages of IRT scoring: (i) in relation to groups known to experience forms of THR of different degrees of complexity and (ii) in relation to groups known to differ in their judgment of the success of THR.

The Oxford Hip Score (OHS) was developed specifically to assess outcomes of THR. As a relatively short instrument (12 items), it is particularly appropriate for use by older individuals who most often receive THR. It has been extensively examined in terms of reliability, validity, and responsiveness [15–18]. It has now been widely used to evaluate THR surgery and alternative medical and service interventions for severe hip disease [19–22].

The purpose of this article is to examine whether there are advantages in terms of outcome assessment of using the Rasch rating scale model [23] of IRT-based methods of scoring the OHS over conventionally summed scores. We adopt the approach of McHorney et al [24] and Raczek et al [25] by using relative precision (RP) to assess the usefulness of the OHS in distinguishing differences in outcomes of groups expected to differ. We focus on examining the RP of the Rasch scoring method to discriminate patient groups expected to differ in terms of pain and physical function after THR surgery.

2. Methods 

2.1. Sample and data collection 

Since September 1996, as part of a long-term outcomes assessment program, detailed prospective data have been collected routinely on all patients who undergo primary or revision hip surgery at the Nuffield Orthopaedic Centre, Oxford, UK. The data combined patient-completed questionnaires with clinically reported information regarding surgery.

Pre-operative questionnaires were completed routinely while patients attended pre-admission outpatient clinics approximately 2 weeks before surgery. Questionnaires included demographic questions, standard items about general health, questions about pain in other joints, and the 12-item Oxford Hip Score (OHS) questionnaire [16].

Patient post-operative questionnaires and outcome measures were sent out at precisely 1 year after surgery. Patients were asked about any further hip operations that might have occurred since the first hip operation. They were asked whether they were “very pleased,” “fairly pleased,” “not very pleased,” or “very disappointed” with their hip operation and also completed the OHS questionnaire.

The OHS is a validated 12-item questionnaire comprising two subscales: Pain (OHS-P; six items) and Functional Impairment (OHS-FI; six items). Each question has five response categories, and the item responses are summed to produce a score from 6 to 30 for each subscale, with higher scores denoting worse pain and functional impairment [15].
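The conventional (summed) scoring described above can be sketched as follows. This is a minimal illustration, assuming each item is coded 1 to 5 (which is what a 6 to 30 range for six items implies) and that a subscale is scored only when all six items are answered; the function name and the complete-data rule are assumptions made for the example, not part of the OHS documentation.

```python
def score_subscale(responses):
    """Sum six item responses (each coded 1-5) into a 6-30 subscale score."""
    if len(responses) != 6:
        raise ValueError("an OHS subscale has six items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)


best = score_subscale([1, 1, 1, 1, 1, 1])   # least pain or impairment
worst = score_subscale([5, 5, 5, 5, 5, 5])  # most pain or impairment
print(best, worst)
```

Every one-point step in this sum is treated as equal, which is the property that the Rasch-based scoring is meant to relax.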

From a series of 1424 patients, 1221 (85.7%) received pre-surgical and 1-year follow-up assessments at the cut-off date for this study. A total of 990 (81.1%) fully completed the 1-year follow-up questionnaires. For the purpose of this article, analyses were based on the subset of patients who completed the OHS questionnaire at both time points (n = 891, 73.0%). The mean age of this sample was 67.6 years (standard deviation [SD] 12.6, range 43–92 years). Women made up 60.4% of the sample.

To examine the relative precision of the two scoring methods in clinically discriminating across patients, three mutually exclusive groups were defined according to the type of surgery received: primary surgery (n = 533; 59.8%), first revision surgery (n = 267; 29.9%), and re-revision surgery (n = 86; 9.7%). Information on type of surgery was not available on five patients. Group comparisons were carried out to test the relative precision of the Rasch scoring method versus the summative scoring method in discriminating between the three types of surgery. Patients were also divided into two additional groups according to the level of satisfaction with their hip operation. The satisfied patients (those who responded “very pleased” with surgery) (n = 658; 73.8%) were compared with the dissatisfied patients (those who had given other responses than “very pleased”) (n = 231; 25.9%). Two patients did not answer the question related to satisfaction. Comparisons were performed to determine which scoring method achieved greater relative precision in differentiating between patients' level of satisfaction.

2.2. Analysis plan 

A series of Rasch analyses were performed to test: (i) whether the two subscales (OHS-P and OHS-FI) each formed a uni-dimensional, hierarchical continuum and, thus, whether Rasch-based scoring is appropriate for these scales; and (ii) whether the Rasch-based scoring procedure improved precision of the OHS scores over traditional Likert-based scoring. Rasch analysis can identify items that are redundant, items that do not fit the presumed hierarchy, and gaps in the scale. To create the interval scale, Rasch analysis estimates person-ability and item difficulty for a set of items [1,23]. The basic assumption is that the probability of an individual's success or failure on a particular item depends on the person-ability and the item difficulty. Redundant items, noise, and outliers can be identified by the “misfit” statistic: items with a very low misfit statistic are redundant, whereas items with a very high misfit statistic are not measuring the underlying construct.
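To make the basic assumption concrete, the sketch below computes response-category probabilities under a rating scale model of the Andrich type, in which the probability of each category depends only on the difference between the person measure and the item difficulty, plus a set of step thresholds shared by all items. The threshold values used here are hypothetical and are not estimates from the OHS data; the function names are illustrative.

```python
import math


def rating_scale_probabilities(theta, delta, thresholds):
    """Category probabilities under an Andrich-type rating scale model.

    theta: person measure (logits); delta: item difficulty (logits);
    thresholds: step parameters tau_1..tau_M shared by all items.
    Returns the probabilities of categories 0..M.
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_category(theta, delta, thresholds):
    """Model-expected category (0..M) for one person on one item."""
    probs = rating_scale_probabilities(theta, delta, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical thresholds for a five-category item (not estimated from the OHS).
taus = [-1.5, -0.5, 0.5, 1.5]
low = expected_category(-2.0, 0.0, taus)
high = expected_category(2.0, 0.0, taus)
print(round(low, 2), round(high, 2))
```

Fit statistics of the kind reported in this article compare observed responses with these model expectations; that step is not shown here.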

To maximize comparability, OHS raw scores and Rasch measures were linearly transformed from their original scale to a 0 to 100 metric, with the least symptomatic score (or measure) set equal to 0 and the most severe score (or measure) set equal to 100 [25]. To maximize the potential differences in scoring methods, the method of known-groups validity was used to test the RP in discriminating the impact of the different surgery types and levels of satisfaction, and scores were compared in terms of their RP. RP depends on two factors: (i) the degree to which a scoring method separates the groups or test occasions being compared and (ii) the within-group or within-occasion variance [26]. RP is the ratio of pair-wise F statistics (Rasch OHS divided by Likert OHS). The RP estimates indicate, in proportional terms, how much more or less precise a scoring method is in relation to the standard, which in this case is the Likert scoring method. Independent t tests were used to test for differences in group means, and all F statistics (t²) with P values <0.05 were considered statistically significant. The RP estimates were also used to determine the influence of scoring methods on change scores. Paired t tests were calculated to test within-group differences between pre-operative and 1-year post-operative data. In all analyses, the sample size between scoring methods was held constant. The hypotheses to be tested were that: (i) the RP estimates would favor the Rasch scoring method in discriminating between surgery types and (ii) the RP estimates would favor the Rasch scoring method in discriminating satisfied from dissatisfied patients.
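The RP calculation itself is simple arithmetic and can be sketched as below. The helper names are illustrative; the two F statistics are those reported in Table 4 for the OHS-P comparison of primary versus re-revision surgery at 1 year.

```python
def f_from_t(mean_difference, standard_error):
    """For a two-group comparison, F equals the squared t statistic."""
    t = mean_difference / standard_error
    return t * t


def relative_precision(f_candidate, f_reference):
    """Ratio of F statistics; values above 1 favor the candidate scoring."""
    if f_reference <= 0:
        raise ValueError("reference F statistic must be positive")
    return f_candidate / f_reference


# F statistics as reported in Table 4 (OHS-P, primary versus re-revision).
f_likert = 66.73
f_rasch = 115.33
rp = relative_precision(f_rasch, f_likert)
gain_percent = (rp - 1.0) * 100.0
print(round(rp, 2), round(gain_percent))
```

An RP of 1.73 corresponds to the 73% gain in precision quoted in the Results; an RP below 1 would indicate that the Likert scoring was the more precise of the two.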

Confidence intervals (CI) for the relative precision statistics were obtained using the bootstrap algorithm [26]. A total of 1000 bootstrap samples with replacement were generated from each patient-group comparison. F statistics and RP values were calculated for each resampling, which provided an estimate of the distribution for each RP. The 25th and 975th ordered values of the RP distribution identified the 95% CI.
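A percentile bootstrap of this kind might look like the sketch below. It is not the authors' STATA code. It assumes that patients are resampled with replacement within each group, that both scorings are evaluated on the same resampled patients, and that F is the squared pooled-variance t statistic; the data are simulated purely for illustration.

```python
import random
import statistics


def pooled_f(group_a, group_b):
    """Squared pooled-variance t statistic for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return mean_diff ** 2 / (pooled_var * (1 / n_a + 1 / n_b))


def bootstrap_rp_interval(group_a, group_b, n_boot=1000, seed=0):
    """Percentile 95% interval for RP = F(candidate) / F(reference).

    Each patient is a (reference_score, candidate_score) pair, so both
    scorings are always computed on the same resampled patients.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        f_ref = pooled_f([p[0] for p in a], [p[0] for p in b])
        f_cand = pooled_f([p[1] for p in a], [p[1] for p in b])
        ratios.append(f_cand / f_ref)
    ratios.sort()
    # With 1000 resamples these are the 25th and 975th ordered values.
    lower_rank = max(n_boot * 25 // 1000, 1)
    upper_rank = max(n_boot * 975 // 1000, 1)
    return ratios[lower_rank - 1], ratios[upper_rank - 1]


# Simulated scores, for illustration only (not the study data).
sim = random.Random(1)
group_a = [(sim.gauss(10, 5), sim.gauss(12, 4)) for _ in range(60)]
group_b = [(sim.gauss(16, 5), sim.gauss(18, 4)) for _ in range(40)]
low, high = bootstrap_rp_interval(group_a, group_b)
print(low <= high)
```

An interval that excludes 1.00 suggests that the difference in precision between the two scorings is unlikely to be due to sampling variation alone.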

Data were analyzed using SPSS (Windows, version 8.0) for general descriptive statistics and STATA for the calculation of the RP CIs. For Rasch analyses, the Winsteps program was used [27].

3. Results 

Raw OHS mean scores (SD) for the surgery-type patient groups are summarized in Table 1. Post-operative raw OHS scores were consistently lower (indicating reduced pain and functional impairment) than the pre-operative scores across all groups. All groups experienced substantial improvement, as reflected in effect size statistics. However, the amount of change in raw score between the two time points differed between groups receiving different types of surgery. For example, at the 1-year follow-up assessment, primary surgery patients had a raw OHS-P mean score of 9.11 (14.01 mean change from the pre-operative assessment), whereas re-revision surgery patients had a raw OHS-P mean score of 14.97 (8.21 mean change from the pre-operative assessment). This indicates that pain decreased more for the primary surgery patients than for the re-revision surgery patients. The same can be said for the functional impairment subscale. Effect sizes for change in pain and functional impairment were smaller for groups expected to obtain less improvement (ie, those receiving re-revision surgery and those dissatisfied with their surgery).

Table 1.

Raw subscale score means by surgery type and satisfaction

                      | Mean (SD)
Scale / patient group | Pre-operative | 1-yr post-operative | Mean difference (a) | Effect size (b)
OHS-P                 |               |                     |                     |
  Primary             | 23.12 (4.13)  | 9.11 (3.84)         | 14.01 (5.14)        | 3.39
  Revision            | 22.74 (4.35)  | 12.31 (5.89)        | 10.43 (6.34)        | 2.40
  Re-revision         | 23.17 (4.77)  | 14.97 (6.46)        | 8.21 (6.64)         | 1.72
  Satisfied           | 22.83 (4.34)  | 8.66 (3.14)         | 14.17 (4.95)        | 3.26
  Dissatisfied        | 23.55 (4.00)  | 16.38 (5.77)        | 7.17 (5.87)         | 1.79
OHS-FI                |               |                     |                     |
  Primary             | 19.45 (4.63)  | 9.99 (4.29)         | 9.47 (5.07)         | 2.05
  Revision            | 18.87 (5.01)  | 12.94 (5.47)        | 5.93 (5.47)         | 1.18
  Re-revision         | 20.02 (4.38)  | 15.85 (5.38)        | 4.17 (5.28)         | 0.95
  Satisfied           | 19.20 (4.77)  | 9.89 (4.14)         | 9.31 (5.07)         | 1.95
  Dissatisfied        | 19.81 (4.61)  | 15.92 (5.23)        | 3.88 (4.94)         | 0.84

Abbreviations: SD, standard deviation; OHS-P, Oxford Hip Score (Pain); OHS-FI, Oxford Hip Score (Functional Impairment).

(a) Difference between pre-operative and 1-year post-operative scores. All differences were significant (P < 0.05).

(b) Calculated by dividing the mean difference by the SD at baseline.
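The effect sizes in Table 1 can be reproduced directly from the reported means and SDs. The short sketch below does this for the three surgery-type groups on the OHS-P subscale; the function and variable names are illustrative.

```python
def effect_size(mean_change, baseline_sd):
    """Mean change divided by the SD at baseline (Table 1, footnote b)."""
    return mean_change / baseline_sd


# OHS-P values from Table 1: (mean difference, pre-operative SD).
table1_ohs_p = {
    "Primary": (14.01, 4.13),
    "Revision": (10.43, 4.35),
    "Re-revision": (8.21, 4.77),
}
effect_sizes = {
    group: round(effect_size(change, sd), 2)
    for group, (change, sd) in table1_ohs_p.items()
}
print(effect_sizes)
```

Because the baseline SDs are similar across groups, the ordering of the effect sizes follows the ordering of the mean changes.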

3.1. Rasch analysis 

The six-item pain subscale (OHS-P) and the six-item functional impairment subscale (OHS-FI) were examined by Rasch rating scale analysis. Apart from one OHS-P subscale item (“Have you been limping when walking, because of your hip?”), the item fit statistics for the two subscales (OHS-P and OHS-FI) at both time points were within the reasonable range of 0.6 to 1.4 suggested by Wright et al for “rating scale” questionnaires (Table 2) [28]. Inter-item separation was less than 0.15 logits for some items located in the middle of the difficulty range, indicating overlap between these items.

Table 2.

Item measure, SE, and INFIT statistics ordered by pre-operative misfit values

                         | Pre-operative               | Post-operative
Scale / item             | Measure | SE   | INFIT MNSQ | Measure | SE   | INFIT MNSQ
OHS-P                    |         |      |            |         |      |
  Limping                | −1.39   | 0.06 | 1.75 (a)   | −1.00   | 0.05 | 1.68 (a)
  Pain in bed            | 0.26    | 0.05 | 1.28       | 0.28    | 0.06 | 1.08
  Stabbing pain          | 0.70    | 0.04 | 1.17       | 0.60    | 0.06 | 1.02
  Level of pain          | −0.96   | 0.06 | 0.86       | −0.33   | 0.05 | 0.72
  Trouble standing       | 0.98    | 0.04 | 0.66       | 0.44    | 0.06 | 0.72
  Trouble working        | 0.40    | 0.05 | 0.66       | 0.01    | 0.05 | 0.67
OHS-FI                   |         |      |            |         |      |
  Trouble shopping       | −0.42   | 0.05 | 1.36       | −0.42   | 0.05 | 1.34
  Able to walk           | 0.06    | 0.04 | 1.36       | 0.74    | 0.06 | 1.31
  Put on sock            | −0.43   | 0.05 | 1.00       | −0.74   | 0.05 | 1.11
  Trouble washing        | 0.63    | 0.05 | 0.94       | 0.43    | 0.05 | 0.85
  Climb stairs           | 0.06    | 0.05 | 0.71       | −0.10   | 0.05 | 0.75
  Trouble with transport | 0.10    | 0.05 | 0.60       | 0.09    | 0.05 | 0.66

Abbreviations: SE, standard error; OHS-P, Oxford Hip Score—Pain; OHS-FI, Oxford Hip Score—Function Impairment.

(a) INFIT MNSQ outside reasonable rating scale range of 0.6–1.4.
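The screening rule used for Table 2 amounts to a range check on the INFIT mean-square values. A minimal sketch, using the pre-operative values from the table and treating the boundaries 0.6 and 1.4 as inside the acceptable range (an assumption, since the text does not say how boundary values are handled):

```python
INFIT_RANGE = (0.6, 1.4)  # range cited in the text for rating scale questionnaires


def misfitting_items(infit_by_item, low=INFIT_RANGE[0], high=INFIT_RANGE[1]):
    """Return the items whose INFIT MNSQ lies outside [low, high]."""
    return [name for name, mnsq in infit_by_item.items()
            if mnsq < low or mnsq > high]


# Pre-operative INFIT MNSQ values from Table 2.
pre_operative_infit = {
    "Limping": 1.75, "Pain in bed": 1.28, "Stabbing pain": 1.17,
    "Level of pain": 0.86, "Trouble standing": 0.66, "Trouble working": 0.66,
    "Trouble shopping": 1.36, "Able to walk": 1.36, "Put on sock": 1.00,
    "Trouble washing": 0.94, "Climb stairs": 0.71, "Trouble with transport": 0.60,
}
flagged = misfitting_items(pre_operative_infit)
print(flagged)
```

Only the limping item is flagged, which matches the footnote to Table 2.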

To check whether the item measures were stable across surgery groups and over time, separate Rasch analyses were conducted for each group at each time point. Results of these analyses (not reported) showed good overall fits of the model for each of three surgery-type groups at both time points. Only one item (“Have you been limping when walking, because of your hip?”) showed poor fit (INFIT > 1.4).

Table 3 presents a comparison of the OHS raw scores, the OHS transformed Likert summative scores (0–100), the OHS transformed Rasch scores (0–100), and the corresponding Rasch score standard errors for both subscales. Score intervals for Rasch values at the extremes of the range were larger than those in the middle of the range. For example, a 1-point change (from 6 to 7) on the OHS raw scores corresponded to a 4-point change on the OHS transformed 0 to 100 raw scores, a 12-point change on the OHS-P transformed 0 to 100 Rasch scores, and a 7-point change on the OHS-FI transformed 0 to 100 Rasch scores. A similar interval difference occurred at the opposite end of the scale range. In the middle of the range, however, smaller and roughly equivalent intervals were observed for Rasch and raw transformed scores. The table also shows the Rasch standard errors, which are lower toward the center of the score distribution.

Table 3.

Comparison of Oxford Hip Score subscale raw scores, Likert and Rasch transformed scores

OHS-P and OHS-FI raw score | OHS-P and OHS-FI 0–100 raw score | OHS-P Rasch 0–100 score (SE) | OHS-FI Rasch 0–100 score (SE)
6                          | 0                                | 0 (18.0)                     | 0 (18.5)
7                          | 4                                | 12 (9.9)                     | 7 (10.5)
8                          | 8                                | 20 (7.3)                     | 15 (7.9)
9                          | 13                               | 24 (6.4)                     | 20 (6.8)
10                         | 17                               | 28 (5.8)                     | 24 (6.2)
11                         | 21                               | 31 (5.5)                     | 28 (5.9)
12                         | 25                               | 34 (5.2)                     | 31 (5.7)
13                         | 29                               | 37 (5.0)                     | 34 (5.6)
14                         | 33                               | 39 (4.8)                     | 37 (5.6)
15                         | 38                               | 42 (4.7)                     | 41 (5.6)
16                         | 42                               | 44 (4.7)                     | 44 (5.6)
17                         | 46                               | 46 (4.7)                     | 47 (5.7)
18                         | 50                               | 49 (4.7)                     | 50 (5.7)
19                         | 54                               | 51 (4.8)                     | 53 (5.6)
20                         | 58                               | 53 (5.0)                     | 57 (5.6)
21                         | 63                               | 56 (5.1)                     | 60 (5.5)
22                         | 67                               | 59 (5.3)                     | 63 (5.5)
23                         | 71                               | 62 (5.6)                     | 66 (5.5)
24                         | 75                               | 65 (5.8)                     | 69 (5.6)
25                         | 79                               | 69 (6.2)                     | 72 (5.7)
26                         | 83                               | 73 (6.6)                     | 76 (6.1)
27                         | 88                               | 78 (7.2)                     | 80 (6.7)
28                         | 92                               | 84 (8.2)                     | 85 (7.8)
29                         | 96                               | 93 (10.8)                    | 93 (10.5)
30                         | 100                              | 100 (18.7)                   | 100 (18.5)

Abbreviations: SE, standard error; OHS-P, Oxford Hip Score—Pain; OHS-FI, Oxford Hip Score—Function Impairment.
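The 0–100 raw score column of Table 3 follows from a linear rescaling of the 6 to 30 range. The sketch below reproduces it, assuming that halves are rounded upward (for example 12.5 becomes 13); that rounding rule is inferred from the tabulated values rather than stated in the text, and the function name is illustrative.

```python
def raw_to_percent(raw, minimum=6, maximum=30):
    """Linear 0-100 rescaling of a raw subscale score, rounded half up."""
    if not minimum <= raw <= maximum:
        raise ValueError("raw score outside the subscale range")
    span = maximum - minimum
    # Integer arithmetic avoids floating-point ties; adding `span` before
    # dividing by 2 * span rounds halves upward (e.g. 12.5 -> 13).
    return (2 * 100 * (raw - minimum) + span) // (2 * span)


rescaled = [raw_to_percent(r) for r in range(6, 31)]
print(rescaled[:4], rescaled[-1])
```

Each raw point is worth about 4.2 points on this metric everywhere on the scale, whereas the Rasch columns of Table 3 stretch the intervals at the two extremes.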

3.2. RP—clinical discrimination 

The RP estimate comparisons between the three groups receiving the different types of surgery are summarized in Table 4. Pre-operatively, the F statistics were not statistically significant at conventional probability levels for the group comparisons; therefore, the RP estimates were not calculated. At the 1-year follow-up, the Rasch scoring method achieved higher precision than Likert scoring (RP > 1) in all group comparisons. Specifically, the gain in precision on the OHS-P subscale was equal to 10% in discriminating between primary surgery and revision surgery, 25% in discriminating between revision surgery and re-revision surgery, and 73% in discriminating between primary surgery and re-revision surgery. For the OHS-FI subscale, the precision gained from using the Rasch scoring method was equal to 12%, 33%, and 52% for the three group comparisons, respectively.

Table 4.

Mean score differences of Oxford Hip Score Pain and Functional Impairment subscales between patients differing in surgery type

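1 y post-operative data (a)

Mean difference (SE)
Scale, scoring | 1 versus 2    | 2 versus 3    | 1 versus 3
OHS-P Likert   | −13.32 (1.66) | −11.07 (3.12) | −24.39 (2.99)
OHS-P Rasch    | −11.70 (1.32) | −9.49 (2.39)  | −21.19 (1.97)
OHS-FI Likert  | −12.31 (1.60) | −12.13 (2.82) | −24.44 (2.54)
OHS-FI Rasch   | −10.53 (1.29) | −8.99 (1.81)  | −19.51 (1.65)

F statistic (b)
Scale, scoring | 1 versus 2 | 2 versus 3 | 1 versus 3
OHS-P Likert   | 64.72      | 12.58      | 66.73
OHS-P Rasch    | 70.99      | 15.82      | 115.33
OHS-FI Likert  | 59.54      | 18.55      | 92.66
OHS-FI Rasch   | 66.89      | 24.66      | 140.56

Relative precision (95% CI)
Scale, scoring | 1 versus 2       | 2 versus 3       | 1 versus 3
OHS-P Likert   | 1.00             | 1.00             | 1.00
OHS-P Rasch    | 1.10 (1.05–1.40) | 1.25 (1.05–2.52) | 1.73 (1.40–2.05)
OHS-FI Likert  | 1.00             | 1.00             | 1.00
OHS-FI Rasch   | 1.12 (0.97–1.30) | 1.33 (0.91–2.18) | 1.52 (1.24–1.83)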

Abbreviations: SE, standard error; CI, confidence interval; OHS-P, Oxford Hip Score—Pain; OHS-FI, Oxford Hip Score—Function Impairment.

(a) 1 = primary (n = 533); 2 = revision (n = 267); 3 = re-revision (n = 86).

(b) F statistic not significant for all baseline comparisons; therefore, relative precision not computed.

At the 1-year follow-up, the majority of the patients (658; 74%) were satisfied (ie, “very pleased”) with their hip surgery. Table 5 reports the RP calculated to determine whether the Rasch scoring method discriminated better between the satisfied and dissatisfied groups at the follow-up assessment. The Rasch scoring method achieved 39% and 34% gains in relative precision (for OHS-P and OHS-FI, respectively) in discriminating patient satisfaction groups.

Table 5.

Mean score between patient satisfaction groups tested 1 year post-operatively

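OHS-P
Scoring method | Patient group (a) | Mean (SE)    | Mean difference (SE) | F statistic | Relative precision (95% CI)
Likert         | Satisfied         | 11.08 (0.51) | 32.23 (1.66)         | 374.93      | 1.00
               | Dissatisfied      | 43.31 (1.58) |                      |             |
Rasch          | Satisfied         | 19.88 (0.60) | 26.53 (1.16)         | 521.67      | 1.39 (1.25–1.51)
               | Dissatisfied      | 46.41 (1.00) |                      |             |

OHS-FI
Scoring method | Patient group (a) | Mean (SE)    | Mean difference (SE) | F statistic | Relative precision (95% CI)
Likert         | Satisfied         | 16.22 (0.67) | 25.16 (1.59)         | 251.57      | 1.00
               | Dissatisfied      | 41.38 (1.44) |                      |             |
Rasch          | Satisfied         | 25.60 (0.64) | 20.21 (1.10)         | 337.02      | 1.34 (1.18–1.53)
               | Dissatisfied      | 45.81 (0.89) |                      |             |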

Abbreviations: OHS-P, Oxford Hip Score—Pain; OHS-FI, Oxford Hip Score—Function Impairment; SE, standard error; CI, confidence interval.

(a) Satisfied, n = 658; dissatisfied, n = 231.

3.3. RP—sensitivity to change 

Table 6 summarizes the mean change scores between patient groups tested pre-operatively and 1 year post-operatively for which RP estimates were obtained. The RP estimates showed modest gains for the Rasch scoring method in discriminating between surgery-type groups in the longitudinal data. The greatest gain was obtained when the primary surgery group was compared with the re-revision surgery group. The gains in RP when using the Rasch scoring method for these two extreme surgical groups were equal to 17% for the OHS-P subscale and 16% for the OHS-FI subscale. Similar gains were obtained when patient satisfaction groups were compared over time (Table 7). A 21% and 19% gain in RP was achieved when using the Rasch scoring method on the OHS-P and OHS-FI subscales, respectively.

Table 6.

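Mean change scores between patients tested pre-operatively and 1 year post-operatively (a)

Mean change (SE) for each group
Scale, scoring | 1            | 2            | 3
OHS-P Likert   | 58.40 (0.93) | 43.48 (1.62) | 34.23 (2.99)
OHS-P Rasch    | 43.62 (0.89) | 30.77 (1.31) | 23.57 (2.23)
OHS-FI Likert  | 39.48 (0.92) | 24.71 (1.40) | 17.41 (2.38)
OHS-FI Rasch   | 29.03 (0.81) | 16.49 (1.09) | 11.37 (1.65)

Mean change (SE) between groups
Scale, scoring | 1 versus 2   | 2 versus 3  | 1 versus 3
OHS-P Likert   | 14.92 (1.86) | 9.25 (3.32) | 24.17 (3.13)
OHS-P Rasch    | 12.85 (1.57) | 7.19 (2.63) | 20.05 (2.40)
OHS-FI Likert  | 14.77 (1.63) | 7.30 (2.81) | 22.07 (2.47)
OHS-FI Rasch   | 12.55 (1.38) | 5.11 (2.14) | 17.66 (1.84)

F statistic
Scale, scoring | 1 versus 2 | 2 versus 3 | 1 versus 3
OHS-P Likert   | 64.02      | 7.77       | 59.69
OHS-P Rasch    | 67.17      | 7.46       | 69.76
OHS-FI Likert  | 82.26      | 6.77       | 79.73
OHS-FI Rasch   | 82.68      | 6.66       | 92.33

Relative precision (95% CI)
Scale, scoring | 1 versus 2       | 2 versus 3       | 1 versus 3
OHS-P Likert   | 1.00             | 1.00             | 1.00
OHS-P Rasch    | 1.05 (0.85–1.23) | 0.96 (0.55–2.10) | 1.17 (0.90–1.46)
OHS-FI Likert  | 1.00             | 1.00             | 1.00
OHS-FI Rasch   | 1.01 (0.84–1.15) | 0.98 (0.42–2.20) | 1.16 (0.86–1.73)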

Abbreviations: SE, standard error; CI, confidence interval; OHS-P, Oxford Hip Score (Pain); OHS-FI, Oxford Hip Score (Functional Impairment).

(a) 1 = primary (n = 533); 2 = revision (n = 267); 3 = re-revision (n = 86).

Table 7.

Mean change score between patient satisfaction groups tested pre-operatively and 1 year post-operatively

OHS-P
Scoring method | Patient group (a) | Mean change (SE) | Mean difference (SE) | F statistic | Relative precision (95% CI)
Likert         | Satisfied         | 59.11 (0.81)     | 29.21 (1.80)         | 263.25      | 1.00
               | Dissatisfied      | 29.89 (1.61)     |                      |             |
Rasch          | Satisfied         | 44.04 (0.79)     | 24.20 (1.35)         | 318.23      | 1.21 (1.07–1.36)
               | Dissatisfied      | 19.84 (1.10)     |                      |             |

OHS-FI
Scoring method | Patient group (a) | Mean change (SE) | Mean difference (SE) | F statistic | Relative precision (95% CI)
Likert         | Satisfied         | 38.82 (0.82)     | 22.63 (1.61)         | 198.36      | 1.00
               | Dissatisfied      | 16.19 (1.35)     |                      |             |
Rasch          | Satisfied         | 28.34 (0.73)     | 18.30 (1.19)         | 236.45      | 1.19 (1.02–1.30)
               | Dissatisfied      | 10.04 (0.95)     |                      |             |

Abbreviations: OHS-P, Oxford Hip Score—Pain; OHS-FI, Oxford Hip Score—Function Impairment; SE, standard error; CI, confidence interval.

(a) Satisfied, n = 658; dissatisfied, n = 231.

4. Discussion 

A major application of health status measures is to detect differences of outcomes between groups of patients in randomized controlled trials and related evaluative research designs. Trials are normally conducted to detect small but worthwhile gains of alternative interventions. Considerable benefits would result from methods of scoring health status measures that might improve the ability to assess accurately whether differences in outcomes exist between interventions [8]. We set out to examine potential gains of Rasch-based scoring. Evidence for the advantages of this new approach to measurement has largely been confined to generic instruments such as the SF-36 and to populations where the extent of underlying change that might be expected is unclear [24,25]. We report a study designed to examine potential gains of Rasch-based scoring in an orthopedic context in which substantial health gain is the norm. The data are based on a longitudinal study of patients undergoing THR surgery in which some loss to follow-up occurred in assessing outcome. This may raise an issue about overall generalizability. However, no obvious bias limits the comparison of the two scoring methods when based only on the responders' data.

The OHS is intended to be used in patients undergoing THR, from which substantial improvement occurs for the majority of patients. The challenge for measurement in this field is to detect differences between groups in the extent of their improvement to evaluate the substantial variations in surgical practice that prevail in THR. Rasch scoring of the OHS questionnaire substantially improved the ability to discriminate between groups, as assessed by gains in relative precision of Rasch compared with Likert summative scoring, in two contexts where some differences in outcomes were expected between groups. First, gains in relative precision were observed for Rasch scoring when patients undergoing “simple” primary THR were compared with patients experiencing more difficult revision or re-revision surgery. Second, similar gains in precision were found for Rasch scoring when patients who were very satisfied with surgery 1 year later were compared with patients who were not. The gains in relative precision were substantially greater than those observed for Rasch scoring of the SF-36 in patients with various chronic health problems who experienced less marked changes in health status over time [24].

Gains in relative precision from Rasch scoring to distinguish outcomes in patients undergoing THR surgery of varying levels of difficulty were greater using outcome scores at follow-up (range of relative precision 10% to 73%) compared with gains obtained from change scores; that is, differences between pre-surgical and follow-up health status (range of relative precision −4% to 17%). Similarly, gains in relative precision from Rasch scoring in distinguishing patients satisfied with outcomes after 1 year from other patients were greater using 1-year follow-up data (34% and 39% increase in relative precision) than in using differences between pre-surgical and follow-up health status (19% and 21% increases).

It has been argued that gains in relative precision may depend on the distribution of the sample on the underlying construct being measured. For example, differential gains at the extremes compared with the middle of the distribution may be obtained [29]. Gains in relative precision may occur through greater differences between groups in scores or through reduced standard errors. As noted in other analyses of the advantages of Rasch scoring, reduced standard errors played the greater role in improving sensitivity to change [24,25]. For both subscales of OHS, and whether examining relative precision in terms of follow-up scores alone or change scores from before surgery, the gains in relative precision obtained by Rasch scoring arise from smaller standard errors. This may be a general feature of Rasch scoring that reduces the distances between scores at the middle of the distribution of scores. Like the Physical Function scale of the SF-36, the OHS was shown by Rasch analysis to have several items that clustered in the middle range of the scale [24,30]. As shown in Table 3, the Rasch scoring method provides differential estimates of standard error across the continuum with lower values toward the middle of the score distribution. This means that higher precision is obtained in the middle of the OHS scale because of the clustering of items.
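The link between item clustering and smaller standard errors can be illustrated with a deliberately simplified sketch. It treats each item as dichotomous, so that the information an item contributes at person measure theta is p(1 − p) and the standard error is the reciprocal square root of the summed information. The study itself used a five-category rating scale model, so the numbers produced here are not comparable with the standard errors in Table 3; only the qualitative pattern is of interest. The item measures are the pre-operative OHS-P values from Table 2.

```python
import math


def standard_error(theta, item_difficulties):
    """SE of a person measure under a dichotomous Rasch simplification."""
    information = 0.0
    for b in item_difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        information += p * (1.0 - p)  # item information peaks where theta == b
    return 1.0 / math.sqrt(information)


# Pre-operative OHS-P item measures from Table 2 (logits).
ohs_p_items = [-1.39, 0.26, 0.70, -0.96, 0.98, 0.40]
se_center = standard_error(0.0, ohs_p_items)
se_extreme = standard_error(3.0, ohs_p_items)
print(se_center < se_extreme)
```

Because most item measures lie within about one logit of zero, information is concentrated there and the standard error grows toward either extreme.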

There is limited and mixed evidence available regarding the advantages from Rasch scoring for sensitivity to change. Only modest gains were observed for the Physical Function scale of the SF-36 when Rasch-rescored [24]. When the SF-36 was compared with a Rasch-scored version in change over time in patients with epilepsy, the Rasch version was more sensitive to change against some, but not all, external criteria [31]. A comparison of the conventional Health Assessment Questionnaire with a Rasch-based version in longitudinal data on patients with rheumatoid arthritis found no gain in sensitivity to change of the Rasch-based version [32].

The evidence from the current study suggests that in some situations there may be substantial gains in sensitivity to change of outcome measures from Rasch-based scoring. Reduction of the standard errors helps to ensure that the underlying construct is more closely measured. However, these advantages need to be explored in a wider range of medical and surgical populations, experiencing varying degrees of change and direction of change, as reflected in a range of types of instrument. Further research should be aimed toward analyzing within-subject change scores through, for example, the calculation of the reliable change index [33] or testing the Rasch-based scoring method on patients undergoing small clinical changes. There is a range of approaches to applying Rasch analysis to data containing repeated measures that need to be evaluated [34,35]. There are a large number of challenges to face to fully assess the contribution of Rasch methods to appropriate measurement of outcomes of health care.

References 

1. Wright BD, Masters GN. Rating scale analysis. Chicago: MESA Press; 1982.

2. Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Arch Phys Med Rehabil. 1989;70:857–860.

3. Granger CV, Deutsch A, Linn R. Rasch analysis of the functional independence measure (FIM) mastery test. Arch Phys Med Rehabil. 1998;79:52–57.

4. MacKnight C, Rockwood K. Rasch analysis of the hierarchical assessment of balance and mobility (HABAM). J Clin Epidemiol. 2000;53:1224–1242.

5. Guyatt GH, Kirshner B, Jaeschke R. Measuring health status (what are the necessary measurement properties?). J Clin Epidemiol. 1992;45:1341–1345.

6. Beaton D, Bombardier C, Katz JN, et al. Looking for important change/differences in studies of responsiveness. J Rheumatol. 2001;28:400–405.

7. Husted J, Cook R, Farewell VT, et al. Methods for assessing responsiveness (a critical review and recommendations). J Clin Epidemiol. 2000;53:459–468.

8. Hays RD, Morales L, Reise S. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38:S28–S42.

9. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28:632–642.

10. Shields RK, Enloe LJ, Leo KC. Health related quality of life in patients with total hip or knee replacement. Arch Phys Med Rehabil. 1999;80:572–579.

11. Fitzpatrick R, Shortall E, Sculpher M, et al. Primary total hip replacement surgery (a systematic review of outcomes and modelling of cost-effectiveness associated with different prostheses). Health Technol Assess. 1998;2:1–64.

12. Robinson AH, Palmer CR, Villar RN. Is revision as good as primary hip replacement? A comparison of quality of life. J Bone Joint Surg Br. 1999;81:42–45.

13. Espehaug B, Havelin LI, Engesaeter LB, et al. Patient satisfaction and function after primary and revision total hip replacement. Clin Orthop. 1998;351:135–148.

14. Bayley KB, London MR, Grunkmeier GL, et al. Measuring the success of treatment in patient terms. Med Care. 1995;33(Suppl 4):AS226–AS235.

15. Dawson J, Fitzpatrick R, Carr A, et al. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br. 1996;78:185–190.

16. Dawson J, Fitzpatrick R, Murray D, et al. Comparison of measures to assess outcomes in total hip replacement surgery. Qual Health Care. 1996;5:81–88.

17. Dawson J, Fitzpatrick R, Murray D, et al. The problem of ‘noise’ in monitoring patient-assessed outcomes (generic, disease-specific and site-specific instruments for total hip replacement). J Health Serv Res Policy. 1996;1:224–231.

18. Fitzpatrick R, Dawson J. Health-related quality of life and the assessment of outcomes of total hip replacement surgery. Psychology Health. 1997;12:793–803.

19. Dawson J, Jameson-Shortall E, Emerton M, et al. Issues relating to long-term follow-up in hip arthroplasty surgery (a review of 598 cases at 7 years comparing prostheses using revision rates, survival analysis, and patient-based measures). J Arthroplasty. 2000;15:710–717.

20. Fitzpatrick R, Morris R, Hajat S, et al. The value of short and simple measures to assess outcomes for patients of total hip replacement surgery. Qual Health Care. 2000;9:146–150.

21. McMurray R, Heaton J, Sloper P, et al. Measurement of patient perceptions of pain and disability in relation to total hip replacement (the place of the Oxford Hip Score in mixed methods). Qual Health Care. 1999;8:228–233.

22. Shepperd S, Harwood D, Jenkinson C, et al. Randomised controlled trial comparing hospital at home care with inpatient hospital care. I (Three month follow up of health outcomes). BMJ. 1998;316:1786–1791.

23. Wright BD, Stone MH. Best test design. Chicago: MESA Press; 1979.

24. McHorney CA, Haley SM, Ware JE. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10) (II. Comparison of relative precision using Likert and Rasch scoring methods). J Clin Epidemiol. 1997;50:451–461.

25. Raczek AE, Ware JE, Bjorner JB, et al. Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries (results from the IQOLA project). J Clin Epidemiol. 1998;51:1203–1214.

26. Smith EV. Understanding Rasch measurement (metric development and score reporting in Rasch measurement). J Appl Meas. 2000;1:303–326.

27. Linacre JM, Wright BD. A user's guide to WINSTEPS (Rasch model computer program). Chicago: MESA Press; 2000.

28. Wright BD, Linacre JM, Gustafson J-E, et al. Reasonable mean-square fit values. Rasch Meas Trans. 1994;8:370.

29. Cella D, Chang CH. A discussion of item response theory and its applications in health status assessment. Med Care. 2000;38(Suppl 9):66–72.

30. Stucki G, Daltroy L, Katz JN, et al. Interpretation of change scores in ordinal clinical scales and health status measures (the whole may not equal the sum of the parts). J Clin Epidemiol. 1996;49:711–717.

31. Birbeck GL, Kim S, Hays RD, et al. Quality of life measures in epilepsy. How well can they detect change over time? Neurology. 2000;54:1822–1827.

32. Wolfe F. Which HAQ is best? A comparison of the HAQ, MHAQ and RA-HAQ, a difficult 8 item HAQ (DHAQ), and a rescored 20 item HAQ (HAQ20) (analyses in 2,491 rheumatoid arthritis patients following leflunomide initiation). J Rheumatol. 2001;28:982–989.

33. Prieto L, Roset M, Badia X. Rasch measurement in the assessment of growth hormone deficiency in adult patients. J Appl Meas. 2001;2:48–64.

34. Chang WC, Chan C. Rasch analysis for outcomes measures (some methodological considerations). Arch Phys Med Rehabil. 1995;76:934–939.

35. Wolfe EW, Chiu CW. Measuring pretest-posttest change with a Rasch rating scale model. J Outcome Meas. 1999;3:134–161.

a Department of Public Health, Institute of Health Sciences, University of Oxford, Headington, Oxford, OX3 7LF, United Kingdom

b OCHRAD, School of Healthcare, Oxford Brookes University, 44 London Rd., Oxford OX3 7PD, United Kingdom

c Department of Public Health, Health Services Research Unit, Institute of Health Sciences, University of Oxford, Headington, Oxford OX3 7LF, United Kingdom

d Picker Institute Europe, King's Mead House, Oxpens Rd., Oxford OX1 1RX, United Kingdom

Corresponding author. Tel: +44 01865 226728; fax: +44 01865 226720.

PII: S0895-4356(02)00532-2

