
Retrospectively patient-reported pre-event health status showed strong association and agreement with contemporaneous reports

Open Access. Published: September 10, 2016. DOI: https://doi.org/10.1016/j.jclinepi.2016.09.002

      Abstract

      Objective

      The unpredictability of the illnesses and injuries that lead to most emergency hospital admissions makes it impossible to collect preadmission patient-reported outcome measures (PROMs) prospectively. Our aims were to review the evidence on using retrospective PROMs to determine pre-event health status and on the validity of using general population norms instead of retrospective PROMs.

      Study Design and Setting

      Medline, PsycINFO, Embase, Global Health, and Health Management Information were searched. Six studies met the inclusion criteria for the first aim, and 11 studies addressed the second aim. Narrative syntheses were conducted.

      Results

      Strong associations between retrospective and contemporary PROMs were found in 21 of 30 comparisons (correlation coefficients of 0.68 or higher), and 20 of 24 comparisons of continuous measures showed strong agreement (intraclass correlations over 0.75). Categorical measures showed only fair to moderate agreement (kappa 0.3–0.6). Associations were stronger for indices than for individual items and for shorter recall intervals. The direction of differences was inconsistent. Retrospective PROMs reported by elderly patients were similar to general population norms, whereas younger adults had been healthier than the general population.

      Conclusion

      Retrospective collection offers a means of assessing PROMs in unexpected emergency admissions. However, further research is needed to establish the best policy for their use.


      What is new?

        Key findings

      • There is a strong association between patient-reported outcome measures (PROMs) collected retrospectively and those collected contemporaneously among patients undergoing elective surgery.
      • Agreement is also strong for PROMs that are continuous measures but only fair to moderate for categorical measures.
      • Retrospectively collected data suggest that young adult trauma patients are healthier than population norms. The reverse may be true for older patients admitted for medical conditions.

        What this adds to what was known?

      • Retrospective collection offers a means of assessing patient-reported outcomes in unexpected emergency admissions.

        What is the implication and what should change now?

      • Further methodological research is needed to establish the best policy for using retrospective PROMs.

      1. Introduction

      The growing acceptance of the importance of patients' views of their outcomes, both when evaluating interventions and when assessing the quality of services, means that ways of obtaining accurate patient-reported outcome measures (PROMs; referred to as PROs in the United States) need to be devised [
      • Black N.
      Patient reported outcome measures could help transform healthcare.
      ]. PROMs are self-completed questionnaires in which patients report their own state of health (multidimensional symptoms, functional status) and health-related quality of life (HRQL) at a given point in time. PROMs can be categorized as generic (e.g., EuroQol-5D, Short Form [SF]-36) or disease specific (e.g., Oxford Hip Score or Western Ontario and McMaster Osteoarthritis Index [WOMAC]). Generic PROMs capture broad domains of function or HRQL, can be converted into utility scores, and allow comparisons between conditions and treatments. Disease-specific PROMs are more sensitive because they incorporate aspects of function and HRQL specific to the condition [
      • Black N.
      Patient reported outcome measures could help transform healthcare.
      ]. Comparing measurements made before and after a healthcare intervention makes it possible to determine the outcome of care.
      Emergency admissions make up 34% of hospital admissions in England [

      Health & Social Care Information Centre. Hospital Episode Statistics 2013-14. Available at http://www.hscic.gov.uk/hes. Accessed October 7, 2015.

      ]. They can be categorized either as a largely unexpected acute event, such as an acute myocardial infarction, stroke, or injury (about 70% of all emergency admissions), or as an exacerbation of an existing long-term condition such as diabetes or chronic obstructive pulmonary disease. Although the distinction is not clear-cut, the two categories present different challenges when using PROMs. In elective admissions, a PROM can be collected before treatment to capture baseline health at that time (a contemporary PROM); for unexpected emergency admissions, this is not possible. (This need not be a problem for emergency admissions due to exacerbations of long-term conditions, such as chronic obstructive pulmonary disease, because PROMs could be collected as part of routine clinical management, i.e., as a contemporary PROM.) Therefore, for unexpected admissions, other methods must be used to assess patients' preadmission baseline health status.
      There are two possible approaches. The first is to use retrospective PROMs: after their unexpected emergency event (e.g., an acute myocardial infarction), patients are asked to recall what their health status and quality of life were like just before it. This replaces the contemporaneous pre-treatment collection that is possible for planned elective treatments such as hip replacement. Retrospective self-reporting has been used extensively in etiological case–control studies and in cross-sectional surveys [
      • Hennekens C.H.
      • Buring J.E.
      Epidemiology in medicine.
      ] in which respondents are asked to recall characteristics of their health over a specified time frame, which may be short (e.g., the preceding week) or long (e.g., the past year).
      The second approach, much cheaper than retrospective reporting, is to use age–sex standardized PROMs collected from the general population (or an appropriate comparison group) in a cross-sectional survey as a surrogate measure of patients' pre-event baseline health [
      • McKenzie E.
      Measuring disability and quality of life postinjury.
      ]. The use of population norms assumes that patients with an emergency admission are typical of the wider population. If they are not, the impact of a health care intervention could be overestimated or underestimated. If patients are in fact healthier at baseline than the general population (as might be the case when studying recovery from trauma sustained during a fitness-based sport such as rock climbing), using the population norm as a surrogate baseline could lead to an “overestimate” of the treatment effect, because the gap between post-treatment health and the lower norm understates how far patients remain from their own previous health. Conversely, if patients were in worse health than their peers beforehand (as might be expected for those having a heart attack), an “underestimate” of the treatment effect would result.
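      The direction of the bias described above can be shown with a small worked example. This is an illustrative sketch only, not a method used in the reviewed studies: it assumes a hypothetical 0–100 PROM on which higher scores mean better health, defines the remaining deficit as pre-event baseline minus post-treatment score, and uses invented scores throughout.

```python
def shortfall(baseline: float, post: float) -> float:
    """Remaining deficit after treatment relative to a baseline (higher score = better health)."""
    return baseline - post

# Hypothetical 0-100 PROM scores; none of these figures come from the review.
POPULATION_NORM = 75.0

# Patient healthier than the norm before the event (e.g., a fit rock climber).
fit_true = shortfall(baseline=85.0, post=75.0)             # true deficit: 10 points
fit_with_norm = shortfall(baseline=POPULATION_NORM, post=75.0)   # looks fully restored: 0

# Patient less healthy than the norm before the event (e.g., before a heart attack).
frail_true = shortfall(baseline=65.0, post=65.0)           # truly back to baseline: 0
frail_with_norm = shortfall(baseline=POPULATION_NORM, post=65.0)  # looks 10 points short

print(f"Fitter patient: true shortfall {fit_true}, shortfall vs norm {fit_with_norm}")
print(f"Less healthy patient: true shortfall {frail_true}, shortfall vs norm {frail_with_norm}")
```

      Under these assumptions, substituting the norm hides the fitter patient's remaining 10-point deficit (care looks more successful than it was) and invents a 10-point deficit for the less healthy patient (care looks less successful than it was).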
      Although there has been no review of the strength of association and of agreement between these two approaches in emergency admissions, two systematic reviews have considered other aspects of recall. One considered the length of recall periods for PROMs in clinical trials and concluded that the optimum depended on two broad categories of factors: characteristics of the phenomenon being recalled (such as how recently it had occurred, its attributes, its complexity) and the context of the recalled phenomenon (such as its salience, the patient's mood) together with the nature of the topic [
      • Stull D.E.
      • Leidy N.K.
      • Parasuraman B.
      • Chassany O.
      Optimal recall periods for patient-reported outcomes: challenges and potential solutions.
      ]. The second review concluded that recall bias is a concern with PROMs and called for more research to understand and identify situations where the use of recall is acceptable [
      • Schmier J.K.
      • Halpern M.T.
      Patient recall and recall bias of health state and health status.
      ].
      Our aims were to review systematically the scientific evidence on (1) the extent of association and agreement between PROMs collected retrospectively and contemporaneously to determine pre-event health status and HRQL and (2) the validity of using general population norms for determining the pre-event health status and HRQL of people with an unexpected emergency admission to hospital.

      2. Study design and settings

      2.1 Literature search

      A search was conducted for studies either (1) comparing retrospective and contemporary PROMs (health status, symptoms, functional status, HRQL) or (2) comparing retrospective PROMs with population norms. For inclusion, studies had to be in English, involve self-completed questionnaires, and have a recall period of no more than 6 months. In addition, studies comparing retrospective and contemporary PROMs had to report a quantitative estimate of the strength of association (Pearson or Spearman rank correlation) or of agreement (intraclass correlation coefficient or kappa score). We did not calculate correlations or levels of agreement that studies had not reported.
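      Association and agreement answer different questions: a correlation measures whether recalled and contemporary scores rank patients in the same way, whereas an intraclass correlation also penalizes systematic differences between the two reports. The sketch below illustrates this with invented scores in which every patient recalls a score 10 points higher than their contemporary report. It assumes one common ICC form, ICC(A,1) (two-way model, absolute agreement, single measures), treating the two administrations as the two "raters"; the reviewed studies may have used other ICC forms, which the source does not specify.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def icc_agreement(x, y):
    """ICC(A,1): two-way model, absolute agreement, single measures (two administrations)."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(x) / n, sum(y) / n]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # between patients
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # between administrations
    sse = sum((rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))                                # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores: every patient recalls 10 points higher than they reported at the time.
contemporary = [10, 20, 30, 40, 50]
retrospective = [s + 10 for s in contemporary]

r = pearson_r(contemporary, retrospective)
icc = icc_agreement(contemporary, retrospective)
print(f"Pearson r = {r:.3f}, ICC(A,1) = {icc:.3f}")
```

      A uniform recall shift leaves the correlation perfect (r = 1) while lowering absolute agreement (ICC of about 0.83 here), which is why the review reports association and agreement separately.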
      Our focus was on methods for estimating patients' pre-event health or HRQL that could be used to determine the extent to which treatment restored them to their previous state of health. Many studies ask patients themselves to assess the extent of change in their health (single transitional items) [
      • Damiano A.M.
      • Pastores G.M.
      • Ware Jr., J.E.
      The health-related quality of life of adults with Gaucher's disease receiving enzyme replacement therapy: results from a retrospective study.
      ,
      • Guyatt G.H.
      • Norman G.R.
      • Juniper E.F.
      • Griffith L.E.
      A critical look at transition ratings.
      ], but this approach is methodologically different from comparing assessments made at two points in time, so such studies were excluded from this review.
      Five databases were searched: Medline, PsycINFO, Embase, Global Health, and Health Management Information. A free-text search strategy was used because subject headings were too broad and nonspecific for the research question. The concepts, keywords, and search terms are summarized in Table 1, and the complete search strategy is given in Table 2. Forward and backward snowballing complemented the free-text search.
      Table 1. Literature search: concepts, keywords, and search terms
      Concept: Retrospective. Keywords: Retrospective; Recall; Historical; Bias; Recollected
      Concept: Population norms. Keywords: Population norm$
      Concept: Patient reported. Keywords: Self-report$; Patient report$; Patient recall$; Self-recall$
      Concept: Outcomes. Keywords: Outcome$; Quality * life; H?Q?L; EQ-5D; function$; SF-36; Health status; Symptom$
      Abbreviations: EQ-5D, EuroQol-5D; SF-36, Short Form–36.
      Table 2. Search strategy
      1. Retrospective or recall or historical or recollected
      2. Bias
      3. Population norm$
      4. Self-report$ or patient report$ or patient recall$ or self-recall$
      5. Outcome$ or quality * life or H?Q?L or EQ-5D or function$ or SF-36 or health status or symptom$
      6. 1 OR 2 OR 3
      7. 6 ADJ5 4
      8. 7 ADJ10 5
      9. Limit 8 to (humans)
      Combined search string: ((retrospective or recall or historical or bias or population norms or recollected) adj5 ((self-report$ or patient report$ or patient recall$ or self-recall$) adj10 (outcome$ or quality * life or H?Q?L or EQ-5D or function$ or SF-36 or health status or symptom$))).mp.
      Abbreviations: EQ-5D, EuroQol-5D; SF-36, Short Form–36.
      Identified articles were exported to a reference manager (Mendeley Desktop version 1.13), and duplicates were removed. Titles and abstracts were screened by one author (E.K.) to assess suitability. Studies of children, adolescents, carer proxies, and people with cognitive impairment were excluded. The remaining articles were read, and forward and backward searching of their references was conducted (Fig. 1).
      Fig. 1. Search results: PRISMA 2009 flow diagram. PROMs, patient-reported outcome measures.
      From Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 2009;6(7):e1000097; http://dx.doi.org/10.1371/journal.pmed.1000097.

      2.2 Quality appraisal

      For studies comparing retrospective and contemporary PROMs, their methodological quality was appraised by one author (E.K.) using five relevant items selected from the Quality Appraisal of Diagnostic Reliability (QAREL) checklist [
      • Lucas N.P.
      • Macaskill P.
      • Irwig I.
      • Bogduk N.
      The development of a quality appraisal tool for studies of diagnostic reliability.
      ]. These items cover the representativeness of participants, the time interval between assessments, the correct application of the assessment, and appropriate statistical analysis. The remaining items were not applicable to this review: whether participants were blinded to their initial assessment, to other participants' assessments, to any reference standard, or to clinical information, or blinded to additional cues that were not part of the test. Scores on the five included items were summed (0 = weak, 5 = strong). Given the heterogeneity of the studies, a narrative synthesis was carried out.

      2.3 Definition of strength of association and agreement

      Association according to Pearson or Spearman correlation coefficients was classified as weak (below 0.36), moderate (0.36–0.67), strong (0.68–0.90), and very strong (above 0.90) [
      • Taylor R.
      Interpretation of the correlation coefficient: a basic review.
      ].
      Agreement according to intraclass correlation coefficients was classified as weak (below 0.36), moderate (0.36–0.67), strong (0.68–0.90), and very strong (above 0.90). Agreement according to kappa scores was classified as slight (<0.20), fair (0.20–0.40), moderate (0.41–0.60), substantial (0.61–0.80), or almost perfect (0.81–1.0) [
      • Landis J.R.
      • Koch G.G.
      The measurement of observer agreement for categorical data.
      ].

      3. Results

      3.1 Search findings

      Two hundred seventy-five articles were identified on Medline, 350 on Embase, 102 on PsycINFO, 18 on Global Health, and 2 on Health Management Information (all accessed April 22, 2015). After duplicates were removed, 450 abstracts were reviewed; of these, four studies comparing retrospective and contemporary PROMs and five comparing retrospective PROMs with population norms met the inclusion criteria. Most studies were excluded either because they did not capture a contemporary baseline PROM or because they reported no statistical assessment of the strength of association or agreement between contemporary and retrospective PROMs. A citation search on PubMed (forward and backward snowballing) identified two additional studies comparing retrospective and contemporary PROMs and six comparing retrospective PROMs with population norms (Fig. 1). All studies comparing retrospective and contemporary PROMs were rated methodologically strong on the QAREL checklist.

      3.2 Comparison of retrospective with contemporary PROMs

      Of the six studies, one was from the United Kingdom [
      • Emberton M.
      • Challands A.
      • Styles R.A.
      • Wightman J.A.
      • Black N.
      Recollected versus contemporary patient reports of pre-operative symptoms in men undergoing transurethral prostatic resection for benign disease.
      ], one was multinational [
      • Lingard E.A.
      • Wright E.A.
      • Sledge C.B.
      Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty.
      ], three were from Canada [
      • Bryant D.
      • Norman G.
      • Stratford P.
      • Marx R.G.
      • Walter S.D.
      • Guyatt G.
      • et al.
      Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.
      ,
      • Howell J.
      • Xu M.
      • Duncan C.P.
      • Masri B.A.
      • Garbuz D.S.
      A comparison between patient recall and concurrent measurement of preoperative quality of life outcome in total hip arthroplasty.
      ,
      • Marsh J.
      • Bryant D.
      • MacDonald S.J.
      Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.
      ], and one from the United States [
      • Helfand B.T.
      • Fought A.
      • Manvar A.M.
      • McVary K.T.
      Determining the utility of recalled lower urinary tract symptoms.
      ] (Table 3). The studies involved 75–177 patients, with the exception of one study of 770 patients [
      • Lingard E.A.
      • Wright E.A.
      • Sledge C.B.
      Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty.
      ]. Four studies involved patients with hip or knee problems, and two involved urological patients. Several reported agreement between retrospective and contemporary reports for more than one PROM.
      Table 3. Studies comparing retrospective to contemporary PROMs
      Columns: Author; Country/Year; Condition/procedure; Recall period; Sample size; PROMs; Level of association (correlation coefficient); Level of agreement; Retrospective health compared to contemporary report (mean difference or proportions different; P values)
      Emberton
      • Emberton M.
      • Challands A.
      • Styles R.A.
      • Wightman J.A.
      • Black N.
      Recollected versus contemporary patient reports of pre-operative symptoms in men undergoing transurethral prostatic resection for benign disease.


      UK 1995
      Benign prostatic hyperplasia

      3 mo

      n = 75
      AUA Symptom Index

      AUA Symptom Impact Index
      Pearson

      Symptom Index: 0.6

      Symptom Impact Index: 0.6
      Weighted kappa

      Symptom Index: 0.3

      Symptom Impact Index: 0.3
      No difference
      Lingard
      • Lingard E.A.
      • Wright E.A.
      • Sledge C.B.
      Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty.


      USA, UK, and Australia 2001
      Total knee arthroplasty

      3 mo

      n = 770
      Western Ontario & McMaster Osteoarthritis Index (WOMAC) pain scale

      SF-36 function scale
      Spearman

      WOMAC (pain scale): 0.53

      SF-36 (function scale): 0.48
      Weighted kappa

      Individual items: 0.20–0.41
      Worse for WOMAC pain scale (51.9% no difference, 31.3% recalled more pain, 16.8% recalled less pain) (P < 0.001)

      No consistent difference for SF-36 function scale (75% no difference, 11.8% recalled less limitation, 3.5% recalled more limitation). Patients recalled significantly less limitation for walking >1 mile (P < 0.001) but significantly more limitation for walking 100 yards (P = 0.009).
      Bryant
      • Bryant D.
      • Norman G.
      • Stratford P.
      • Marx R.G.
      • Walter S.D.
      • Guyatt G.
      • et al.
      Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.


      Canada 2006
      Knee surgery

      2 wk

      n = 177
      SF-36

      International Knee Documentation Committee (IKDC) Subjective Form

      Anterior Cruciate Ligament–Quality of Life (ACL-QOL)

      Western Ontario Meniscal Evaluation Tool (WOMET)

      Knee Injury and Osteoarthritis Outcome Score (KOOS)
      Pearson

      SF-36 (PCS): 0.81

      SF-36 (MCS): 0.68

      IKDC: 0.92

      ACL-QOL: 0.86

      WOMET: 0.88

      KOOS: 0.93
      Intraclass coefficient

      SF-36 (PCS): 0.81

      SF-36 (MCS): 0.67

      IKDC: 0.92

      ACL-QOL: 0.86

      WOMET: 0.88

      KOOS: 0.93
      No difference
      Howell
      • Howell J.
      • Xu M.
      • Duncan C.P.
      • Masri B.A.
      • Garbuz D.S.
      A comparison between patient recall and concurrent measurement of preoperative quality of life outcome in total hip arthroplasty.


      Canada 2008
      Total hip arthroplasty

      3 days; 6 and 12 wk

      n = 104
      WOMAC

      OHS

      SF-12 (PCS)

      SF-12 (MCS)
      Spearman

      3 days; 6 wk; 12 wk

      WOMAC: 0.80, 0.78, 0.86

      OHS: 0.82, 0.80, 0.92

      SF-12 (PCS): 0.66, 0.54, 0.76

      SF-12 (MCS): 0.77, 0.71, 0.76
      Intraclass correlation

      3 days; 6 wk; 12 wk

      WOMAC: 0.86, 0.88, 0.93

      OHS: 0.91, 0.88, 0.96

      SF-12 (PCS): 0.83, 0.77, 0.90

      SF-12 (MCS): 0.86, 0.84, 0.93
      3 days: worse (OHS Δ = 1.58, P = 0.01; WOMAC Δ = −2.21, P = 0.029; SF-12 MCS Δ = −4.82, P < 0.001)

      6 wk: worse (SF-12 MCS Δ = −2.79, P = 0.01)

      12 wk: no difference
      Marsh
      • Marsh J.
      • Bryant D.
      • MacDonald S.J.
      Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.


      Canada 2009
      Total hip arthroplasty

      6 wk

      n = 174
      WOMAC

      OHS

      SF-12 (PCS)

      SF-12 (MCS)

      Lower Extremity Functional Scale (LEFS)

      Feeling thermometer
      Pearson

      WOMAC: 0.89

      OHS: 0.87

      SF-12 (PCS): 0.62

      SF-12 (MCS): 0.48

      LEFS: 0.86

      Feeling thermometer: 0.63
      Intraclass correlation

      WOMAC: 0.88

      OHS: 0.87

      SF-12 (PCS): 0.58

      SF-12 (MCS): 0.48

      LEFS: 0.86

      Feeling thermometer: 0.60
      Better (SF-12 PCS Δ = 2.83, P < 0.01)

      No difference (OHS Δ = −0.04, P = 0.96; SF-12 MCS Δ = 2.04, P = 0.10)

      Worse (WOMAC Δ = 2.74, P = 0.01; feeling thermometer Δ = −5.06, P < 0.01)
      Helfand
      • Helfand B.T.
      • Fought A.
      • Manvar A.M.
      • McVary K.T.
      Determining the utility of recalled lower urinary tract symptoms.


      USA 2010
      Urological conditions

      6 mo

      n = 98
      AUA Symptom Index (SI)

      Quality of Life (QOL) scores
      Pearson

      AUA SI: 0.73

      QOL: 0.73
      Kappa

      AUA SI: 0.56

      QOL: 0.56
      Better: AUA SI (recalled mean score 12.2, contemporary 13.1)

      No difference: QOL (recalled mean score 2.6, contemporary 2.6)
      Abbreviations: PROMs, patient-reported outcome measures; SF, Short Form.
      a Mean difference or proportions different; P values.
      Eleven different PROMs were used, including the SF-36 or SF-12 (four studies), the WOMAC (three studies), the American Urological Association Symptom Index (two studies), the Western Ontario Meniscal Evaluation Tool, the Knee Injury and Osteoarthritis Outcome Score, the Oxford Hip Score, the Lower Extremity Functional Scale, and the feeling thermometer. Recall periods were mostly between 2 weeks and 3 months, although one study also reported a 3-day recall period (in addition to longer periods) and one used 6 months.
      All six studies assessed the association between retrospective and contemporary PROM scores using correlation coefficients (four used Pearson and two used Spearman coefficients), and all reported the level of agreement (three used kappa statistics and three used intraclass correlation coefficients). Most presented analyses of full index scores, though some reported on subscales. Of the 30 correlation coefficients reported for full-scale or subscale scores, nine were moderate, 18 were strong, and three were very strong.
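      The tally of correlation coefficients can be reproduced by applying the bands from Section 2.3 to the values in Table 3. The sketch below transcribes those 30 values by study; the one assumption beyond the source is how to treat a value falling between the published bands (e.g., 0.675), which it counts as moderate.

```python
from collections import Counter

def correlation_band(r: float) -> str:
    """Bands from Section 2.3: weak <0.36, moderate 0.36-0.67, strong 0.68-0.90, very strong >0.90."""
    if r < 0.36:
        return "weak"
    if r < 0.68:  # assumption: values between 0.67 and 0.68 are counted as moderate
        return "moderate"
    if r <= 0.90:
        return "strong"
    return "very strong"

# Correlation coefficients transcribed from Table 3 (Pearson or Spearman, as each study reported).
table3_correlations = {
    "Emberton": [0.6, 0.6],
    "Lingard": [0.53, 0.48],
    "Bryant": [0.81, 0.68, 0.92, 0.86, 0.88, 0.93],
    "Howell": [0.80, 0.78, 0.86, 0.82, 0.80, 0.92, 0.66, 0.54, 0.76, 0.77, 0.71, 0.76],
    "Marsh": [0.89, 0.87, 0.62, 0.48, 0.86, 0.63],
    "Helfand": [0.73, 0.73],
}

counts = Counter(correlation_band(r) for rs in table3_correlations.values() for r in rs)
total = sum(counts.values())
print(total, dict(counts))
```

      Combining the strong and very strong categories gives the 21 of 30 comparisons with strong association reported in the abstract.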
      In the three studies that used several PROMs (one of them at several time points), yielding 24 comparisons in total, agreement for continuous data (intraclass correlations) was very strong for eight, strong for 12, and moderate for four [
      • Bryant D.
      • Norman G.
      • Stratford P.
      • Marx R.G.
      • Walter S.D.
      • Guyatt G.
      • et al.
      Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.
      ,
      • Howell J.
      • Xu M.
      • Duncan C.P.
      • Masri B.A.
      • Garbuz D.S.
      A comparison between patient recall and concurrent measurement of preoperative quality of life outcome in total hip arthroplasty.
      ,
      • Marsh J.
      • Bryant D.
      • MacDonald S.J.
      Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.
      ]. In contrast, for PROMs that were converted to categorical variables for analysis, kappa statistics revealed only fair to moderate levels of agreement [
      • Emberton M.
      • Challands A.
      • Styles R.A.
      • Wightman J.A.
      • Black N.
      Recollected versus contemporary patient reports of pre-operative symptoms in men undergoing transurethral prostatic resection for benign disease.
      ,
      • Lingard E.A.
      • Wright E.A.
      • Sledge C.B.
      Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty.
      ,
      • Helfand B.T.
      • Fought A.
      • Manvar A.M.
      • McVary K.T.
      Determining the utility of recalled lower urinary tract symptoms.
      ].
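      For PROMs analyzed as ordered categories, agreement is summarized with kappa; two of the studies used weighted kappa. The sketch below computes Cohen's kappa with linear weights, which is one common weighting scheme (the studies' exact weights are not stated here and are an assumption), and labels the result with the Landis and Koch bands from Section 2.3. The symptom categories and ratings are invented for illustration, with recall shifted toward greater severity for two patients.

```python
def linear_weighted_kappa(a, b, categories):
    """Cohen's kappa with linear weights for two ordinal ratings of the same patients."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    def weight(i, j):
        # Full credit for exact agreement, partial credit for near-misses.
        return 1 - abs(i - j) / (k - 1)

    # Observed weighted agreement across patients.
    p_obs = sum(weight(index[x], index[y]) for x, y in zip(a, b)) / n
    # Expected weighted agreement from each report's marginal proportions.
    pa = [sum(1 for x in a if index[x] == i) / n for i in range(k)]
    pb = [sum(1 for y in b if index[y] == j) / n for j in range(k)]
    p_exp = sum(weight(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)

def landis_koch_band(kappa):
    """Labels from Section 2.3 (assumption: boundary values go to the lower band)."""
    if kappa < 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical ordinal symptom ratings for six patients.
levels = ["mild", "intermediate", "severe"]
contemporary = ["mild", "mild", "intermediate", "intermediate", "severe", "severe"]
recalled = ["mild", "intermediate", "intermediate", "severe", "severe", "severe"]

kw = linear_weighted_kappa(contemporary, recalled, levels)
print(f"weighted kappa = {kw:.3f} ({landis_koch_band(kw)})")
```

      Kappa corrects observed agreement for the agreement expected by chance given each report's category frequencies, which is one reason categorized scores tend to show lower agreement than continuous scores summarized with an ICC.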
      Correlations tended to be stronger the shorter the recall interval: studies with intervals of 1 month or less [
      • Bryant D.
      • Norman G.
      • Stratford P.
      • Marx R.G.
      • Walter S.D.
      • Guyatt G.
      • et al.
      Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.
      ,
      • Howell J.
      • Xu M.
      • Duncan C.P.
      • Masri B.A.
      • Garbuz D.S.
      A comparison between patient recall and concurrent measurement of preoperative quality of life outcome in total hip arthroplasty.
      ] reported strong or very strong agreement. Intervals of 3 months or more resulted in only moderate agreement [
      • Emberton M.
      • Challands A.
      • Styles R.A.
      • Wightman J.A.
      • Black N.
      Recollected versus contemporary patient reports of pre-operative symptoms in men undergoing transurethral prostatic resection for benign disease.
      ,
      • Lingard E.A.
      • Wright E.A.
      • Sledge C.B.
      Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty.
      ]. The type of patient may also be associated with the strength of agreement: most studies with strong agreement involved orthopedic patients, suggesting that patient characteristics or the type of intervention (e.g., elective surgery rather than medical treatment) may influence the relationship.
      There was no consistency in the direction of any difference between retrospective and contemporary accounts. One study found that patients tended to recall better baseline health than they had reported in their contemporary PROMs [
      • Helfand B.T.
      • Fought A.
      • Manvar A.M.
      • McVary K.T.
      Determining the utility of recalled lower urinary tract symptoms.
      ], two studies reported the opposite [
      • Lingard E.A.
      • Wright E.A.
      • Sledge C.B.
      Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty.
      ,
      • Howell J.
      • Xu M.
      • Duncan C.P.
      • Masri B.A.
      • Garbuz D.S.
      A comparison between patient recall and concurrent measurement of preoperative quality of life outcome in total hip arthroplasty.
      ], one found it varied by PROM [
      • Marsh J.
      • Bryant D.
      • MacDonald S.J.
      Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.
      ], and two found no difference [
      • Emberton M.
      • Challands A.
      • Styles R.A.
      • Wightman J.A.
      • Black N.
      Recollected versus contemporary patient reports of pre-operative symptoms in men undergoing transurethral prostatic resection for benign disease.
      ,
      • Bryant D.
      • Norman G.
      • Stratford P.
      • Marx R.G.
      • Walter S.D.
      • Guyatt G.
      • et al.
      Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.
      ].
      The strength of agreement may be limited if the test–retest reliability of the PROM is poor. Table 4 presents reliability estimates for all the measures included in the studies in Table 3. Test–retest reliability was high for most of these PROMs (although the reported SF-36 range extends down to an ICC of 0.43) and was generally higher than the agreement found between retrospective and contemporary PROMs. This suggests that factors other than the measurement error of the instruments themselves influence recall when retrospective PROMs are used.
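      The reasoning about reliability can be made concrete with a standard result from classical test theory (Spearman's attenuation): if two measures have reliabilities ρ_a and ρ_b and independent errors, their observed correlation cannot exceed √(ρ_a·ρ_b). The sketch below applies this ceiling to the SF-12 physical component. It assumes, without support from the source, that the recalled administration is as reliable as a repeated contemporary one and that the test–retest ICC in Table 4 can stand in for the reliability coefficient; the bound strictly applies to correlation rather than to agreement.

```python
from math import sqrt

def max_expected_correlation(reliability_a: float, reliability_b: float) -> float:
    """Classical test theory ceiling on the correlation between two error-prone measures."""
    return sqrt(reliability_a * reliability_b)

# Assumption: both administrations share the SF-12 physical component ICC from Table 4 (0.83).
ceiling = max_expected_correlation(0.83, 0.83)
observed = 0.62  # Marsh et al., SF-12 PCS Pearson correlation from Table 3
print(f"ceiling ~ {ceiling:.2f}, observed = {observed}")
```

      Under these assumptions, measurement error alone would allow a correlation of up to about 0.83, so an observed 0.62 leaves room for recall itself to contribute to the discrepancy.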
      Table 4. Test–retest reliability of PROMs included in the literature review
      PROM: test–retest reliability
      SF-12: physical component, ICC 0.83
      • Marsh J.
      • Bryant D.
      • MacDonald S.J.
      Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.


      Mental component 0.91
      • Marsh J.
      • Bryant D.
      • MacDonald S.J.
      Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.
      SF-36: ICC = 0.43–0.90
      • Ware J.E.J.
      • Snow K.K.
      • Kosinski M.
      • Gandek B.
      SF-36 Health Survey: manual and interpretation guide.
      Oxford Hip Score: Bland–Altman coefficient 7.27
      • Dawson J.
      • Fitzpatrick R.
      • Carr A.
      • Murray D.
      Questionnaire on the perceptions of patients about total hip replacement.
      WOMAC: ICC > 0.7
      • Stucki G.
      • Sangha O.
      • Stucki S.
      • Michel B.A.
      • Tyndall A.
      • Dick W.
      • et al.
      Comparison of the WOMAC (Western Ontario and McMaster Universities) osteoarthritis index and a self-report format of the self-administered Lequesne-Algofunctional index in patients with knee and hip osteoarthritis.
      Lower Extremity Functional Scale: ICC = 0.93

      Smith S, Cano S, Lamping D, Staniszewska S, Browne J, Lewsey J, et al. Patient-Reported Outcome Measures (PROMs) for routine use in Treatment Centres: recommendations based on a review of the scientific evidence. Report to the Department of Health, 2005. Available at https://www.lshtm.ac.uk/php/departmentofhealthservicesresearchandpolicy/assets/promsnickblack2005.pdf

      Feeling thermometer: ICC 0.94
      • Marsh J.
      • Bryant D.
      • MacDonald S.J.
      Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.
      AUA Symptom Index: r = 0.92
      • Barry M.J.
      • Fowler Jr., F.J.
      • O'Leary M.P.
      • Bruskewitz R.C.
      • Holtgrewe H.L.
      • Mebust W.K.
      • et al.
      The American Urological Association symptom index for benign prostatic hyperplasia.
      IKDC Subjective Form: ICC = 0.85–0.99
      • Kanakamedala A.C.
      • Anderson A.F.
      • Irrgang J.J.
      IKDC Subjective Knee Form and Marx Activity Rating Scale are suitable to evaluate all orthopaedic sports medicine knee conditions: a systematic review.
      ACL-QOL: standard error of measurement 6%
      • Mohtadi N.
      Development and validation of the quality of life outcome measure (questionnaire) for chronic anterior cruciate ligament deficiency.
      WOMET: ICC = 0.79
      • Bryant D.
      • Norman G.
      • Stratford P.
      • Marx R.G.
      • Walter S.D.
      • Guyatt G.
      • et al.
      Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.
      KOOSICC = 0.75–0.93
      • Bryant D.
      • Norman G.
      • Stratford P.
      • Marx R.G.
      • Walter S.D.
      • Guyatt G.
      • et al.
      Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.
      Abbreviations: PROMs, patient-reported outcome measures; SF, Short Form; WOMAC, Western Ontario and McMaster Osteoarthritis Index; AUA, American Urological Association; IKDC, International Knee Documentation Committee; ACL-QOL, Anterior Cruciate Ligament–Quality of Life; WOMET, Western Ontario Meniscal Evaluation Tool; KOOS, Knee Injury and Osteoarthritis Outcome Score; ICC, intraclass correlation coefficient.
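The reliability statistics listed above (ICC, Bland–Altman coefficient, standard error of measurement) are all computed from paired test–retest scores. The sketch below shows one way to obtain two of them; it is illustrative only. The scores are made up, the 1.96 multiplier is the conventional choice for 95% limits of agreement, and we assume (the source does not say) that a reported "Bland–Altman coefficient" corresponds to a coefficient of repeatability. The SEM uses the common formula SD × √(1 − ICC).

```python
"""Bland-Altman agreement and standard error of measurement (SEM) for paired
test-retest PROM scores. Illustrative sketch; the scores below are made up."""
import math
from statistics import mean, stdev


def bland_altman(first: list[float], second: list[float]) -> dict[str, float]:
    """Bias (mean difference) and 95% limits of agreement for paired scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the paired differences
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd_diff,
        "upper_loa": bias + 1.96 * sd_diff,
        # Assumed definition: coefficient of repeatability = 1.96 * SD(diff).
        "repeatability": 1.96 * sd_diff,
    }


def sem_from_icc(scores: list[float], icc: float) -> float:
    """SEM = SD of observed scores * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1 for this formula")
    return stdev(scores) * math.sqrt(1.0 - icc)


if __name__ == "__main__":
    test = [30.0, 42.0, 25.0, 38.0, 45.0]    # first administration
    retest = [32.0, 40.0, 27.0, 37.0, 44.0]  # second administration
    print(bland_altman(test, retest))
    print(sem_from_icc(test, icc=0.90))
```

A small SEM relative to the score range, and narrow limits of agreement, both indicate that repeated administrations give similar answers; the ICC alone does not show how large individual disagreements can be.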

      3.3 Comparison of retrospective PROMs with population norms

      There were 11 studies (Table 5): four from North America [Mock et al.; Michaels et al.; Gifford et al.; Lange et al.], four from Australia or New Zealand [Ameratunga et al.; Gabbe et al.; Watson et al.; Wilson et al.], and three from Europe [Tidermark et al.; Lyons et al.; Tøien et al.]. Eight studies involved fewer than 500 patients (range 86–472), and three were larger (1,500–3,000 patients). All the studies involved trauma patients apart from one on patients with acute lung injury [Gifford et al.]. Most studies included adults of all ages. The two exceptions were a study of elderly people who had experienced a fractured neck of femur [Tidermark et al.] and a study of young adult trauma victims [Ameratunga et al.].
      Table 5 Studies comparing retrospective PROMs with age–sex standardized general population norms

      | Author | Country/Year | Condition/procedure | Recall period | Number of patients | Patient age and sex | PROMs | Retrospective health compared with general population (a) |
      |---|---|---|---|---|---|---|---|
      | Mock et al. | USA 2000 | Leg injury | Weeks (at hospital discharge) | n = 302 | Adults (18–64 yrs) | Sickness Impact Profile (SIP) | No difference |
      | Michaels et al. | USA 2001 | Trauma (blunt force) | Days (early in hospital stay) | n = 165 | Adults (mean age 37 yrs); 67% male | SF-36; SIP | No difference |
      | Tidermark et al. | Sweden 2002 | Fractured neck of femur | 12–48 hr after admission | n = 90 | Elderly (mean age 80 yrs) | EQ-5D | No difference |
      | Ameratunga et al. | New Zealand 2006 | Trauma from motor vehicle accident (b) | One day | n = 472 | Young adults (70% 15–44 yrs); 63% male | SF-36 | Better than general population; no difference from representative sample of drivers |
      | Gabbe et al. | Australia 2007 | Trauma (mixed) | Median 6 days (IQR 3–12 days) | n = 2,388 | Adults | SF-12 | Better: SF-12 physical mean 50.9 vs. 48.9 (P < 0.001); SF-12 mental mean 54.5 vs. 52.4 (P < 0.001); differences confined to men and those under 55 yrs |
      | Watson et al. | Australia 2007 | Trauma (mixed) | 4 days (median) | n = 186 | Adults (18–74 yrs) | SF-6D; SF-36; Assessment of Quality of Life | Better: Assessment of Quality of Life population norm mean utility 0.83, recalled 0.95; SF-6D population norm mean utility 0.78, recalled 0.92; better for all age groups (P < 0.05) |
      | Gifford et al. | USA 2010 | Acute lung injury | Days–weeks (as soon as patient regained capacity) | n = 136 | Adults (median age 49 yrs; IQR 40–60) | SF-36 | Worse: population norms exceeded recalled scores in all SF-36 domains (mean paired differences 2.6–17.9), significantly so for all domains (P < 0.01) except vitality (P = 0.12); mean retrospective domain scores ranged 56.4–75.6, mean population norm domain scores 58.9–87.6 |
      | Lange et al. | Canada 2010 | Mild traumatic brain injury (c) | Median 1.8 months (0.2–8.0) | n = 86 | Adults (mean age 37 yrs; SD 13.7) | British Columbia Post-Concussion Symptom Inventory | Better: overall score (P < 0.01) and 6 of the 13 individual items (P < 0.05) |
      | Lyons et al. | UK 2011 | Trauma (mixed) | Within 7 days | n = 1,517 | Adults (median age 37 yrs; IQR 21–61) | EQ-5D | Better: mean score 3.3% (95% CI 1.9–4.7%) higher |
      | Tøien et al. | Norway 2011 | Trauma (mixed) | 17 days (non-ICU) and 44 days (ICU patients) | n = 242 | Adults (mean age 42 yrs) | SF-36 | Better: mean score higher (P < 0.001) |
      | Wilson et al. | New Zealand 2012 | Trauma (mixed) | 3 mo | n = 2,856 | Adults (18–64 yrs) | EQ-5D | Better: recalled scores significantly better than the population norm in both recovered and not-recovered groups. Recovered at 5 months: retrospective mean 0.98 (0.97–0.99) vs. norm 0.85 (0.84–0.86); not recovered at 5 months: 0.93 (0.92–0.94) vs. 0.85 (0.84–0.87); recovered at 12 months: 0.96 (0.96–0.97) vs. 0.86 (0.85–0.87); not recovered at 12 months: 0.93 (0.93–0.94) vs. 0.85 (0.83–0.86) |
      Abbreviations: PROMs, patient-reported outcome measures; SF, Short Form; EQ-5D, EuroQol-5D; IQR, interquartile range; SD, standard deviation; CI, confidence interval; ICU, intensive care unit.
      a Mean difference; P value.
      b Also compared with representative sample of drivers.
      c Compared with 177 community controls.
      All reported on a generic PROM: six used a version of the SF (SF-36, SF-12, SF-6D), three used the EQ-5D, and two used the Sickness Impact Profile. The time period for retrospective reporting in six studies was less than 1 week [Michaels et al.; Tidermark et al.; Ameratunga et al.; Gabbe et al.; Watson et al.; Gifford et al.; Lyons et al.]. In the other studies, it extended from a few weeks to 3 months.
      All but one study used population norms derived from statutory surveys of the general population; the exception used a matched comparison group drawn from the local community [Lange et al.]. In addition, one study of drivers injured in road crashes compared them not only with population norms but also with a sample of uninjured drivers [Ameratunga et al.].
      Of the 10 studies that used general population norms, six found that patients recalled their health as having been better than that of the general population [Ameratunga et al.; Gabbe et al.; Watson et al.; Lyons et al.; Tøien et al.; Wilson et al.]. Of the other four, three found no difference [Mock et al.; Michaels et al.; Tidermark et al.], and in only one did patients report worse health than the general population [Gifford et al.]. That study was also the only one not focused on trauma: its patients had developed acute lung injury and were likely to have been in poor health before being hospitalized. The two studies that compared patients with matched samples rather than the general population reported either no difference [Ameratunga et al.] or better recalled health [Lange et al.].
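Most of the studies in Table 5 compared recalled scores with age–sex standardized norms. The sketch below shows one simple way to do this, under stated assumptions: the stratum norms and the patients are hypothetical, and each stratum's norm mean is weighted by the share of patients in that stratum, giving the score the patients would be expected to report if their health matched that of the general population of the same age and sex. The individual studies may have used other standardization methods.

```python
"""Compare recalled pre-event scores with age-sex standardized population
norms. Illustrative sketch: the norms and patients are hypothetical."""
from collections import Counter

# Hypothetical norm means (e.g. EQ-5D index) by (age band, sex) stratum.
NORMS: dict[tuple[str, str], float] = {
    ("18-44", "M"): 0.92, ("18-44", "F"): 0.90,
    ("45-64", "M"): 0.85, ("45-64", "F"): 0.83,
    ("65+", "M"): 0.78, ("65+", "F"): 0.75,
}


def expected_norm(patients: list[dict]) -> float:
    """Norm mean re-weighted to the age-sex mix of the patient sample."""
    counts = Counter((p["age_band"], p["sex"]) for p in patients)
    n = sum(counts.values())
    return sum(NORMS[stratum] * k / n for stratum, k in counts.items())


def mean_difference(patients: list[dict]) -> float:
    """Mean recalled score minus the standardized norm (positive = better)."""
    recalled = sum(p["recalled"] for p in patients) / len(patients)
    return recalled - expected_norm(patients)


if __name__ == "__main__":
    sample = [
        {"age_band": "18-44", "sex": "M", "recalled": 0.95},
        {"age_band": "18-44", "sex": "M", "recalled": 1.00},
        {"age_band": "18-44", "sex": "F", "recalled": 0.90},
        {"age_band": "65+", "sex": "F", "recalled": 0.80},
    ]
    print(f"expected norm {expected_norm(sample):.4f}")       # 0.8725
    print(f"mean difference {mean_difference(sample):+.4f}")  # +0.0400
```

Standardizing matters most for samples like the young trauma cohorts above: comparing a mostly young sample with an all-ages norm would flatter recalled health further, because the norm would include older strata with lower scores.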

      4. Discussion

      4.1 Comparison of retrospective and contemporary PROMs

      Only six studies have compared retrospective and contemporary PROMs. Although most comparisons (21 of 30) revealed a strong or very strong association (correlation coefficients over 0.68), the rest were moderate. Levels of agreement for continuous measures were more consistent, with 20 of 24 comparisons being strong or very strong. In contrast, comparisons of categorical measures showed only fair to moderate agreement. Associations were stronger for indices than for individual items, for shorter recall intervals (1 month or less), and for elective surgery patients than for those with medical conditions or treatments. The direction of the differences between retrospective and contemporary PROMs showed no consistent pattern and appeared to depend partly on the PROM being used.
      Retrospective PROMs may be biased for three reasons: recall bias, response shift, and lack of validity of the PROM. Recall bias arises because details may go unnoticed and never be stored, new information may be added to stored memories and alter the details, and events may be systematically distorted over time [Schmier and Halpern]. Recall is influenced by the interval between the event and its assessment: the longer the interval, the higher the probability of recall bias [Skowronski et al.]. Twenty percent of the details of an event have been found to be irretrievable after 1 year, and 50% after 5 years [Bradburn et al.].
      Response shift refers to the change in perception that can occur when circumstances change [Sprangers; Visser et al.]. For example, a patient's perception of the severity of a disability or of their quality of life may change following treatment. This tends to diminish the assessment of pretreatment severity and thus underestimate the benefits of the treatment. One form, known as recalibration, occurs when a term such as “severe” means something different to the same person on one occasion than it did on a previous occasion, because of new experiences. Subjective values may also change over time, so that the physical, social, and psychological aspects of health-related quality of life (HRQL) are prioritized differently after certain experiences; this is known as reprioritization. Patients may also redefine the construct in question and attribute new meanings to it, known as scale reconceptualization [Howard et al.].
      The validity of PROMs may be jeopardized when they are used to determine retrospective health status over a lengthy recall interval. Most PROMs have been validated for recall of a person's health over the recent past (from the past day to the past 4 weeks), and many ask patients to report their health over the preceding few weeks. If patients are required to recall their health over longer periods, the validity of the instrument cannot be assumed.
      For comparisons of healthcare providers, or over time, recall bias and response shift will matter only if there is a systematic difference in behavior between the groups of patients being compared (e.g., patients attending different hospitals). There is no evidence that such differences exist within countries, although some differences have been demonstrated between countries [Black et al.].
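The argument that recall bias matters for provider comparisons only when it differs systematically between groups can be checked with simple arithmetic. In this hypothetical sketch, each hospital's recalled pre-event score is its true score plus a constant additive bias; that is a simplifying assumption, since real recall bias need not be additive or constant. A bias shared by both hospitals cancels out of the between-hospital difference in mean gain, whereas a bias affecting only one hospital shifts it.

```python
"""Why a shared recall bias leaves provider comparisons intact while a
differential one does not. Illustrative sketch with hypothetical scores."""

# Hypothetical true pre-event and post-treatment scores for two hospitals.
TRUE_PRE = {"A": [50.0, 60.0, 55.0], "B": [52.0, 58.0, 54.0]}
POST = {"A": [80.0, 85.0, 78.0], "B": [75.0, 82.0, 77.0]}


def mean_gain(pre: list[float], post: list[float]) -> float:
    """Average post-minus-pre improvement."""
    return sum(after - before for before, after in zip(pre, post)) / len(pre)


def provider_gap(bias_a: float, bias_b: float) -> float:
    """Hospital A's mean gain minus hospital B's, when each hospital's
    pre-event scores are recalled with a constant additive bias (assumed)."""
    recalled_a = [s + bias_a for s in TRUE_PRE["A"]]
    recalled_b = [s + bias_b for s in TRUE_PRE["B"]]
    return mean_gain(recalled_a, POST["A"]) - mean_gain(recalled_b, POST["B"])


if __name__ == "__main__":
    print(f"no bias:        {provider_gap(0.0, 0.0):+.2f}")  # +2.67
    print(f"same bias (+5): {provider_gap(5.0, 5.0):+.2f}")  # +2.67
    print(f"bias only in A: {provider_gap(5.0, 0.0):+.2f}")  # -2.33
```

A positive recall bias (remembering pre-event health as better than it was) shrinks each hospital's apparent gain. When both hospitals are affected equally the ranking is preserved; when only one is affected, the ranking here reverses.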

      4.2 Comparisons of retrospective PROMs and population norms

      The studies comparing retrospective PROMs with population norms were inevitably limited to generic instruments, because disease-specific PROMs are rarely collected in general population surveys. The generalizability of the findings is further limited by the focus of all but one study on trauma victims. That most studies found trauma patients recalled their preinjury health as better than average may reflect that these patients (mostly car drivers) are fitter and healthier than the general population [Dawson et al.]. Although response shift may have contributed, the likelihood that trauma patients were healthier is supported by evidence that rates of sports injuries and gunshot wounds are higher in fitter members of the population [Gabbe et al.; Watson et al.; Gifford et al.; Lyons et al.]. The difference is further exaggerated because national population norms are derived from household surveys that include institutionalized individuals. In contrast, the one study of elderly people with a fragility fracture related to poor bone density found no difference from the general population (age–sex standardized) [Tidermark et al.]. The one study in which patients recalled worse health than the general population, which focused on acute lung injury, is also consistent with this pattern [Gifford et al.].
      For the purpose of estimating pre-event health status, there may be a case for adjusting population-norm estimates for the presence of long-term conditions to reduce overestimation. The findings also suggest that using population norms directly as surrogates may underestimate the prior health of patients when the patient population consists of younger adults. However, this underestimation may be small and may mostly affect studies of this specific cohort of patients.

      4.3 Limitations

      There are several limitations to consider. First, only one author (E.K.) carried out the search, paper selection, and quality appraisal. Although uncertainties were discussed and resolved with the other author, double-reviewing would have enhanced the reliability of the review. Second, the studies comparing retrospective and contemporary PROMs are dominated by orthopedic surgery (four of six studies) and by North American settings (four of six), so the generalizability of the findings must be treated with caution. Third, many of the studies that investigated retrospective recall were too small for subgroup analyses taking account of clinical characteristics such as severity of illness. Finally, the generalizability of the comparisons of retrospective PROMs and population norms is even more limited, as 10 of the 11 studies focused on trauma patients. In addition, only generic PROMs were considered, which is understandable given that population norms are not available for disease-specific PROMs.

      4.4 Implications for policy and research

      It is unclear which of contemporary and retrospective reports is the more valid. Contemporary reports are usually considered the “gold standard,” so if retrospective reports differ, it is the latter that are judged “unreliable.” However, in the context of PROMs, the way patients recall their previous health may be of greater relevance to them, and to assessing the quality of health care, than how they actually assessed it at the time. In this situation, the retrospective report could be viewed as the “gold standard.” Rather than attach different values to the two types of PROM (i.e., judging whether contemporaneous collection is more or less valid than recalled collection), it is best simply to consider the extent to which they differ and the implications both for the use of PROMs in clinical management and for provider comparisons. As long as data are collected in the same way by different providers, comparisons will not be undermined.
      Our knowledge of the use of retrospective PROMs in the United Kingdom is extremely limited: the relevance of findings in other countries is uncertain given the potential influence of culture and other contextual factors; existing studies of unexpected emergency admissions are limited largely to trauma care; and there have been no published attempts to study both of the issues addressed in this review in a combined study (i.e., retrospective vs. contemporary vs. population norms). Until further research has been conducted, the best policy for using PROMs in emergency admissions will remain uncertain.
      The key methodological challenges that require further research are as follows: detailed investigation of the relationship between retrospective and contemporary PROMs (inevitably in elective conditions) which should also explore the influence of patient characteristics and of methodological factors on the relationship; determination of the potential use of population norms as a low-cost alternative to retrospective PROMs; and testing the feasibility of retrospective PROMs and population norms in a variety of unexpected emergency hospital admissions.

      References

        • Black N.
        Patient reported outcome measures could help transform healthcare.
        BMJ. 2013; 346: f167
        • Health & Social Care Information Centre
        Hospital Episode Statistics 2013-14.
        Available at http://www.hscic.gov.uk/hes. Accessed October 7, 2015.

        • Hennekens C.H.
        • Buring J.E.
        Epidemiology in medicine.
        Lippincott Williams & Wilkins, Philadelphia, 1987
        • MacKenzie E.
        Measuring disability and quality of life postinjury.
        in: Rivara F.P. Cummings P. Koepsell T.D. Grossman D.C. Maier R.V. Injury control: a guide to research and program evaluation. Cambridge University Press, Cambridge, UK, 2001
        • Stull D.E.
        • Leidy N.K.
        • Parasuraman B.
        • Chassany O.
        Optimal recall periods for patient-reported outcomes: challenges and potential solutions.
        Curr Med Res Opin. 2009; 25: 929-942
        • Schmier J.K.
        • Halpern M.T.
        Patient recall and recall bias of health state and health status.
        Expert Rev Pharmacoecon Outcomes Res. 2004; 4: 159-163
        • Damiano A.M.
        • Pastores G.M.
        • Ware Jr., J.E.
        The health-related quality of life of adults with Gaucher's disease receiving enzyme replacement therapy: results from a retrospective study.
        Qual Life Res. 1998; 7: 373-386
        • Guyatt G.H.
        • Norman G.R.
        • Juniper E.F.
        • Griffith L.E.
        A critical look at transition ratings.
        J Clin Epidemiol. 2002; 55: 900-908
        • Lucas N.P.
        • Macaskill P.
        • Irwig L.
        • Bogduk N.
        The development of a quality appraisal tool for studies of diagnostic reliability.
        J Clin Epidemiol. 2010; 63: 854-861
        • Taylor R.
        Interpretation of the correlation coefficient: a basic review.
        J Diagn Med Sonography. 1990; 6: 35-39
        • Landis J.R.
        • Koch G.G.
        The measurement of observer agreement for categorical data.
        Biometrics. 1977; 33: 159-174
        • Emberton M.
        • Challands A.
        • Styles R.A.
        • Wightman J.A.
        • Black N.
        Recollected versus contemporary patient reports of pre-operative symptoms in men undergoing transurethral prostatic resection for benign disease.
        J Clin Epidemiol. 1995; 48: 749-756
        • Lingard E.A.
        • Wright E.A.
        • Sledge C.B.
        Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty.
        J Bone Joint Surg Am. 2001; 83-A: 1149-1156
        • Bryant D.
        • Norman G.
        • Stratford P.
        • Marx R.G.
        • Walter S.D.
        • Guyatt G.
        • et al.
        Patients undergoing knee surgery provided accurate ratings of preoperative quality of life and function 2 weeks after surgery.
        J Clin Epidemiol. 2006; 59: 984-993
        • Howell J.
        • Xu M.
        • Duncan C.P.
        • Masri B.A.
        • Garbuz D.S.
        A comparison between patient recall and concurrent measurement of preoperative quality of life outcome in total hip arthroplasty.
        J Arthroplasty. 2008; 23: 843-849
        • Marsh J.
        • Bryant D.
        • MacDonald S.J.
        Older patients can accurately recall their preoperative health status six weeks following total hip arthroplasty.
        J Bone Joint Surg Am. 2009; 91: 2827-2837
        • Helfand B.T.
        • Fought A.
        • Manvar A.M.
        • McVary K.T.
        Determining the utility of recalled lower urinary tract symptoms.
        Urology. 2010; 76: 442-447
        • Ware Jr., J.E.
        • Snow K.K.
        • Kosinski M.
        • Gandek B.
        SF-36 Health Survey: manual and interpretation guide.
        The Health Institute, New England Medical Centre, Boston, 1993
        • Dawson J.
        • Fitzpatrick R.
        • Carr A.
        • Murray D.
        Questionnaire on the perceptions of patients about total hip replacement.
        J Bone Joint Surg Br. 1996; 78: 185-190
        • Stucki G.
        • Sangha O.
        • Stucki S.
        • Michel B.A.
        • Tyndall A.
        • Dick W.
        • et al.
        Comparison of the WOMAC (Western Ontario and McMaster Universities) osteoarthritis index and a self-report format of the self-administered Lequesne-Algofunctional index in patients with knee and hip osteoarthritis.
        Osteoarthritis Cartilage. 1998; 6: 79-86
        • Smith S.
        • Cano S.
        • Lamping D.
        • Staniszewska S.
        • Browne J.
        • Lewsey J.
        • et al.
        Patient-Reported Outcome Measures (PROMs) for routine use in Treatment Centres: recommendations based on a review of the scientific evidence.
        Report to the Department of Health, 2005. Available at https://www.lshtm.ac.uk/php/departmentofhealthservicesresearchandpolicy/assets/promsnickblack2005.pdf
        • Barry M.J.
        • Fowler Jr., F.J.
        • O'Leary M.P.
        • Bruskewitz R.C.
        • Holtgrewe H.L.
        • Mebust W.K.
        • et al.
        The American Urological Association symptom index for benign prostatic hyperplasia.
        J Urol. 1992; 148: 1549-1557
        • Kanakamedala A.C.
        • Anderson A.F.
        • Irrgang J.J.
        IKDC Subjective Knee Form and Marx Activity Rating Scale are suitable to evaluate all orthopaedic sports medicine knee conditions: a systematic review.
        Joint Disord Orthopaedic Sports Med. 2016; 1. https://doi.org/10.1136/jisakos-2015-000014
        • Mohtadi N.
        Development and validation of the quality of life outcome measure (questionnaire) for chronic anterior cruciate ligament deficiency.
        Am J Sports Med. 1998; 26: 350-359
        • Mock C.
        • MacKenzie E.
        • Jurkovich G.
        • Burgess A.
        • Cushing B.
        • deLateur B.
        • et al.
        Determinants of disability after lower extremity fracture.
        J Trauma. 2000; 49: 1002-1011
        • Michaels J.
        • Madey S.M.
        • Krieg J.C.
        • Long W.B.
        Traditional injury scoring underestimates the relative consequences of orthopedic injury.
        J Trauma. 2001; 50: 389-395
        • Tidermark J.
        • Zethraeus N.
        • Svensson O.
        • Törnkvist H.
        • Ponzer S.
        Femoral neck fractures in the elderly: functional outcome and quality of life according to EuroQol.
        Qual Life Res. 2002; 11: 473-481
        • Ameratunga S.N.
        • Norton R.N.
        • Connor J.L.
        • Robinson E.
        • Civil I.
        • Coverdale J.
        • et al.
        A population-based cohort study of longer-term changes in health of car drivers involved in serious crashes.
        Ann Emerg Med. 2006; 48: 729-736
        • Gabbe B.J.
        • Cameron P.A.
        • Graves S.E.
        • Williamson O.D.
        • Edwards E.R.
        Preinjury status: are orthopaedic trauma patients different than the general population?
        J Orthop Trauma. 2007; 21: 223-228
        • Watson W.L.
        • Ozanne-Smith J.
        • Richardson J.
        Retrospective baseline measurement of self-reported health status and health-related quality of life versus population norms in the evaluation of post-injury losses.
        Inj Prev. 2007; 13: 45-50
        • Gifford J.M.
        • Husain N.
        • Dinglas V.D.
        • Colantuoni E.
        • Needham D.M.
        Baseline quality of life before intensive care: a comparison of patient versus proxy responses.
        Crit Care Med. 2010; 38: 855-860
        • Lange R.T.
        • Iverson G.L.
        • Rose A.
        Post-concussion symptom reporting and the “good-old-days” bias following mild traumatic brain injury.
        Arch Clin Neuropsychol. 2010; 25: 442-450
        • Lyons R.
        • Kendrick D.
        • Towner E.M.
        • Christie N.
        • Macey S.
        • Coupland C.
        • et al.
        Measuring the population burden of injuries—implications for global and national estimates: a multi-centre prospective UK longitudinal study.
        PLoS Med. 2011; 8: e1001140
        • Tøien K.
        • Bredal I.S.
        • Skogstad L.
        • Myhren H.
        • Ekeberg O.
        Health related quality of life in trauma patients. Data from a one-year follow up study compared with the general population.
        Scand J Trauma Resusc Emerg Med. 2011; 19: 22
        • Wilson R.
        • Derrett S.
        • Hansen P.
        • Langley J.
        Retrospective evaluation versus population norms for the measurement of baseline health status.
        Health Qual Life Outcomes. 2012; 10: 68
        • Basso O.
        • Olsen J.
        • Bisanti L.K.W.
        The performance of several indicators in detecting recall bias.
        Epidemiology. 1997; 8: 269-274
        • Skowronski J.J.
        • Betz A.L.
        • Thompson C.P.
        • Shannon L.
        Social memory in everyday life: recall of self-events and other-events.
        J Pers Soc Psychol. 1991; 60: 831-843
        • Bradburn N.
        • Rips L.
        • Shevell S.
        Answering autobiographical questions: the impact of memory and inference on surveys.
        Science. 1987; 236: 157-161
        • Sprangers M.
        Response-shift bias: a challenge to the assessment of patients' quality of life in cancer clinical trials.
        Cancer Treat Rev. 1996; 22: 55-62
        • Visser M.
        • Oort F.
        • Sprangers M.
        Methods to detect response shift in quality of life data: a convergent validity study.
        Qual Life Res. 2005; 14: 629-639
        • Howard J.S.
        • Mattacola C.G.
        • Howell D.M.
        • Lattermann C.
        Response shift theory: an application for health-related quality of life in rehabilitation research and practice.
        J Allied Health. 2011; 40: 31-38
        • Black N.A.
        • Glickman M.E.
        • Ding J.
        • Flood A.B.
        International variation in intervention rates: what are the implications for patient selection?
        Int J Technol Assess Health Care. 1995; 11: 719-735