Retrospectively patient reported pre-event health status showed strong association and agreement with contemporaneous reports

Objective: The unpredictability of the illnesses and injuries that lead to most emergency admissions to hospital makes it impossible to collect pre-admission patient reported outcome measures (PROMs) prospectively. Our aims were to review the evidence for using retrospective PROMs to determine pre-event health status and the validity of using general population norms instead of retrospective PROMs.
Study design and setting: Searches of Medline, PsycINFO, Embase, Global Health, and Health Management information. Six studies met the inclusion criteria for the first aim and 11 studies addressed the second aim. Narrative syntheses were conducted.


Introduction
The growing acceptance of the importance of patients' views of their outcome when evaluating interventions and assessing the quality of services means that it is necessary to devise ways in which accurate Patient Reported Outcome Measures (PROMs) can be obtained (referred to as PROs in the USA) [1]. PROMs are self-completed questionnaires where patients are asked to report their own state of health (multi-dimensional symptoms, functional status) and health-related quality of life (HRQL) at one point in time. PROMs can be categorised as generic (e.g. EQ-5D, SF36) or disease-specific (Oxford Hip Score or Western Ontario & McMaster Osteoarthritis Index). Generic PROMs capture broad domains on function or HRQL, can be converted into utility scores, and provide the means to compare between conditions and treatments. Disease-specific PROMs have greater sensitivity by incorporating aspects of function and HRQL specific to that condition [1]. By comparing measurements before and after a health care intervention the outcome of care can be determined.
Emergency admissions make up 34% of hospital admissions in England [2]. They can be categorised as either a largely unexpected acute event, such as an acute myocardial
What is new?
• There is a strong association between PROMs collected retrospectively and contemporaneous collection among patients undergoing elective surgery.
• Agreement is also strong for PROMs that are continuous measures but only fair to moderate for categorical measures.
• Retrospectively collected data suggest that young adult trauma patients are healthier than population norms. The reverse may be true for older patients admitted for medical conditions.
• Retrospective collection offers a means of assessing patient reported outcomes in unexpected emergency admissions. However, further research is needed to establish the best policy for their use.
This takes the place of contemporaneous collection before the event that can be done when considering planned elective treatments such as hip replacements.
Retrospective self-reporting has been extensively used in aetiological case-control studies and in cross-sectional surveys [3] in which respondents are asked to recall characteristics of their health over a specified time frame which may be short (e.g. preceding week) or long (e.g. past year).
Second, and much cheaper than retrospective reporting, is to use age-sex standardised PROMs which have been collected from the general population (or an appropriate comparison group) as part of a cross-sectional survey, as a surrogate measure of a patient's pre-event baseline health [4]. The use of population norms assumes that patients experiencing an emergency admission are typical of the wider population. This assumption could lead to an over- or under-estimate of the impact of a health care intervention. If patients are in fact healthier at baseline than the general population (as might be the case when studying recovery from trauma that occurred while undertaking a dangerous sport such as rock-climbing), using the population norm as a surrogate baseline could lead to an 'overestimate' of the treatment effect. On the other hand, if patients were in worse health than their peers beforehand (as might be expected for those suffering a heart attack), an 'underestimate' of the treatment effect will be observed.
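The direction of this bias can be illustrated with a small worked example. The sketch below is illustrative only: the scores, the 0-1 utility-style scale and the function names are assumptions made for the example, not values or methods taken from any of the reviewed studies.

```python
# Illustrative sketch only: every number here is hypothetical, not study data.


def change_from_baseline(follow_up: float, baseline: float) -> float:
    """Estimated change in health status: follow-up score minus baseline score."""
    return follow_up - baseline


def bias_from_norm(follow_up: float, true_baseline: float, norm: float) -> float:
    """Bias introduced by substituting a population norm for the true baseline.

    Positive values mean the change looks more favourable than it really was
    (overestimate); negative values mean it looks less favourable (underestimate).
    """
    return change_from_baseline(follow_up, norm) - change_from_baseline(
        follow_up, true_baseline
    )


NORM = 0.80       # hypothetical age-sex standardised population norm
FOLLOW_UP = 0.70  # hypothetical score after treatment

# Patients who were healthier than their peers before the event (e.g. sports trauma).
fitter_bias = bias_from_norm(FOLLOW_UP, true_baseline=0.90, norm=NORM)

# Patients who were in worse health than their peers beforehand (e.g. heart attack).
sicker_bias = bias_from_norm(FOLLOW_UP, true_baseline=0.65, norm=NORM)

print(f"fitter patients: bias = {fitter_bias:+.2f}")
print(f"sicker patients: bias = {sicker_bias:+.2f}")
```

Under these assumptions the bias is simply the true baseline minus the norm, so it does not depend on the follow-up score: it is positive when patients were healthier than the norm and negative when they were less healthy.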
Although there has been no review of the strength of association and of agreement between these two approaches in emergency admissions, two systematic reviews have considered other aspects of recall. One considered the length of recall periods for PROMs in clinical trials and concluded that the optimum depended on two broad categories of factors: characteristics of the phenomenon being recalled (such as how recently it had occurred, its attributes, its complexity) and the context of the recalled phenomenon (such as its salience, the patient's mood) together with the nature of the topic [5]. The second review concluded that recall bias is a concern with PROMs and called for more research to understand and identify situations where the use of recall is acceptable [6].
Our aims were to review systematically the scientific evidence on (i) the extent of association and agreement between PROMs collected retrospectively and contemporaneously to determine pre-event health status and HRQL and (ii) the validity of using general population norms for determining the pre-event health status and HRQL of people with an unexpected emergency admission to hospital.
and duplicates removed. Titles and abstracts were screened by one author (EK) to assess suitability. Studies in children, adolescents, carer proxies and those with cognitive impairments were excluded. The remaining articles were read and forwards and backwards searching of references was conducted (Figure 1).

Quality appraisal
For studies comparing retrospective and contemporary PROMs, their methodological quality was appraised by one author (EK) using five relevant items selected from the Quality Appraisal of Diagnostic Reliability (QAREL) Checklist [9]. These items cover the representativeness of participants, time interval between assessments, correct application of assessment, and appropriate statistical analysis. The other items were not applicable in this review: whether participants were blinded to their initial assessment, to other participants' assessments, to any reference standard or to clinical information, or blinded to additional cues that were not part of the test. A simple summation of the five included items was calculated (0 = weak, 5 = strong). Given the heterogeneity of the studies in this review, a narrative synthesis was carried out.

Search findings
identified two additional studies comparing retrospective and contemporary PROMs and six comparing retrospective PROMs and population norms) (Figure 1). All studies comparing retrospective and contemporary PROMs were methodologically strong according to the QAREL checklist.

Comparison of retrospective with contemporary PROMs
Of the six studies, one was from the UK [12], one was multinational [13], three were from Canada [14][15][16], and one was from the USA [17] (Table 3). The studies involved 75-177 patients, with one exception that involved 770 patients [13]. Four involved patients with hip and knee problems and two were based on urological patients. Several reported on the level of agreement between retrospective and contemporary reports for more than one PROM. In three studies that each used several PROMs at different time points, generating 24 comparisons in total, the level of agreement for continuous data (intra-class correlations) was very strong for eight, strong for 12 and moderate for four [14,15,16]. In contrast, for PROMs that were converted to categorical variables for analysis, kappa statistics revealed only fair to moderate levels of agreement [12,13,17]. Correlations tended to be stronger the shorter the time interval: studies with intervals of one month or less [14,15] reported strong or very strong agreement, whereas intervals of three months or more resulted in only moderate agreement [12,13]. Another factor associated with the strength of agreement was the type of patient. The majority of studies that found strong agreement were based on orthopaedic patients, suggesting that patient characteristics or the type of intervention (e.g. elective surgery rather than medical treatment) may influence the relationship.
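The two agreement statistics reported above can be sketched in a few lines. This is a minimal illustration, not the method of any reviewed study: the reviewed papers do not all state which form of intra-class correlation they used, so the two-way, absolute-agreement, single-measurement form (often written ICC(2,1)) is an assumption here, as are the example scores and the cut-points used to categorise them.

```python
from collections import Counter
from statistics import mean


def icc_absolute_agreement(first, second):
    """ICC(2,1): two-way model, absolute agreement, single measurement.

    `first` and `second` hold one score per patient on two occasions
    (e.g. contemporaneous and retrospective reports).
    """
    first, second = list(first), list(second)
    n, k = len(first), 2
    grand = mean(first + second)
    row_means = [mean(pair) for pair in zip(first, second)]
    col_means = [mean(first), mean(second)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in first + second)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cohen_kappa(first, second):
    """Cohen's kappa: agreement on categories, corrected for chance agreement."""
    first, second = list(first), list(second)
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_a, counts_b = Counter(first), Counter(second)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)


def to_band(score):
    """Hypothetical cut-points turning a 0-100 score into three categories."""
    return "low" if score < 40 else "medium" if score < 70 else "high"


# Hypothetical 0-100 scores for eight patients (not study data).
contemporary = [35, 42, 58, 66, 71, 80, 38, 69]
retrospective = [41, 38, 61, 72, 68, 84, 36, 73]

print("ICC:", round(icc_absolute_agreement(contemporary, retrospective), 2))
print("kappa:", round(cohen_kappa(map(to_band, contemporary),
                                  map(to_band, retrospective)), 2))
```

Categorising a continuous score discards information near the cut-points, where small differences between the two reports become outright disagreements. This may be one reason categorical comparisons show weaker agreement than continuous ones.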
There was no consistency in the direction of any difference between retrospective and contemporary accounts. One study found that patients tended to recall better baseline health than they had reported in their contemporary PROMs [17], two studies reported the opposite [13,15], one found it varied by PROM [16] and two found no difference [12,14]. The strength of agreement may be limited if the test-retest reliability of the PROM is poor. Table 4 presents the reliability estimates for all the measures that were included in the studies in Table 3. Test-retest reliability for all the PROMs used was excellent, and higher than the agreement observed when comparing retrospective with contemporary PROMs. This

Comparison of retrospective PROMs with population norms
There were 11 studies (Table 5): four from North America [25,26,31,32], four from Australia or New Zealand [29][30][31][36] and three from Europe [27,33,34]. Eight studies involved fewer than 500 patients (86-472) but three were larger (1500-3000 patients). All the studies involved trauma patients apart from one on patients with acute lung injury [31]. Most studies included adults of all ages. The two exceptions were a study of elderly people who had suffered a fractured neck of femur [27] and a study of young adult trauma victims [28].
All reported on a generic PROM: six used a version of the Short Form (SF-36, SF-12, SF-6); three used the EuroQol EQ-5D; and two used the Sickness Impact Profile. The time period for retrospective reporting in six studies was less than one week [26][27][28][29][30][31][33]. In the other studies it extended from a few weeks to three months.
All but one study used population norms derived from statutory surveys of the general population. The exception used a matched comparison group drawn from the local community [32]. In addition, one study compared drivers who had suffered trauma in road accidents not only with population norms but also with a sample of uninjured drivers [28].
Of the 10 studies that used general population norms, six found that patients recalled their health as having been better than the general population [28][29][30][33][34][35]. In the four other studies, three found no difference [25][26][27] and in only one did patients report worse health than the general population [31]. The latter was the only study not focused on trauma patients but on those who had developed acute lung injury who were likely to have been in a poor state of health before being hospitalised. The two studies that compared patients with matched samples rather than the general population reported either no difference [28] or better recalled health [32].
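The comparisons above depend on matching each patient sample to norms for people of the same age and sex. One simple way to do this, sketched below, is to look up the norm for each patient's age-sex stratum and average those values, then compare the result with the mean recalled score. The norm table, age bands, scores and function names are all hypothetical; the reviewed studies may have standardised in other ways.

```python
from statistics import mean

# Hypothetical norm table: mean generic PROM score by (sex, age band),
# standing in for values from a general population survey.
NORMS = {
    ("F", "18-44"): 0.91, ("F", "45-64"): 0.84, ("F", "65+"): 0.77,
    ("M", "18-44"): 0.93, ("M", "45-64"): 0.85, ("M", "65+"): 0.78,
}


def age_band(age: int) -> str:
    """Map an adult's age in years to the bands used in the hypothetical table."""
    if age < 18:
        raise ValueError("norm table covers adults only")
    if age <= 44:
        return "18-44"
    if age <= 64:
        return "45-64"
    return "65+"


def expected_norm(patients) -> float:
    """Mean of the stratum-specific norms for this sample's age-sex mix."""
    return mean(NORMS[(sex, age_band(age))] for sex, age, _ in patients)


def mean_recalled(patients) -> float:
    """Mean retrospectively reported pre-event score in the sample."""
    return mean(score for _, _, score in patients)


# Hypothetical patients: (sex, age in years, recalled pre-event score).
patients = [("M", 25, 0.95), ("F", 50, 0.80)]

difference = mean_recalled(patients) - expected_norm(patients)
print(f"recalled minus norm: {difference:+.3f}")
```

A positive difference would correspond to patients recalling better health than their age-sex peers, and a negative difference to worse recalled health.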

>>insert Table 5: Studies comparing retrospective PROMs with age-sex standardised general population norms<<
Table 5 footnotes: 1 also compared with representative sample of drivers; 2 compared with 177 community controls; 3 mean difference; p value

Comparison of retrospective and contemporary PROMs
Only six studies have compared retrospective and contemporary PROMs. While the majority of the comparisons (21 of 30) revealed a strong or very strong association (correlation coefficients of over 0.68), the rest were moderate. Levels of agreement for continuous measures were more consistent with 20 out of 24 comparisons being strong or very strong.
In contrast, comparisons of categorical measures showed only fair to moderate agreement.
Stronger associations were observed for indices (than for individual items), for shorter time periods (one month or less) and for elective surgery patients than for those with medical conditions or treatments. The direction of differences between retrospective and contemporary PROMs also showed no consistent pattern and appeared to be dependent partly on the PROM being used.
Retrospective PROMs may be affected by three factors: recall bias; response shift; and lack of validity of the PROM. Recall bias arises because details may go unnoticed and never be stored, new information may be added to stored memories, altering the details, and events may be systematically distorted over time [6]. Recall is influenced by the time interval between the event and the time of its assessment: the longer the interval, the higher the probability of recall bias [37]. It has been found that 20% of the details of an event are irretrievable after one year and 50% after five years [38].
Response shift refers to the change in perception that can occur when circumstances change [39,40]. For example, a patient's perception of the severity of a disability or of their quality of life may change following treatment. This tends to diminish the assessment of pre-treatment severity and thus underestimate the benefits of the treatment. An example is when the term 'severe' has a different meaning for the same person on one occasion compared with a previous occasion because of new experiences. This is known as recalibration.
Moreover, subjective values may also change over time so that physical, social and psychological aspects of HRQL may be prioritised differently after certain experiences, known as reprioritisation. Patients may also redefine the construct in question and attribute new meanings to it, known as scale reconceptualization [41].
It is possible that the validity of PROMs will be jeopardised when determining retrospective health if the recall interval is lengthy. Most PROMs have been validated for the recall of a person's health over the recent past (between one day and the past four weeks). Indeed, many PROMs are based on patients' reports of their health over the preceding few weeks. However, if patients are required to recall their health for longer periods, the validity of the instrument cannot be assumed.
For comparisons of health care providers or over time, recall bias and response shift will only matter if there is a systematic difference in behaviour between groups of patients being compared (e.g. patients attending different hospitals). There is no evidence that such differences exist within countries though some differences have been demonstrated between countries [42].

Comparisons of retrospective PROMs and population norms
The studies comparing retrospective PROMs with population norms were inevitably limited to generic instruments because disease-specific PROMs are rarely collected in general population surveys.
The generalizability of the findings is further limited by the focus of all but one study on trauma victims. The finding that trauma patients in most studies recalled their pre-injury health as better than average may reflect that these patients (mostly car drivers) are fitter and healthier than the general population [19]. While response shift may have contributed, the likelihood that trauma patients were healthier is supported by evidence that rates of sports injuries and gunshot wounds are higher in fitter members of the population [29][30][31][33].
This difference is further exaggerated as national population norms are derived from household surveys that include institutionalised individuals. In contrast, the one study of elderly people suffering a stress fracture related to poor bone density found no difference from the general population (age-sex standardised) [27]. This is also consistent with the one study in which patients recalled worse health than the general population, which focused on acute lung injury [31].
When estimating pre-event health status, there may be a case for adjusting estimates for the presence of long-term conditions to reduce over-estimation. The findings also suggest that the prior health of patients may be underestimated if population norms are used directly as surrogates in cases where the patient population consists of younger adults. However, this underestimation may be small and may mostly affect studies in this specific cohort of patients.

Limitations
There are several limitations to consider. First, only one author (EK) carried out the search, America (four of six). Thus the generalizability of the findings must be treated with caution.
Third, many of the studies that investigated retrospective recall were too small to perform subgroup analyses that take account of clinical characteristics such as severity of illness.
Finally, the generalizability of the comparisons of retrospective PROMs and population norms is even more limited, with 10 of the 11 studies focused on trauma patients. In addition, only generic PROMs were considered, but this is understandable given that population norms are not available for disease-specific PROMs.

Implications for policy and research
It is unclear which of contemporary and retrospective reports is the more valid. Contemporary reports are usually considered the 'gold standard', so if retrospective reports differ, it is the latter that are judged to be 'unreliable'. However, in the context of PROMs, the way patients recall their previous health may be of greater relevance to them, and to assessing the quality of health care, than how they actually assessed it at the time. In this situation, the retrospective report could be viewed as the 'gold standard'. Rather than attach different values to the two types of PROM (in other words, judging whether contemporaneous collection is more or less valid than recalled collection), it is best just to consider the extent to which they differ and the implications both for the use of PROMs in clinical management and in provider comparisons.
As long as data are collected in the same way in different providers then comparisons will not be undermined.

Table 4 (extract): reliability estimates for the PROMs
AUA Symptom Index: r = 0.92 [22]
IKDC Subjective Form: ICC = 0.85 to 0.99 [23]
ACL-QOL: standard error of measurement (SEM) is 6% [24]
WOMET: ICC = 0.79 [14]
KOOS: ICC = 0.75-0.93 [14]