Does questionnaire structure influence response in postal surveys?

Received 17 January 2002; received in revised form 2 October 2002; accepted 16 October 2002.

Abstract

This study tested the effect of questionnaire structure on response, speed of return, and content of answers in a postal survey. All 259 patients aged 30–59 years who consulted with back pain at four UK general practices from March to June 2001 were randomly allocated to receive either a traditionally or chronologically structured self-completion questionnaire. The response was higher and the returns quicker (P = .05) for the chronologic questionnaire. There were no statistically significant differences in completion rates or scores on the SF-36, Chronic Pain Grade, Hospital Anxiety and Depression Scale, or Roland-Morris Disability Questionnaire between the two types of questionnaire, and test-retest reliability was high for all scales. Changing questionnaire structure to make questions chronologic does not substantially affect the answers given, but may make a questionnaire more acceptable and easier to complete and speed up returns.
1. Introduction

Survey results from a questionnaire may be vulnerable to the structure of the questionnaire, in particular, the order of the questions [1, 2]. It is commonly suggested that questionnaires should start with simple, nonthreatening closed questions, and that questions on particular topics should be grouped [1, 3, 4]. The conventional suggestion for questionnaire order in health research places the generic or broader questions first, followed by disease-specific questions [3]. There is little evidence to support these or more general guidelines for self-completion questionnaires in epidemiologic or health-related research.

Much of the work looking at questionnaire structure has focused on interviewer-administered questionnaires. Three interview studies looking at child behavior [5], mental health [6], and worries [7] found some effects of the order of questions on responses. Another study [8] investigated question order effects in an economic evaluation, and found no evidence of an effect in telephone interviews. In interviewer-administered questionnaires the question order can be easily controlled, and therefore its effects can be assessed. However, in self-completion questionnaires the respondents can read through the whole questionnaire before beginning to answer, and the presence (or absence) of an effect of questionnaire structure is therefore more difficult to ascertain [2, 9].

Schwarz [10] reviewed the effect of questionnaire structure on self-reports of attitudes and behaviors, and reported that question order is important. However, self-reports of health and health behaviors may be more or less susceptible to questionnaire structure than studies of attitudes and behaviors. Some work has been carried out with health surveys: two studies have shown that items recording weight loss [11] and medicine use [12] were dependent on order.
Little work has looked at the ordering of whole questionnaire sets, although one group [13] carried out a study examining the ordering of different rating scales, such as disease-specific and overall health status questionnaires, and found no effect. Bowling et al. [9] hypothesized that question order effects might be responsible for different population norms for the Short-Form 36 questionnaire (SF-36) [14], and therefore recommended that the SF-36 be placed first in a questionnaire.

If it can be shown that the questionnaire structure has little or no effect on the content of the answers, it may be more appropriate in self-completion questionnaires to aim to increase ease and acceptability for the participants rather than following a conventionally defined order. This may also improve completion and response. A chronologic questionnaire structure reflects more closely standard medical history taking, and may therefore be easier for participants to comprehend. Informal prepilot work supported this, and also indicated that a forward chronologic structure (rather than starting with the present time and working back) would be easiest to understand.

The aim of this study was to assess the effect of questionnaire structure by randomly assigning participants to receive either a “traditionally” ordered questionnaire or a chronologic questionnaire.
2. Patients and methods

This study is part of the pilot phase of a longitudinal study looking at the pain, disability, and health care use of a cohort of back pain consulters from primary care. The North Staffordshire Research Ethics Committee approved this study. The aims of this work are to assess differences in speed of returns and response, and differences in scores and reliability of various scales, between a traditionally ordered questionnaire and a chronologically ordered questionnaire.

2.1. Subjects

The patients for this study were all patients aged 30–59 years consulting with back pain at one of four general practices during the 4 months prior to the sampling (March to June 2001). Computerized primary care records in the UK are coded using the Read Code classification system, and specific codes were used to identify back pain consulters. The list of patients was checked by staff at each general practice prior to mailing to exclude patients whom they considered inappropriate (such as those with terminal illness). No patients were excluded. Patient lists were also checked before each mailing for deaths and departures (patients leaving the area).

2.2. Questionnaires

There were four main measures used in the questionnaire. The Short-Form 36 questionnaire Version 2 (SF-36v2™) [14, 15] is a commonly used health status instrument, and is divided into eight dimensions, each scored from zero (worst health status) to 100 (best health status). The Chronic Pain Grade [16] classifies individuals into one of five grades of chronic pain, specifically chronic back pain in this study. The Roland-Morris Disability Questionnaire [17] measures self-reported disability from back pain; scores range from zero (no disability) to 24 (highest disability). The Hospital Anxiety and Depression Scale [18] is used to assess psychologic status, and provides separate scores for anxiety and depression, each from zero (lowest) to 21 (highest psychologic distress).
General demographic questions, sections on use of health care, medication and self-care, and questions on back pain in the previous 2 weeks were also included.

For this study two different questionnaires were developed. A “traditionally” structured questionnaire was developed using guidelines such as those by Bowling [3], where generic or broader questionnaire instruments are placed first, followed by disease-specific instruments. In this version, the structural integrity of each instrument was preserved. A “chronologic” questionnaire was also developed, where individual questions are arranged in sections according to the period of time that they ask about. This approach involved breaking up previously validated questionnaires such as the SF-36 and Chronic Pain Grade into individual questions. The order of questions in the questionnaires is shown in Table 1.

Table 1. Order of questions in the two questionnaires

| Section | Time frame (chronological) | Chronological questionnaire | Traditional questionnaire |
|---|---|---|---|
| 1 | Last 6 months | Chronic Pain Grade | SF-36 (complete) |
| 2 | Last 4 weeks | SF-36 role limitation (physical); SF-36 role limitation (emotional); SF-36 social functioning; SF-36 energy/vitality; SF-36 mental health; SF-36 bodily pain; Health service use | Hospital Anxiety and Depression Scale |
| 3 | Last 2 weeks | Back pain; Medication and self-care | Chronic Pain Grade |
| 4 | Last week | Hospital Anxiety and Depression Scale | Health service use |
| 5 | A typical day | SF-36 physical functioning | Medication and self-care |
| 6 | Today | Roland-Morris Disability Questionnaire | Back pain |
| 7 | No time frame | Chronic Pain Grade (back pain now); SF-36 general health perception | Roland-Morris Disability Questionnaire |
| 8 | | General demographic questions | General demographic questions |
2.3. Mailing procedure

Participants were randomly assigned to one of two groups, to receive the chronologic questionnaire or the traditionally ordered questionnaire. Questionnaires were sent out with a letter from the patient's GP, an information sheet about the study, and a reply-paid envelope. All nonresponders 2 weeks after baseline were sent a reminder postcard, and 2 weeks after this (4 weeks after baseline) any remaining nonresponders were sent a reminder questionnaire.

Repeatability of each questionnaire was studied in the subset of early responders (first 2 weeks) to the first mailing. All early responders to the traditional questionnaire were sent an identical repeat (traditional) questionnaire; half of the early responders to the chronologic questionnaire were sent an identical repeat (chronologic) questionnaire. All repeat questionnaires were sent 2 weeks after the original questionnaire.

2.4. Sample size

The sample size calculation was based on what we considered to be a meaningful difference in SF-36 Physical Functioning scores from the mean expected for back pain patients (60 on the 0–100 scale [19]). A difference of 10 points is equivalent to a move from not limited at all to limited a lot on one of the 10 areas covered on the scale. To detect a difference of 10 points between the two questionnaires, if there was one, the aim was to obtain 50 completed traditional questionnaires and 100 completed chronologic questionnaires. This was based on a standard deviation of the SF-36 physical function dimension of 20 [15, 20], and assumed an α of 0.05 and a β of 0.2. A two-to-one ratio between the groups was selected because half of the patients receiving the chronologic questionnaire were entered into a separate part of the pilot study to assess a shorter monthly questionnaire (not reported here).

2.5. Data analysis

Responses were logged and data were entered into Microsoft Access® 97.
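The sample size reasoning in Section 2.4 can be checked with a short calculation. The sketch below assumes the standard normal-approximation formula for comparing two means with unequal allocation; the paper does not state which formula or software was used, and the function name `two_group_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def two_group_sample_size(delta, sd, alpha=0.05, beta=0.2, ratio=2.0):
    """Return (n_small, n_large) for comparing two means.

    delta : difference in means to detect
    sd    : assumed common standard deviation
    ratio : n_large / n_small (2.0 for a two-to-one allocation)
    Uses the normal approximation, which is an assumption of this sketch.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    n_small = (1 + 1 / ratio) * (sd * (z_alpha + z_beta) / delta) ** 2
    n_small = math.ceil(n_small)
    return n_small, math.ceil(ratio * n_small)


# Inputs taken from Section 2.4: 10-point difference, SD of 20, alpha 0.05, beta 0.2.
n_traditional, n_chronologic = two_group_sample_size(delta=10, sd=20)
print(n_traditional, n_chronologic)
```

Under these assumptions the calculation gives group sizes a little below the stated targets of 50 and 100 completed questionnaires, which is consistent with the targets having been rounded up.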
Analysis was carried out using SPSS for Windows 10.0 [21], and was blind to questionnaire format. Scores for each questionnaire were calculated according to the methods specified by the questionnaire developers. Differences in response between the two questionnaire formats, including length of time to response, were assessed using Kaplan-Meier curves and the log rank test. Chronic Pain Grade scores in the two questionnaires were compared using the chi-square test for trend. Mean scores for the SF-36 dimensions, the Hospital Anxiety and Depression Scale, and the Roland-Morris Disability Questionnaire were compared between the two questionnaires using independent-sample t-tests at the 5% significance level.

Test-retest reliability was calculated for the SF-36, Hospital Anxiety and Depression Scale, and Roland-Morris Disability Questionnaire using the Intraclass Correlation Coefficient [ICC (2,1)] reliability statistic [22], and for the Chronic Pain Grade using weighted kappa. The ICC and kappa range from zero to one, with one indicating perfect reliability. One-sided lower 95% confidence limits were produced for the ICCs to show the minimum level of reliability likely for each dimension.
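As an illustration of the reliability statistic named above, the following sketch computes ICC (2,1) from a two-way analysis of variance, using the mean squares for subjects, occasions, and residual error. It is not the authors' SPSS procedure; the function name `icc_2_1` and the test-retest scores are hypothetical and are not data from this study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores has one row per subject and one column per occasion.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of occasions (2 for test-retest)
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-occasions mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical first and repeat scores for five respondents.
retest = [[7, 8], [5, 5], [10, 9], [3, 4], [12, 12]]
print(round(icc_2_1(retest), 3))
```

Because ICC (2,1) measures absolute agreement, a systematic shift between the first and repeat questionnaires lowers the coefficient even when respondents keep the same rank order.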
3. Results

Two hundred fifty-nine patients were selected to take part in the study; 43% of the total sample were male (n = 111) and the mean age of the sample was 45 years (standard deviation (SD) 8.1, median 45). After randomization, 175 patients were sent the chronologic questionnaire and 84 were sent the traditional questionnaire; the groups did not differ significantly in terms of age and gender. A total of 184 baseline questionnaires were returned completed. Five patients were excluded because of nondelivery or because the questionnaire was filled in by the wrong person, giving an adjusted response rate of 72%. The response was approximately the same for all four practices (70 to 77%) and for men and women (70 and 74%). The response was slightly higher for older people, with 78% of 45–59 year olds returning the completed baseline questionnaire compared to 66% of 30–44 year olds. Fifty-two questionnaires were sent out in the repeatability study, 31 chronologic and 21 traditional; 62% were returned completed (18 chronologic and 14 traditional).

3.1. Comparison of response

The response was higher for the chronologic questionnaire (75%, n = 129) than for the traditional questionnaire (67%, n = 55). The median interval between mailing and return for responders was 13 days for the chronologic questionnaire and 21 days for the traditional questionnaire. The speed of response is shown in the form of Kaplan-Meier curves in Figure 1. The corresponding log rank test showed a difference in the rate of response between the two groups (P = .05). There were no statistically significant differences in the response between the two questionnaire formats when stratified by practice, gender, or age group.

Proportions of missing data were very low for all questionnaire instruments; the highest proportion across both questionnaires was 5%. There were no apparent differences in the proportion of missing scores between the two questionnaire formats (see Tables 2, 3, and 4).
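The log rank comparison of return times described in Section 3.1 can be sketched as follows. This is a minimal two-group implementation written for illustration, not the SPSS routine used in the study. It treats a returned questionnaire as the event and a nonresponder as censored at the end of follow-up; the function name `log_rank` and the return times are hypothetical.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi_square, p_value) on 1 df.

    times_*  : days from mailing to return, or to end of follow-up
    events_* : True if the questionnaire was returned, False if censored
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # group A still outstanding at t
        n_b = sum(1 for x in times_b if x >= t)  # group B still outstanding at t
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi_square, math.erfc(math.sqrt(chi_square / 2))


# Hypothetical return times in days; False marks a nonresponder censored at day 30.
chronologic_days = [6, 9, 12, 13, 15, 30]
chronologic_returned = [True, True, True, True, True, False]
traditional_days = [10, 18, 21, 25, 30, 30]
traditional_returned = [True, True, True, True, False, False]

chi_square, p_value = log_rank(chronologic_days, chronologic_returned,
                               traditional_days, traditional_returned)
print(f"chi-square = {chi_square:.2f}, P = {p_value:.3f}")
```

Treating nonresponders as censored rather than dropping them lets a single test reflect both the proportion responding and the speed of return.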
Table 2. Chronic Pain Grade by questionnaire format

| Chronic Pain Grade | Description | Chronological questionnaire, n | Chronological questionnaire, % | Traditional questionnaire, n | Traditional questionnaire, % |
|---|---|---|---|---|---|
| 0 | Pain free | 3 | 2.4% | 1 | 1.9% |
| I | Low disability, low intensity | 26 | 21.1% | 4 | 7.5% |
| II | Low disability, high intensity | 18 | 14.5% | 18 | 34.0% |
| III | High disability, moderately limiting | 26 | 21.0% | 12 | 22.6% |
| IV | High disability, severely limiting | 51 | 41.1% | 18 | 34.0% |
| | Missing scores | 5 | 3.9% | 2 | 3.6% |
3.2. Comparison of questionnaire scores

The responses to the Chronic Pain Grade questionnaire are shown in Table 2. There was no statistically significant difference between the scores on the two questionnaire formats (P = .93).

The mean SF-36 scores are shown in Table 3. No dimension had a mean difference of greater than nine points, and there were no statistically significant differences between the two questionnaires. Six of the SF-36 dimensions refer to the last 4 weeks; five scored slightly higher on the traditional questionnaire, and one scored lower (Bodily Pain). The other two dimensions have no specific time frame; one is Physical Function, which scored higher on the traditional questionnaire, and the other is General Health Perception, which scored lower.

The scores on the Roland-Morris Disability Questionnaire were mean 7.98 (SD 6.94), median 6.00 on the chronologic questionnaire and mean 7.35 (SD 5.81), median 5.00 on the traditional questionnaire; these scores were not significantly different. The mean scores on the Hospital Anxiety and Depression Scale are shown in Table 4. There was no statistically significant difference in the Hospital Anxiety and Depression Scale scores between the two questionnaire formats.
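The trend comparison of Chronic Pain Grade can be illustrated with the counts in Table 2. The sketch below assumes the linear-by-linear association form of the chi-square test for trend, with equally spaced grade scores and missing scores excluded; the paper does not specify the exact variant, and the function name `linear_trend_test` is hypothetical.

```python
import math

# Counts from Table 2 for grades 0, I, II, III, IV (missing scores excluded).
chronologic = [3, 26, 18, 26, 51]
traditional = [1, 4, 18, 12, 18]
grades = [0, 1, 2, 3, 4]  # assumed equally spaced scores


def linear_trend_test(counts_a, counts_b, scores):
    """Linear-by-linear association test: chi-square = (N - 1) * r**2 on 1 df.

    r is the correlation between group membership (0 for a, 1 for b)
    and the ordinal score.
    """
    n_b = sum(counts_b)
    n = sum(counts_a) + n_b
    sum_y = sum(s * (a + b) for s, a, b in zip(scores, counts_a, counts_b))
    sum_yy = sum(s * s * (a + b) for s, a, b in zip(scores, counts_a, counts_b))
    sum_xy = sum(s * b for s, b in zip(scores, counts_b))

    s_xx = n_b - n_b ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n
    s_xy = sum_xy - n_b * sum_y / n
    chi_square = (n - 1) * s_xy ** 2 / (s_xx * s_yy)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi_square, math.erfc(math.sqrt(chi_square / 2))


chi_square, p_value = linear_trend_test(chronologic, traditional, grades)
print(f"chi-square = {chi_square:.4f}, P = {p_value:.2f}")
```

Under these assumptions the P value comes out at about 0.93, in line with the value reported above. The two formats differ in how respondents are spread across Grades I and II, but their mean grades are almost identical, which is what a trend test is sensitive to.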
4. Discussion

This study was designed to test whether the structure of a postal self-completion questionnaire had any effect on response. It was found that a chronologically structured questionnaire had significantly quicker returns and higher response than a traditionally structured questionnaire; questionnaire structure did not significantly affect individual scale scores or test-retest reliability (which was high for both questionnaires).

The response in this study (72%) was good; von Korff et al. [16] had the same response with primary care low back pain patients in the United States. The question noncompletion rates in the study are low, and are similar to those found in other studies [24, 25]. It has been shown here that there were no significant differences between questionnaire scores and question completion rates when individual questions from standard instruments were separated and reordered, compared to using a conventional questionnaire structure. Comparison of these data with other studies shows that the scores on the SF-36 and the Roland-Morris Disability Questionnaire are very similar to those found in other back pain patients [19, 26, 27, 28]. The test-retest reliability results indicate that both instruments are relatively stable and reproducible. This lends further support to the assertion that the results found here are valid and reliable, and that changing questionnaire structure did not alter the way that questions are answered.

There are some possible explanations for the quicker and higher response to the chronologic questionnaire. First, the chronologic questionnaire may be more familiar to patients, as it was based on the general style of questioning in a medical consultation (starting with when the problem started and working forward to how the patient feels “right now”). This may put the patient at ease and enable them to fill in the questionnaire more quickly and easily, an explanation that was suggested in the prepilot interviews.
This contrasts with the traditionally structured questionnaire, which contained sections relating to different aspects of health, a concept that may be relatively abstract to a patient. Another reason for the different response may relate to the type of questions that the questionnaire starts with. The chronologic questionnaire started with questions about back pain, which may reemphasize to the respondent that the researchers want to know about them and their own health problems. The traditional questionnaire, however, starts with general health questions, which may make respondents feel that it does not relate to them specifically.

Speed of return is important in studies where participants are sent repeated questionnaires; it also has cost and administrative implications when reminders are sent out to nonresponders. This study has shown that altering questionnaire structure by reordering questions can increase speed of return. The proportion responding was also improved with the chronologic questionnaire; any increase in response can be important in epidemiologic studies where generalisability is an issue, and studies to investigate this further would be useful to epidemiologists and survey researchers.

No statistically significant differences in any instrument scores were found between the two questionnaire formats. An increased sample size might have led to some statistically significant differences. However, the differences in means found (six or fewer for most of the SF-36 dimensions) are unlikely to be meaningful. The size of difference that is considered “meaningful” depends upon the scale and the number of values that a subject can score on that scale. A difference of 8 on the role emotional scale, for example, only equates to a difference of one category (e.g., having problems with work or other daily activities most of the time rather than some of the time) for one of the three questions included on that dimension.
Even though the differences found may not be meaningful, they raise enough concern for us to suggest that when changes or differences in scores are measured, the questionnaire layout should be held constant. On the Roland-Morris Disability Questionnaire, changes of two to five points have been considered important [29, 30]; this magnitude of difference was not found in this study.

Common guidelines for questionnaire construction suggest that the order may have an effect on the question responses [1, 2, 3, 4], although no evidence of this was found in this study. This disparity between the guidelines and these results might arise because the guidelines commonly published in books are predominantly aimed at interviewer-based studies (where question order is strictly controlled) rather than postal surveys (where the respondent can complete questions in any order). The current study indicates that guidelines for interviews may not be appropriate for self-completion questionnaires, and further work is needed to guide questionnaire structure in postal surveys.

One limitation of this study is that only four previously validated questionnaires were used (SF-36, Roland-Morris Disability Questionnaire, Hospital Anxiety and Depression Scale, Chronic Pain Grade), and the effect of questionnaire structure may be different with other questionnaires. It is also possible that this patient group (back pain consulters) responded differently from other groups of people. Further research with different questionnaires, different patient groups, and different styles of questionnaire structure may help to elucidate the subject.

The results of this study indicate that altering questionnaire structure to make it more acceptable and understandable to recipients, using a chronologic question order, may improve the response to postal surveys.
This is an important consideration for epidemiologists and survey researchers when designing self-completion questionnaires.

Acknowledgements

This work is supported by a research grant from the Wellcome Trust. Thanks to the anonymous reviewers for their constructive comments.

References

1. Schuman H, Presser S. Questions and answers in attitude surveys. New York: Academic Press, Inc.; 1981.
2. Sudman S, Bradburn NM. Asking questions. A practical guide to questionnaire design. San Francisco: Jossey-Bass Inc.; 1983.
3. Bowling A. Research methods in health. Buckingham: Open University Press; 1997.
4. Dillman DA. Mail and telephone surveys. New York: John Wiley & Sons, Inc.; 1978.
5. Lucas CP. The order effect (reflections on the validity of multiple test presentations). Psychol Med. 1992;22:197–202.
6. Jensen PS, Watanabe HK, Richters JE. Who's up first? Testing for order effects in structured interviews using a counterbalanced experimental design. J Abnorm Child Psychol. 1999;27:439–445.
7. Laberge M, Fournier S, Freeston MH, Ladouceur R, Provencher MD. Structured and free-recall measures of worry themes (effect of order of presentation on worry report). J Anxiety Disord. 2000;14:429–436.
8. Kartman B, Stalhammar NO, Johannesson M. Valuation of health changes with the contingent valuation method (a test of scope and question order effects). Health Econ. 1996;5:531–541.
9. Bowling A, Bond M, Jenkinson C, Lamping DL. Short Form 36 (SF-36) Health Survey questionnaire (which normative data should be used? Comparisons between the norms provided by the Omnibus Survey in Britain, the Health Survey for England and the Oxford Healthy Life Survey). J Public Health Med. 1999;21:255–270.
10. Schwarz N. Self-reports (how the questions shape the answers). Am Psychol. 1999;54(2):93–105.
11. Serdula MK, Mokdad AH, Pamuk ER, Williamson DF, Byers T. Effects of question order on estimates of the prevalence of attempted weight loss. Am J Epidemiol. 1995;142:64–67.
12. Gmel G. Changes in order of questions on use of medicines in the Swiss Health Surveys (does order of questions affect prevalence estimates?). Soz Praventivmed. 1999;44:126–136.
13. Barry MJ, Walker-Corkery E, Chang Y, Tyll LT, Cherkin DC, Fowler FJ. Measurement of overall and disease-specific health status (does the order of questionnaires make a difference?). J Health Serv Res Policy. 1996;1:20–27.
14. Ware JE. SF-36 Health Survey Update. Spine. 2000;25:3130–3139.
15. Jenkinson C, Stewart-Brown S, Petersen S, Paice C. Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health. 1999;53:46–50.
16. Von Korff M, Ormel J, Keefe FJ, Dworkin SF. Grading the severity of chronic pain. Pain. 1992;50:133–149.
17. Roland M, Morris R. A study of the natural history of back pain. Part I (development of a reliable and sensitive measure of disability in low-back pain). Spine. 1983;8:141–144.
18. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand. 1983;67:361–370.
19. Garratt AM, Ruta DA, Abdalla MI, Russell IT. SF 36 health survey questionnaire (II. Responsiveness to changes in health status in four common clinical conditions). Qual Health Care. 1994;3:186–192.
20. Julious SA, George S, Campbell MJ. Sample sizes for studies using the short form 36 (SF-36). J Epidemiol Community Health. 1995;49:642–644.
21. SPSS for Windows. Rel. 10.0.7. Chicago: SPSS Inc.; 2000.
22. Shrout PE, Fleiss JL. Intraclass correlations (uses in assessing rater reliability). Psychol Bull. 1979;86:420–428.
23. Shrout PE. Measurement reliability and agreement in psychiatry. Stat Methods Med Res. 1998;7:301–317.
24. Elliott AM. Chronic pain in the community (its prevalence, impact and natural history). Aberdeen, Scotland: Aberdeen University; 2000.
25. Brazier JE, Harper R, Jones NM, et al. Validating the SF-36 health survey questionnaire (new outcome measure for primary care). BMJ. 1992;305:160–164.
26. Ren XS, Selim AJ, Fincke G, et al. Assessment of functional status, low back disability, and use of diagnostic imaging in patients with low back pain and radiating leg pain. J Clin Epidemiol. 1999;52:1063–1071.
27. Suarez-Almazor ME, Kendall C, Johnson JA, Skeith K, Vincent D. Use of health status measures in patients with low back pain in clinical settings. Comparison of specific, generic and preference-based instruments. Rheumatology. 2000;39:783–790.
28. Bronfort G, Bouter LM. Responsiveness of general health status in chronic low back pain (a comparison of the COOP charts and the SF-36). Pain. 1999;83:201–209.
29. Stratford PW, Binkley JM. Applying the results of self-report measures to individual patients (an example using the Roland-Morris Questionnaire). J Orthop Sports Phys Ther. 1999;29:232–239.
30. Bombardier C, Hayden J, Beaton DE. Minimal clinically important difference. Low back pain (outcome measures). J Rheumatol. 2001;28:431–438.

Keele University, Primary Care Sciences Research Centre, Hornbeam Building, Keele, Staffordshire ST5 5BG, UK
Corresponding author. Tel.: +44 (0) 1782-583926; fax: +44 (0) 1782-583911.
PII: S0895-4356(02)00567-X
© 2003 Elsevier Science Inc. All rights reserved.