Age at diagnosis and the choice of survival analysis methods in cancer epidemiology

Received 14 January 2002; received in revised form 30 April 2002; accepted 2 September 2002.

Abstract

A young age at diagnosis of cancer is often seen as an indicator of the aggressiveness of the tumor. However, empirical studies have shown conflicting results on the association between age at diagnosis and survival. There are two choices of time scale for a Cox regression model: time since diagnosis, and age. The regression analysis of relative survival rates is an alternative to the Cox model. Using breast cancer data from a population-based cancer registry, we illustrate the features of Cox models using the two time scales and compare them with the relative survival approach. Using a Cox model with time since diagnosis as the time scale, a younger age at diagnosis is associated with a lower mortality; using age as the time scale gives the opposite result. The relative survival approach agrees with the Cox model with age as the time scale. We maintain that a careful clarification of research purpose and a careful choice of methods are necessary.
1. Introduction

“The influence of age and menopausal status at diagnosis on the prognosis of patients with primary breast cancer remains controversial” [1]. It is commonly believed that the more aggressive a tumor is, the more likely it is to be detected at a young age. Therefore, patients diagnosed with breast cancer at a younger age are often expected to have poorer survival than older patients diagnosed with the same stage of disease. However, empirical studies have provided conflicting evidence about the relation between age at diagnosis and survival [1,2]. Small sample sizes, differences in selection criteria, and differences in age group classifications have been suggested as explanations for the inconsistent findings. Two relatively large-scale studies concluded that patients diagnosed with breast cancer before the age of 35 years have a worse prognosis and poorer clinical outcomes [3,4]. For instance, it was observed that p53 abnormalities, estrogen receptor negativity, progesterone receptor negativity, and high S-phase fractions were more common in breast cancer patients diagnosed before 35 years of age. Nevertheless, a multivariable Cox regression analysis of the breast cancer data from the Surveillance, Epidemiology, and End Results (SEER) Program, which covers about 12% of the population of the United States, showed a positive association between age at diagnosis and cancer mortality. For every 5 years older at diagnosis, the risk of cancer death increased by 8% [5].

The Cox regression model has almost become the standard for survival analysis in epidemiology [6]. There are two possible choices for the time scale in a Cox regression model, namely, time-on-study and age [7,8]. Time-on-study usually refers to calendar time from a baseline survey; in cancer epidemiology it is often calendar time since diagnosis. The date of diagnosis is taken as the origin of the survival time scale, that is, time = 0 at diagnosis.
The alternative is to use age as the time scale. Because subjects are interviewed or diagnosed at different ages, they are included in the at-risk population only from that age onwards. They are excluded from the risk set prior to that entry age. This is called a “late entry” or “left truncation” [7]. See Korn et al. for a discussion and illustration of using age as the time scale [8]. When age is used as the time scale, an age effect is removed as it is absorbed into the unspecified baseline hazard. Some epidemiologists take time-on-study as a default option; some may be unaware of the choices. Korn et al. maintained that in general age is a better choice of the time scale [8]. On the other hand, regression analysis of relative survival rates offers an alternative method to remove an age effect [9]. This approach uses the age-specific mortality rates in the general population to normalize the mortality rates of cancer patients during follow-up. In this article, we used breast cancer data from SEER to demonstrate that using time since diagnosis as the time scale, a Cox regression model would show a positive association between age at diagnosis and mortality. This is contrary to clinical belief. In contrast, using age as the time scale the Cox regression model showed a negative association between them, and so did a relative survival model. This article aims to illustrate why it is so, and to highlight the implications of using different time scales in the Cox model. We also examined the use of the regression analysis of relative survival rates and demonstrated its utility in studying an effect of age at diagnosis in the context of cancer epidemiology. We maintain that a careful clarification of research purpose and a corresponding choice of methods are important. Otherwise, one may give the right answer to the wrong question. 
As will be shown in the following sections, the key is to separate an effect of age at diagnosis from an effect of “age during follow-up.” For brevity, the term “age effect” refers to the latter if the context is clear.
2. Materials and methods

Suppose the first death in a study occurs to subject $i = 1$ at time $t_1 = 5$, and that this subject was diagnosed at an age over 60 ($x_i = 1$; $x_i = 0$ if age at diagnosis ⩽ 60). The Cox model considers the partial likelihood (PL) contributed by this death as

$$\mathrm{PL}_1 = \frac{\lambda_0(5)\exp(\beta x_1)}{\sum_{i=1}^{n} y_i\,\lambda_0(5)\exp(\beta x_i)} \tag{1}$$

where $\lambda_0(5)$ is the baseline hazard at $t = 5$, $\beta$ is the regression coefficient for age at diagnosis, $y_i = 1$ if subject $i$ is at risk at this time, $y_i = 0$ otherwise, $x_i$ is the value of the dummy variable for age at diagnosis, $i = 1, 2, \ldots, n$, and $n$ is the sample size [6,7]. When time since diagnosis is used to define the time scale, a subject is not at risk ($y_i = 0$) because she has either died or been censored by this time. However, when age is used to define the time scale, a subject may also not be at risk because at that age she has not yet been diagnosed with cancer. In this case, $y_i = 1$ only if the failure takes place between the age at diagnosis and the age at death or censoring of subject $i$.

Under the proportional hazards assumption, the term $\lambda_0(5)$, or generally $\lambda_0(t)$, cancels out of equation (1). So if time since diagnosis is used to define the time scale $t$, the effect of this factor is removed by the Cox model by being absorbed into the unspecified baseline hazard function $\lambda_0(t)$. If age is used as the time scale, an age effect is removed. In this context, age refers to patients' age during the follow-up period, not their age at diagnosis. The $y_i$ in the denominator of the partial likelihood conveniently allows for late entry.

The regression analysis of relative survival rates of cancer patients proposed by Hakulinen and Tenkanen [9,10] has it that

$$\frac{p_{ki}}{p_{ki}^{*}} = \exp\left[-\exp\left(b_0 + \sum_{j=1}^{J} b_j x_{jki}\right)\right] \tag{2}$$

where $p_{ki}$ is the observed survival rate of patients in group $k$ who are at risk during the $i$th time interval (usually year) since diagnosis, $p_{ki}^{*}$ is the expected survival rate of this group during this time interval calculated using the age-specific mortality rates of a comparable general population, $x_{jki}$ is the value of covariate $j$ for group $k$ in interval $i$, $b_0$ and the $b_j$ are regression coefficients to be estimated, $j = 1, 2, \ldots, J$, and $J$ is the total number of covariates.
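The partial likelihood contribution described earlier in this section, and the way the at-risk indicator changes with the time scale, can be illustrated with a minimal Python sketch. The four subjects, the value of `beta`, and the function names are hypothetical and chosen only for illustration; they are not taken from the SEER data or from the authors' SAS code.

```python
import math

def at_risk(entry, exit_, t):
    """At-risk indicator y_i: 1 if the subject is under observation at t, else 0."""
    return 1 if entry < t <= exit_ else 0

def pl_contribution(subjects, case, t, beta):
    """Partial-likelihood contribution of the death of subject `case` at time t.

    subjects: list of (entry, exit, x) tuples on the chosen time scale.
    The baseline hazard cancels, so only the exp(beta * x) terms remain.
    """
    num = math.exp(beta * subjects[case][2])
    den = sum(at_risk(entry, exit_, t) * math.exp(beta * x)
              for entry, exit_, x in subjects)
    return num / den

# Hypothetical subjects on the age scale:
# (age at diagnosis, age at death/censoring, x = 1 if diagnosed over 60)
by_age = [(62.0, 67.0, 1), (45.0, 55.0, 0), (70.0, 78.0, 1), (60.0, 72.0, 0)]

# The same subjects on the time-since-diagnosis scale: everyone enters at 0.
by_time = [(0.0, exit_ - entry, x) for entry, exit_, x in by_age]

beta = 0.5  # arbitrary illustrative coefficient, not an estimate

# Subject 0 dies 5 years after diagnosis, i.e. at age 67.
pl_time = pl_contribution(by_time, case=0, t=5.0, beta=beta)
pl_age = pl_contribution(by_age, case=0, t=67.0, beta=beta)
print(pl_time, pl_age)
```

On the time-since-diagnosis scale all four hypothetical subjects are in the risk set at 5 years. On the age scale only two are in the risk set at age 67: one subject left observation at 55, and another has not yet entered because she was diagnosed at 70 (late entry).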
In the context of this article, the groups consist of patients with different ages at diagnosis; $p_{ki}^{*}$ is the answer to the question “what would be the rate of survival for this group in this time interval since diagnosis if they had the mortality rate of people of the same age in the general population?” Rearranging equation (2) gives

$$\ln\left[-\ln\left(\frac{p_{ki}}{p_{ki}^{*}}\right)\right] = b_0 + \sum_{j=1}^{J} b_j x_{jki} \tag{3}$$

Equation (3) can be estimated by any statistical software for generalized linear models that allows users to define a separate complementary log–log link function with a division by $p_{ki}^{*}$ [the left-hand side of equation (3)] for each observation of group $k$ at time $i$. Examples of such programs include SAS and S-Plus [10]. The exponentiated value of the coefficient $b_j$ can be interpreted as a relative risk (RR). Details of this regression method and the calculation of expected survival rates $p_{ki}^{*}$ can be found elsewhere [9,10]. We used the mortality rates of white women in the United States in 1991 as the reference in calculating $p_{ki}^{*}$ [11].

The SEER Program of the National Cancer Institute has collected information on all cancer cases in residents of nine regions in the United States since 1973. The population covered by SEER made up about 12% of the national population and had a demographic profile similar to that of the whole country [12,13]. We used only the first record of breast cancer for each subject. Subsequent records from the same subjects were excluded. For the purpose of illustration, we attempted to make the samples in our analysis as homogeneous as possible. Only non-Hispanic white women aged 20 or above were included. Disease stage at diagnosis is an important factor. Not until 1988 did SEER include the American Joint Committee on Cancer classification for disease stages at the time of diagnosis. For the purpose of graphical illustration we need a sample that has been followed for a long period. The analysis included patients diagnosed with stage I breast cancer in 1988 and 1989.
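The link function described in this section can be illustrated with a short Python sketch. It assumes the model takes the complementary log–log form in which ln(−ln(p/p*)) is linear in the covariates, as the description of the link suggests. The survival figures and the function names are hypothetical, and this is not the SAS or SURV2 implementation.

```python
import math

def cloglog_relative(p_obs, p_exp):
    """Complementary log-log of the relative survival ratio p_obs / p_exp.

    Defined only when the ratio lies strictly between 0 and 1.
    """
    ratio = p_obs / p_exp
    if not 0.0 < ratio < 1.0:
        raise ValueError("relative survival ratio must lie in (0, 1)")
    return math.log(-math.log(ratio))

def predicted_survival(p_exp, linear_predictor):
    """Invert the link: p = p* x exp(-exp(linear predictor))."""
    return p_exp * math.exp(-math.exp(linear_predictor))

# Hypothetical interval: observed survival 0.97, expected survival 0.99.
eta = cloglog_relative(0.97, 0.99)
p_back = predicted_survival(0.99, eta)

# A coefficient b on this scale corresponds to a relative risk exp(b).
b_example = math.log(0.77)          # chosen so that exp(b) equals 0.77
relative_risk = math.exp(b_example)
print(eta, p_back, relative_risk)
```

The round trip from observed survival to the linear predictor and back shows why a division by the expected survival has to be built into the link for each observation.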
The mortality of the subjects had been monitored for at least 9 to 10 years. The choice of 2 years' data represented an attempt to strike a balance between a reasonably large sample size and a sufficiently long follow-up period; the choice of stage I cancer was an attempt to have a large amount of data.

In graphic presentations, the subjects were classified into 5-year cohorts of age at diagnosis, for example, diagnosed at 35–39, 40–44, etc. Their follow-up time was split into person-year records [14]. For instance, a subject diagnosed in January 1988 at the age of 40 who died in December 1990 would have three person-year records, with age during follow-up being 40, 41, and 42. Her age at diagnosis was fixed at 40 for all three records. Within each cohort, Poisson regression was used to smooth the relation between mortality during the follow-up period and age. To avoid having too much random variation in the graphs, only cohorts with at least 20 deaths during the follow-up period are included. Cohorts diagnosed before the age of 35 are therefore excluded. If young age is a biologic marker of tumor aggressiveness, exclusion of the age groups diagnosed at ages below 35 (due to the small number of deaths) means that this effect would be underestimated. We emphasize that the purpose of this article is to illustrate the methodologic issues only. The resultant age-specific mortality curves for each cohort of age at diagnosis were plotted using different time scales, that is, age and time since diagnosis.

We then contrasted the findings from Cox models using different time scales and a relative survival model. Age at diagnosis in 5-year bands was used, with the youngest 5-year group coded as 1, the second group as 2, etc. So the hazard ratios and relative risks refer to the impact of a 5-year difference in age at diagnosis. We fitted models with a linear term for age at diagnosis, with and without adding a quadratic term.
A quadratic term was removed if it was not significant (P > .05; likelihood ratio test). SAS was used for estimating the Cox regression models and the regression models of relative survival rates [7,10]. SURV2 was used for computing expected survival rates [10].
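The person-year splitting described in this section can be sketched in Python as follows. This is a simplified illustration that works in whole years and ignores exact dates and birthdays; the function and field names are hypothetical.

```python
def split_person_years(age_at_diagnosis, years_followed, died):
    """Split one subject's follow-up into one record per year.

    Age at diagnosis is fixed across records; age during follow-up and
    time since diagnosis advance by one year per record. The death
    indicator is set only on the last record of a subject who died.
    """
    records = []
    for k in range(years_followed):
        records.append({
            "age_at_diagnosis": age_at_diagnosis,
            "age_during_follow_up": age_at_diagnosis + k,
            "time_since_diagnosis": k,
            "death": int(died and k == years_followed - 1),
        })
    return records

# The example from the text: diagnosed at age 40, died in the third year.
records = split_person_years(40, 3, died=True)
ages = [r["age_during_follow_up"] for r in records]
deaths = [r["death"] for r in records]
print(ages, deaths)
```

Each record carries all three time variables, so the same data set can be analyzed on either time scale.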
3. Results

There were 8,355 subjects in the sample. The number of all-cause deaths was 1,889. Table 1 shows the age-specific overall mortality smoothed by Poisson regression by cohorts of age at diagnosis. The same data are plotted in Figure 1 to facilitate the illustration.

Table 1. Age-specific overall mortality (per 1,000), smoothed by Poisson regression. Rows are age during follow-up; columns are cohorts of age at diagnosis.

| Age during follow-up | 35–39 | 40–44 | 45–49 | 50–54 | 55–59 | 60–64 | 65–69 | 70–74 | 75–79 | 80–84 | 85–89 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | 10.5 | | | | | | | | | | |
| 42 | 12.8 | 7.6 | | | | | | | | | |
| 47 | 15.5 | 9.9 | 5.1 | | | | | | | | |
| 52 | | 12.8 | 7.8 | 5.0 | | | | | | | |
| 57 | | | 11.8 | 9.8 | 13.4 | | | | | | |
| 62 | | | | 19.1 | 15.9 | 12.0 | | | | | |
| 67 | | | | | 19.0 | 18.9 | 14.5 | | | | |
| 72 | | | | | | 29.7 | 23.3 | 21.9 | | | |
| 77 | | | | | | | 37.4 | 33.2 | 26.4 | | |
| 82 | | | | | | | | 50.2 | 51.2 | 47.8 | |
| 87 | | | | | | | | | 99.0 | 89.8 | 79.5 |
| 92 | | | | | | | | | | 168.6 | 135.1 |
| 97 | | | | | | | | | | | 229.5 |
| No. of subjects | 277 | 489 | 622 | 710 | 803 | 1139 | 1350 | 1181 | 993 | 576 | 215 |
| No. of deaths | 29 | 41 | 40 | 58 | 109 | 181 | 260 | 312 | 375 | 328 | 156 |

The first obvious finding is that mortality increased with age, both within and among cohorts of age at diagnosis. This is just a reflection of the age pattern of mortality in a general population. That the patients had been diagnosed with stage I breast cancer did not overturn this general pattern. The second obvious finding is that age at diagnosis was positively correlated with age during follow-up. Those diagnosed at older ages were also older during the follow-up period. Third, subjects diagnosed at a younger age had a higher age-specific mortality. For instance, women diagnosed at 35–39 years had a mortality of 12.8 and 15.5 (per 1,000) when they were 42 and 47 years old, respectively (Table 1). The corresponding figures among those diagnosed at 40–44 years were 7.6 and 9.9, respectively. Graphically, this can be seen from the vertical distance between the age-specific mortality curves (Fig. 1). The curves for most cohorts of age at diagnosis were roughly parallel to each other. The vertical distance between the curves appeared to be larger among younger cohorts and smaller among older cohorts.

A Cox regression model with age as the time scale estimates an age at diagnosis effect by quantifying the vertical distance between the curves, while absorbing the age (during follow-up) effect into the unspecified baseline hazard. Without adjustment for covariates, the hazard ratio (95% CI) of age at diagnosis in 5-year bands was 0.78 (0.72–0.84), indicating a negative association between age at diagnosis and mortality. A younger age at diagnosis was associated with a higher mortality. Adding a quadratic term did not improve the model (P > .05).

The same data can be rearranged according to time since diagnosis. That is, the entries in each column of Table 1 are shifted to the top of the column and the row labels are changed to 0, 5, and 10 years since diagnosis. Figure 2 plots the curves so arranged.
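The rearrangement of Table 1 by time since diagnosis can be sketched in Python using three of its columns. The mortality figures are copied from Table 1; the dictionary layout and the function name are hypothetical choices made for this illustration.

```python
# Three columns of Table 1: mortality per 1,000 by age during follow-up,
# keyed by cohort of age at diagnosis.
table1 = {
    "35-39": {37: 10.5, 42: 12.8, 47: 15.5},
    "65-69": {67: 14.5, 72: 23.3, 77: 37.4},
    "70-74": {72: 21.9, 77: 33.2, 82: 50.2},
}

def by_time_since_diagnosis(column):
    """Shift a column to the top: relabel rows as years since the first entry."""
    start = min(column)
    return {age - start: rate for age, rate in column.items()}

rearranged = {cohort: by_time_since_diagnosis(column)
              for cohort, column in table1.items()}

# Same age (72): the cohort diagnosed younger has the higher mortality.
same_age = (table1["65-69"][72], table1["70-74"][72])
# Same time since diagnosis (0 years): the cohort diagnosed older is higher.
same_time = (rearranged["65-69"][0], rearranged["70-74"][0])
print(rearranged, same_age, same_time)
```

For the 65–69 and 70–74 cohorts, the ordering of the two curves reverses between the two arrangements, which is the contrast between Figure 1 and Figure 2.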
In contrast to Figure 1, the general pattern is that cohorts older at diagnosis tended to have a higher mortality throughout the time-since-diagnosis scale. The youngest cohorts (short dashed lines) are at the bottom, middle-aged cohorts (solid lines) in the middle, and oldest cohorts (long dashed lines) on the top. A Cox model using this time scale with a linear age term was fitted, giving a hazard ratio (95% CI) of 1.39 (1.36–1.42). Adding a quadratic term significantly improved the model (P < .05), and the hazard ratios for the linear and quadratic terms were 0.74 (0.65–0.85) and 1.03 (1.02–1.04), respectively. They implied that in the first four cohorts the association was negative; after that, mortality increased with an older age at diagnosis.

With this time scale, the vertical distance between the curves represents a mixture of an age effect and an age at diagnosis effect. The hazard ratio for age at diagnosis is the net effect of age at diagnosis as a biologic marker of tumor characteristics and age during follow-up. Overall, in this sample the latter dominated and made the hazard ratio in the first model (linear effect only) higher than 1. However, in the younger age range there was some nonlinearity. This is consistent with the hypothesis that the association between age at diagnosis and tumor aggressiveness was stronger in the younger age range [2,3]. So the net effect was more dominated by age at diagnosis as a biologic marker in this age range.

It is intuitive to consider using age as the survival time scale to remove the effect of age during follow-up and estimate the effect of age at diagnosis. However, a problem is that, given an age during follow-up, age at diagnosis has a linear relation with time since diagnosis. For example, if a patient is 47 years old at some point in the follow-up period and she was diagnosed with breast cancer at age 37, time since diagnosis is 10 years.
If there is a linear effect of time since diagnosis on mortality, the effect of age at diagnosis estimated by the Cox model with age as the time scale represents a mixture of the effects of age at diagnosis and time since diagnosis. If time since diagnosis has no effect or a quadratic effect, the estimated coefficient for age at diagnosis is not affected. Table 1 and Figure 1 suggest that in this example the age at diagnosis effect is real, not an artifact induced by its relation to time since diagnosis. For example, those diagnosed at 40–44 years had a mortality of 12.8 at age 52 (10 years since diagnosis); those diagnosed at age 35–39 had a mortality of 15.5 at age 47 (also 10 years since diagnosis). Had there been no age at diagnosis effect, we would expect patients at age 47 to have a lower mortality than patients at age 52, time since diagnosis being the same.

In making the above judgment, we have utilized some external information, that is, our knowledge of the age pattern of mortality in general populations. If such external information is to be used, it may be better to use it formally in a quantitative way. Furthermore, in a Cox model there is no way to formally assess whether time since diagnosis has no effect, a linear effect, or a nonlinear effect on mortality. These two issues can be handled by the regression analysis of relative survival rates.

Table 2 shows the results of the multivariable regression analysis of relative survival rates. We began with a model that included the linear and quadratic terms for age at diagnosis and time since diagnosis. The quadratic term for age at diagnosis was not significant (P > .05), and was removed from the final model. The relative risk for age at diagnosis was 0.77 (P < .01). This confirms the above findings of the Cox model with age as the time scale, which showed a hazard ratio of 0.78.
The estimates for the linear and quadratic effects of time since diagnosis imply that relative risk increased initially, reached a peak at about 5 years after diagnosis, and then declined. It explains why the Cox model with age as time scale gave the correct estimate for age at diagnosis: although age at diagnosis was linearly related to time since diagnosis given age during follow-up, the effect of time since diagnosis on mortality was not linear.
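The sign change implied by the quadratic Cox model on the time-since-diagnosis scale, reported earlier in this section, can be checked with a short Python sketch. It assumes the log hazard ratio for band g is g·ln(0.74) + g²·ln(1.03), with g the 5-year band code. The published ratios are rounded to two decimals, so the exact turning point is only approximate; the function name is hypothetical.

```python
HR_LINEAR = 0.74     # reported hazard ratio for the linear term
HR_QUADRATIC = 1.03  # reported hazard ratio for the quadratic term

def step_hazard_ratio(g):
    """Hazard ratio comparing band g + 1 with band g.

    The linear term contributes one factor of HR_LINEAR and the quadratic
    term contributes HR_QUADRATIC ** ((g + 1)**2 - g**2), i.e. ** (2g + 1).
    """
    return HR_LINEAR * HR_QUADRATIC ** ((g + 1) ** 2 - g ** 2)

steps = {g: step_hazard_ratio(g) for g in range(1, 8)}
for g, hr in steps.items():
    print(g, round(hr, 3))
```

Under these rounded values the step ratio stays below 1 for g = 1 to 4 and exceeds 1 from g = 5 onward, which is broadly in line with the statement that the association was negative across the first few cohorts and positive thereafter.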
4. Discussion

We have shown by graphic means and by Cox regression models the relevance of the choice of time scale in studies of survival in patients diagnosed with breast cancer. We have also explained why Cox regression models using different time scales gave opposite findings. As interpretation of the Cox model with age as the time scale is not simple, we also performed regression analysis of relative survival rates. To avoid distraction, we have refrained from more detailed analysis such as testing for nonproportional hazards in the Cox models and the relative survival model. Furthermore, we do not think that the SEER database is ideal for answering the substantive question about age at diagnosis and outcomes because some important confounders are not observed in the database, for example, patterns of treatment [15]. So the illustration here should not be taken as evidence to support the relation between young age as a marker of tumor aggressiveness and higher mortality.

The choice of time scale is an important yet often overlooked aspect of survival analysis. Some previous researchers have proposed that in many epidemiologic studies age is a better choice than time-on-study [8]. It is an indirect way to adjust for an age effect. In the past, the use of age as the time scale in Cox regression was limited by the paucity of readily available software packages that can handle late entry. This is no longer the case, as several general-purpose statistical packages now have the capacity to handle late entry, for example, SAS [7] and Stata [16]. Regression analysis of relative survival rates offers another approach to adjust for an age effect. This approach can also be implemented using general-purpose statistical software such as SAS and S-Plus [10].

In cancer research there has long been an interest in age at diagnosis as an indicator of the biologic features of tumors. Studies of the survival of breast cancer patients have provided conflicting results [1,2].
Age at diagnosis, age during follow-up, and time since diagnosis are related to each other. They form a problem similar to the “age-period-cohort” problem in epidemiology. When one of them is fixed, the other two have a linear relation. There is no way to simultaneously account for all three aspects without resorting to external information [14,17]. Because age at diagnosis is positively related to age during follow-up, which in turn is linearly related to mortality (on the log scale in the age range from 20 to 85 [18]), the use of time since diagnosis as the time scale will not show the effect of age at diagnosis as a proxy of tumor aggressiveness. The estimated effect is a mixture of age at diagnosis and age during follow-up, with the latter being dominant in the present example. Using age as the time scale of a Cox model can remove the age effect, but the effect estimate for age at diagnosis may be affected by the effect of time since diagnosis if the latter has a linear relation with mortality. The relative survival approach offers a formal way to adjust for an age effect by using external information, after which it is possible to analyze age at diagnosis and time since diagnosis simultaneously.

It is important to give a careful clarification of research purpose and a careful choice of the survival analysis method. A simple statement like “to investigate the prognostic value of age at diagnosis on mortality” is not enough. If age at diagnosis is seen as a predictor of mortality, the use of time since diagnosis for the time scale is sensible. If it is taken as a proxy of tumor aggressiveness, using time since diagnosis as the time scale in a Cox model is certainly inappropriate. Using age as the time scale may or may not give the correct estimate, depending on whether there is a linear effect of time since diagnosis.
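The linear dependence among the three time variables can be written out explicitly. As a sketch, let $a$ denote age during follow-up, $d$ age at diagnosis, and $t$ time since diagnosis, and suppose a hypothetical log hazard that is linear in all three with coefficients $\beta_a$, $\beta_d$, and $\beta_t$. These symbols are introduced here for illustration only and do not appear in the fitted models.

$$a = d + t$$

$$\beta_a a + \beta_d d + \beta_t t = (\beta_a + c)\,a + (\beta_d - c)\,d + (\beta_t - c)\,t \quad \text{for any constant } c$$

The two sides are equal because the extra terms sum to $c\,(a - d - t) = 0$. Every value of $c$ therefore fits the data equally well, which is why the linear effects cannot be separated without external information, and why a purely nonlinear effect of time since diagnosis does not disturb the estimate for age at diagnosis.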
The relative survival approach has the advantages of simple interpretation and an ability to estimate both the effects of age at diagnosis and time since diagnosis, but it is achieved at the expense of not estimating the effect of age during follow-up.

In the literature, the choice of survival time scale was rarely explicitly stated. Moreover, the purpose of including age at diagnosis as an independent variable was not always given. Therefore, we cannot perform a formal review of the literature to examine the extent to which an effect of age at diagnosis of breast cancer was affected by inappropriate choice of survival time scale.

In conclusion, a careful clarification of research purpose and a careful choice of survival analysis methods are important. In an attempt to understand age at diagnosis as a proxy of tumor aggressiveness, the relative survival approach is most appropriate and the Cox model with time since diagnosis as the time scale is least appropriate. The methodologic issue is also relevant to studies of other tumor types or cause-specific mortality.

References

1. Clark GM. Prognostic and predictive factors. In: Harris JR, Lippman ME, Morrow M, Osborne CK, editors. Diseases of the breast. 2nd ed. Philadelphia: Wolters Kluwer; 2000. p. 489–514.
2. Holli K, Isola J. Effect of age on the survival of breast cancer patients. Eur J Cancer. 1997;33:425–428.
3. Albain KS, Allred DC, Clark GM. Breast cancer outcome and predictors of outcome (are there age differentials?). J Natl Cancer Inst Monogr. 1994;16:35–42.
4. Nixon AJ, Neuberg D, Hayes DF, Gelman R, Connolly JL, Schnitt S, et al. Relationship of patient age to pathologic features of the tumor and prognosis for patients with stage I or II breast cancer. J Clin Oncol. 1994;12:888–894.
5. Joslyn SA, West MM. Racial differences in breast carcinoma survival. Cancer. 2000;88:114–123.
6. Cox DR. Regression models and life tables (with discussion). J R Stat Soc B. 1972;34:187–220.
7. Allison PD. Survival analysis using the SAS system. Cary, NC: SAS Institute Inc.; 1995.
8. Korn EL, Graubard BI, Midthune D. Time-to-event analysis of longitudinal follow-up of a survey (choice of the time-scale). Am J Epidemiol. 1997;145:72–80.
9. Hakulinen T, Tenkanen L. Regression analysis of relative survival rates. Appl Stat. 1987;36:309–317.
10. Voutilainen ET, Dickman PW, Hakulinen T. SURV2 (relative survival analysis program software manual). Helsinki: Finnish Cancer Registry; 2000.
11. National Center for Health Statistics. U.S. decennial life tables for 1989–1991, vol 1 no 1. Hyattsville, MD: National Center for Health Statistics; 1997.
12. National Cancer Institute. SEER cancer incidence public-use database, 1973–1997. National Cancer Institute CD-ROM # 2, ASCII version.
13. Ries LA, Kosary CL, Hankey BF, Harras A, Miller BA, Edwards BK, editors. SEER cancer statistics review, 1973–1994. Bethesda, MD: National Cancer Institute; 1997.
14. Clayton D, Hills M. Statistical models in epidemiology. Oxford: Oxford University Press; 1993.
15. Newschaffer CJ, Penberthy L, Desch CE, Retchin SM, Whittemore M. The effect of age and comorbidity in the treatment of elderly women with nonmetastatic breast cancer. Arch Intern Med. 1996;156:85–90.
16. StataCorp. Stata reference manual. Release 6. College Station, TX: Stata Press; 2001.
17. Tilley J. Political generation and partisanship in the UK, 1964–1997. J R Stat Soc Series A. 2002;165:121–135.
18. Thatcher AR. The long-term pattern of adult mortality and the highest attained age. J R Stat Soc Series A. 1999;162:5–43.

a Division of Clinical Trials & Epidemiological Sciences, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610, Singapore

b Department of Medical Oncology, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610, Singapore

Corresponding author. Tel.: +65-6436 8208; fax: +65-6225 0047.

PII: S0895-4356(02)00536-X

© 2003 Elsevier Science Inc. All rights reserved.