Department of Immunology, Genetics and Pathology, Uppsala University Hospital, Uppsala, Sweden; Regional Cancer Center, Uppsala University/Uppsala University Hospital, Uppsala, Sweden
Department of Surgical Sciences, Uppsala University, Uppsala, Sweden; Regional Cancer Center, Uppsala University/Uppsala University Hospital, Uppsala, Sweden
• Choice of imputation method for missing M stage affects the estimated incidence.
• Setting all missing M stage to M0 results in a biased estimate of incidence.
• Multiple imputation can be used to gain insights about this bias.
Abstract
Objectives
To study how handling missing data on M stage in a clinical cancer register affects estimates of incidence of metastatic prostate cancer.
Study Design and Setting
Estimates of age-standardized incidence of metastatic prostate cancer were obtained by the use of data in a population-based clinical cancer register in Sweden and using four methods for imputation of missing M stage. Adjusted survival was used to compare men with known and imputed M stage.
Results
The proportion of men with missing M stage was high (66%) and varied according to the risk group and over calendar time. The estimated incidence of metastatic disease varied depending on imputation method, with all methods indicating a decreasing incidence over time. A combination of deterministic imputation (DI) and multiple imputation (MI) produced adjusted survival curves for men with imputed M stage that best resembled the survival for men with known M stage.
Conclusions
Plausible estimates of incidence of metastatic prostate cancer in clinical cancer registers can be obtained by the use of a combination of DI of missing M stage and MI.
The estimated incidence of metastatic prostate cancer (M1) varied depending on how missing data on M stage were handled. Simply substituting all missing M stage with M0 underestimated the incidence of M1. Deterministic imputation of missing M stage to M0 among men with low risk of metastases, in combination with multiple imputation, yielded similar survival when comparing men with known and missing M stage.
What this adds to what is known?
• Information on serum prostate-specific antigen levels and Gleason score, in addition to tumor node metastasis stage, is necessary to yield plausible estimates when imputing missing M stage.
What is the implication and what should change now?
• Missing M stage cannot by default be deterministically imputed to M0. Analyses of incidence trends of metastatic prostate cancer should be complemented with sensitivity analyses and information on how missing M stage was handled.
1. Introduction
The incidence of de novo metastatic cancer (i.e., metastatic cancer at diagnosis) is an early proxy for cancer-specific mortality when evaluating interventions such as screening and easier and earlier access to health care. Incidence of de novo metastatic cancer is unaffected by new treatments, and it does not require a long observation time. Missing data on tumor node metastasis (TNM) variables are common, and temporal changes in use of imaging can influence the pattern of missingness in M stage. For example, efforts to discourage inappropriate use of bone imaging in men with low-risk prostate cancer in Sweden reduced the proportion of men with low-risk prostate cancer who underwent bone imaging from 45% in 1998 to 3% in 2009. Missing data may also vary over time due to revised coding principles in cancer staging systems. An example is the removal of the category “Mx” for unknown metastatic status in the seventh edition of the TNM classification, with the result that men who have not undergone bone imaging are now classified as M0. Trends in the incidence of de novo metastatic cancer may be biased unless missing M stage is handled appropriately because the reasons for missing M stage vary over calendar time and across risk categories.
The aim of the study was to assess statistical methods for estimating the age-standardized incidence of de novo metastatic prostate cancer when M stage is missing for a large proportion of men. The methods used should account for missing data that vary over calendar time and are related to other measured and unmeasured clinical variables.
2. Materials
All men diagnosed with prostate cancer from 2000 to 2019 registered in the National Prostate Cancer Register (NPCR) of Sweden were included. The NPCR includes data on diagnostic workup, tumor characteristics, and primary treatment. Data linkages in the Prostate Cancer data Base Sweden (PCBaSe) were performed between the NPCR, the National Patient Register, the National Cancer Register, and the National Cause of Death Register by use of the unique Swedish personal identity number. The following variables were extracted from PCBaSe: age at diagnosis, year of diagnosis, serum level of prostate-specific antigen (PSA), clinical TNM stage, Gleason score (GS) of the diagnostic biopsy cores or World Health Organization (WHO) grade in fine needle biopsies, mode of detection (lower urinary tract symptoms, other symptoms, and asymptomatic), primary treatment, Charlson Comorbidity Index (CCI), survival time, and status (cause of death [prostate cancer or other causes] or censoring). Follow-up ended at the time of death or at the end of follow-up (December 31, 2019). Primary treatment was categorized into radical treatment (radical prostatectomy or radiotherapy), androgen deprivation therapy (ADT) (gonadotropin-releasing hormone, antiandrogens [bicalutamide] or orchidectomy), deferred treatment (active surveillance or watchful waiting) and other or unknown treatment (other). The CCI was based on discharge diagnoses, excluding prostate cancer and metastases, from the National Patient Register up to 10 years prior to prostate cancer diagnosis. Data on all men alive each year between the ages of 40 and 100 years were obtained from Statistics Sweden (SCB).
Men with prostate cancer were categorized according to the risk of metastatic disease at diagnosis:
• Low metastatic risk: PSA <20 ng/mL, T1-2, and GS ≤7 (or WHO grade 1–2 if GS is missing).
• High metastatic risk: PSA ≥20 ng/mL, T3-4, GS >7, or WHO grade 3 if GS is missing.
• Unknown metastatic risk: any of PSA, T stage, or grade is missing, where grade counts as missing only if both GS and WHO grade are missing.
The categorization was designed to closely match the current Swedish clinical guidelines for use of imaging in the diagnostic workup of men with prostate cancer.
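The three risk categories above can be sketched as a small classification function. This is an illustrative Python sketch, not the authors' code: the function and argument names are hypothetical, T stage is assumed to be coded as an integer from 1 to 4, and the text does not state how a man with one high-risk criterion and another variable missing should be handled. The sketch takes the literal reading that any missing variable yields "unknown".

```python
from typing import Optional


def grade_is_high(gleason: Optional[int], who_grade: Optional[int]) -> Optional[bool]:
    """Return True/False for high grade, or None if both GS and WHO grade are missing."""
    if gleason is not None:
        return gleason > 7          # GS > 7 counts as high grade
    if who_grade is not None:
        return who_grade >= 3       # WHO grade 3 is used only when GS is missing
    return None


def metastatic_risk(psa: Optional[float], t_stage: Optional[int],
                    gleason: Optional[int] = None,
                    who_grade: Optional[int] = None) -> str:
    """Classify a man as 'low', 'high', or 'unknown' metastatic risk."""
    high_grade = grade_is_high(gleason, who_grade)
    # Assumption: any missing variable gives 'unknown' (literal reading of the text).
    if psa is None or t_stage is None or high_grade is None:
        return "unknown"
    # High risk if at least one criterion is met.
    if psa >= 20 or t_stage >= 3 or high_grade:
        return "high"
    # Otherwise PSA < 20, T1-2, and low grade all hold.
    return "low"
```

For example, `metastatic_risk(8.0, 2, gleason=6)` falls in the low-risk category, whereas the same man with PSA 25 ng/mL would be classified as high risk.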
We estimated the age-standardized incidence of de novo metastatic prostate cancer according to the age distribution in Sweden in 2000 by using direct standardization. To obtain an annual estimate of the proportion of M1 among all men alive in each age stratum in the presence of missing data on M stage, we used four different methods based on deterministic imputation (DI) and multiple imputation (MI) using the R package mice. The definition of M stage used prior to 2011 was recreated for the whole cohort; i.e., M stage was considered missing if the man had not undergone imaging to assess metastatic status. Adjusted survival curves stratified by M stage were used to compare known and imputed M stage among men with M0 and M1, respectively, and these were obtained by the method of weighting to account for potential differences in baseline characteristics. See Supplementary Materials for further details on the methods, specification of the imputation models, weight diagnostics, and sensitivity analyses.
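Direct standardization weights each age-specific rate by the share of that age group in a fixed standard population. The following Python sketch shows the calculation under stated assumptions: the function name, the two age groups, and all counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
def age_standardized_rate(cases, population, standard_population, per=100_000):
    """Directly standardized rate per `per` persons.

    cases, population, standard_population: dicts keyed by age group.
    """
    total_standard = sum(standard_population.values())
    rate = 0.0
    for age_group, standard_n in standard_population.items():
        crude_rate = cases[age_group] / population[age_group]   # age-specific rate
        weight = standard_n / total_standard                    # standard population share
        rate += weight * crude_rate
    return rate * per


# Hypothetical example with two age groups.
cases = {"40-69": 10, "70-100": 40}
population = {"40-69": 100_000, "70-100": 50_000}
standard_population = {"40-69": 3_000_000, "70-100": 1_000_000}

rate = age_standardized_rate(cases, population, standard_population)
print(round(rate, 1))  # 27.5 per 100,000 (0.75 * 10 + 0.25 * 80)
```

Because the weights come from a fixed reference year, changes in the standardized rate over calendar time reflect changes in the age-specific rates rather than changes in the age structure of the population.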
3.1 Deterministic imputation
M stage was substituted to M0 for all men with missing M stage. This corresponds to a situation where only positive imaging results are registered and imaged men with M0 cannot be differentiated from nonimaged men, as in the current Union for International Cancer Control classification [
When to perform bone scan in patients with newly diagnosed prostate cancer: external validation of the currently available guidelines and proposal of a novel risk stratification tool.
]. M stage was therefore first substituted to M0 for all men categorized as low metastatic risk with missing M stage, and then remaining missing data in M stage and all other variables (e.g., PSA and N stage) was imputed using MI including all variables listed in the Materials section.
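The difference between substituting M0 for every missing M stage and substituting M0 only for men at low metastatic risk can be sketched as follows. This is an illustrative Python sketch with hypothetical field names (`m_stage`, `risk`) and toy records. The subsequent multiple imputation step, which the study performed with the R package mice, is not reproduced here; remaining missing values are simply left as `None`.

```python
def impute_deterministic(records):
    """Set every missing M stage to M0 (returns new records, input is unchanged)."""
    return [dict(r, m_stage="M0") if r["m_stage"] is None else dict(r)
            for r in records]


def impute_partially_deterministic(records):
    """Set missing M stage to M0 only for low-risk men; leave the rest for MI."""
    result = []
    for r in records:
        r = dict(r)  # copy so the input is not modified
        if r["m_stage"] is None and r["risk"] == "low":
            r["m_stage"] = "M0"
        result.append(r)
    return result


# Toy records (hypothetical): None marks a man who did not undergo imaging.
records = [
    {"id": 1, "m_stage": None, "risk": "low"},
    {"id": 2, "m_stage": None, "risk": "high"},
    {"id": 3, "m_stage": "M1", "risk": "high"},
    {"id": 4, "m_stage": "M0", "risk": "low"},
]

di = impute_deterministic(records)
pdi = impute_partially_deterministic(records)
print([r["m_stage"] for r in di])   # ['M0', 'M0', 'M1', 'M0']
print([r["m_stage"] for r in pdi])  # ['M0', None, 'M1', 'M0']
```

In the second result, the high-risk man without imaging keeps a missing M stage, so a later imputation model can still assign him M1 with some probability instead of fixing him at M0.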
3.3 Standard MI
All variables listed in the Materials section were included and missing data were imputed using MI. This method corresponds to a standard implementation of MI without any prior deterministic imputation.
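MI produces several completed data sets, and a quantity such as the proportion of M1 is estimated in each one and then combined. The text does not describe the pooling step, so the following Python sketch assumes the standard approach known as Rubin's rules; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)             # average of the m estimates
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical proportions of M1 from three imputed data sets.
pooled, total = pool_rubin([0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004])
print(round(pooled, 2))
```

The between-imputation term grows when the imputed data sets disagree, so the pooled variance reflects the extra uncertainty caused by the missing data.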
3.4 Restricted MI
Many registers contain a limited number of variables used in clinical practice, such as the National Cancer Register in Sweden, which registers only TNM and no other clinical variables or survival data. To simulate this scenario, only TNM stage, age, and year of diagnosis were included, and missing data were imputed using MI. Survival data were included in a sensitivity analysis (see Supplementary Materials).
4. Results
4.1 Baseline characteristics
There were 190,420 men diagnosed with prostate cancer between 2000 and 2019 in NPCR. Baseline characteristics by M stage are summarized in Table 1. Of these men, 126,102 (66%) had missing M stage and 15,526 (8%) were M1, the latter constituting 24% of all imaged men. Men with missing M stage had similar characteristics to men with M0 with respect to age at diagnosis, CCI, and mode of detection. The PSA, T stage, and GS, however, indicated more favorable disease characteristics in men with missing M stage. Thirty-six percent of men with M0 and 3% of men with M1 were categorized as low metastatic risk. The corresponding proportion for men with missing M stage was 70%, and these were substituted to M0 by the DI method and the partial deterministic imputation + MI (PDI + MI) method prior to MI.
Table 1. Baseline characteristics of men diagnosed with prostate cancer in PCBaSe between 2000 and 2019 by M stage determined by imaging. Men that did not undergo bone imaging have missing M stage. Column-wise percentages are indicated with () and row-wise percentages are indicated with [].
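The cohort proportions quoted in Section 4.1 can be reproduced from the three counts given in the text. The short Python check below uses only those counts; the variable names are illustrative.

```python
total_men = 190_420      # men diagnosed 2000-2019 in NPCR
missing_m = 126_102      # men with missing M stage (no imaging)
m1 = 15_526              # men with known M1

imaged = total_men - missing_m                   # men with known M stage
pct_missing = round(100 * missing_m / total_men)
pct_m1_all = round(100 * m1 / total_men)
pct_m1_imaged = round(100 * m1 / imaged)

print(imaged, pct_missing, pct_m1_all, pct_m1_imaged)  # 64318 66 8 24
```

The gap between 8% and 24% shows how strongly the apparent proportion of M1 depends on whether nonimaged men are counted in the denominator.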
The annual number of men diagnosed with prostate cancer increased during the study period, while the annual number of men categorized as high metastatic risk was stable in all age groups. Simultaneously, the proportion of imaged men (i.e., known M stage) decreased from 48% in 2000 to 23% in 2008. This was followed by an increase to 37% in 2019 which was most pronounced among men aged 70 years or above categorized as high metastatic risk (Supplementary Figure 1).
4.2 Baseline characteristics after imputation
The proportions of men with imputed M1 among men with missing M stage were 7%, 10%, and 16% when applying PDI + MI, standard MI (SMI), and restricted MI (RMI), respectively. Among men with imputed M1, the proportion categorized as low metastatic risk varied substantially (1–40%) depending on the imputation method used (Supplementary Table 1), compared with 4% among men with known M1 (Table 1). When using PDI + MI, men with imputed M1 were older, had higher CCI, fewer were detected through a health checkup, and most men were assigned to primary treatment by ADT compared to other methods for imputation. The tumor characteristics among men with imputed M0 were similar across methods and tended toward more favorable disease characteristics compared to men with known M0 (Supplementary Table 1).
4.3 Incidence of metastatic prostate cancer
The estimated age-standardized incidence of de novo metastatic prostate cancer varied markedly between the four applied methods (Figure 1). In 2000, the estimated incidences were 43, 70, 74, and 91 per 100,000 men for DI, PDI + MI, SMI, and RMI, respectively. Both the estimated incidences and the differences between methods decreased over time; in 2019 the estimates were 32, 40, 50, and 57 per 100,000 men for DI, PDI + MI, SMI, and RMI, respectively. However, the estimated incidence curve was U-shaped for DI, with a minimum of 26 per 100,000 men in 2012.
Fig. 1. Standardized incidence of metastatic prostate cancer (with respect to the age distribution in 2000) by statistical method. M stage was imputed for men with missing M stage, i.e., men that did not undergo bone imaging. Data on use of bone imaging were not registered in NPCR for men diagnosed in 2011. Men registered as M1 in 2011 with no information on imaging were considered as M1 when using DI. MI, multiple imputation; DI, deterministic imputation; PDI + MI, partially deterministic imputation + MI; SMI, standard MI; RMI, restricted MI; NPCR, National Prostate Cancer Register of Sweden.
The estimated annual incidence of men with de novo metastatic prostate cancer categorized as low metastatic risk varied between methods (Supplementary Figure 2); SMI initially yielded a decrease followed by an increase over time, from 5 in 2000 to 11 per 100,000 men in 2019, and RMI yielded an increase over time, from 12 to 17 per 100,000 men, whereas DI and PDI + MI were stable around 2 per 100,000 men. The estimated annual incidence of men with de novo metastatic prostate cancer categorized as high metastatic risk was similar for all methods except DI (Supplementary Figure 3).
4.4 Survival
The adjusted 5-year overall survival curves for men with known M0 or M1, and for men with missing M stage imputed as M0 or M1, are shown in Figure 2. When applying the methods PDI + MI and SMI, the survival curves for men with imputed M stage closely matched those for men with known M stage when considering all men and men categorized as high metastatic risk. Among men categorized as low metastatic risk, the number of men with imputed M1 according to PDI + MI was small (n = 98), making any comparison of survival uncertain. The adjusted survival curves for men with known and imputed M1 categorized as low metastatic risk separated immediately when applying the SMI method, and the RMI method yielded survival curves that did not match particularly well in any of the strata. The results were similar for prostate cancer-specific survival (Supplementary Figure 4) and in unadjusted analyses (Supplementary Figures 5 and 6).
Fig. 2. Adjusted overall survival averaged over the multiple imputations, for all men and stratified by those with low metastatic risk and high metastatic risk. M stage was imputed for men with missing M stage, i.e., men that did not undergo bone imaging. The strata defined by M stage therefore vary across imputations and imputation methods. The survival estimates for known M stage were not based on a complete-case analysis because the weights were computed based on both observed and imputed data. Numbers at risk, reported as averages over the multiple imputations, were computed in the weighted population, and may therefore be different between the imputation methods for men with known M stage. Some men had missing data in PSA or GS and could not be categorized into low or high metastatic risk even after imputation using the restricted MI (that omitted PSA and GS). PSA, prostate-specific antigen; GS, Gleason score.
The estimated age-standardized incidence of de novo metastatic prostate cancer differed markedly between the methods used to handle missing data in metastatic status. Partial deterministic imputation + multiple imputation simultaneously yielded a small number of men with imputed M1 among men with low metastatic risk and a survival of imputed M stage that best resembled that of observed M stage.
5.2 Validity of different methods for imputation of M stage
Deterministic imputation likely underestimates the incidence of M1, which mostly depends on the changing use of imaging over calendar time among men older than 70 years with high metastatic risk. Randomized clinical trials have shown that radical radiotherapy with neoadjuvant and adjuvant ADT increases survival in men with locally advanced prostate cancer, which likely has led to a more comprehensive workup of men with high metastatic risk in more recent years. This likely explains both the increase of imaging in these men after 2008 and the U-shape of the incidence curve.
The validity of the MI methods relies on the plausibility of the missing at random (MAR) assumption [
], since such variables may explain systematic differences between those with observed and missing data. When such variables are not available or omitted, data can no longer be considered MAR and is instead missing not at random (MNAR) [
]. In this study, missing information on variables that predict the risk of metastases and the probability of undergoing imaging was considered the primary reason why data could be MNAR. MNAR can result in a large bias in estimates obtained after MI that operates under the MAR assumption.
Using subject matter knowledge is crucial when data are missing frequently and missingness may be MNAR. Based on recommendations in guidelines on the use of imaging, we hypothesized that men with baseline cancer characteristics indicating a low risk for metastatic disease and who did not undergo imaging were unlikely to have metastases. Substituting missing M stage with M0 for these men likely results in a negligible underestimation of M1 disease. We did not expect systematic differences between imaged and nonimaged men with high metastatic risk on the risk of metastases. This motivated the use of the PDI + MI method. The PDI + MI produced the most convincing imputations among the considered methods based on the low number of men with imputed M1 and low metastatic risk and on the similarity of the survival curves. However, the validity of estimated incidence based on this method depends on how well it approximates the truth, which is unknown, and we were unable to test the above assumptions. Therefore, the findings do not prove that the method is valid. Ideally, a validation study should be performed where a random selection of cases with missing M stage was subjected to a patient record review to try to determine M stage and/or the reason for missingness.
Restricted MI did not include survival time or cause of death in the imputation model, did not produce similar adjusted survival curves when comparing men with known and imputed M stage, and was thus unable to adequately impute M stage, particularly among men with low metastatic risk. Consequently, the annual incidence of metastatic prostate cancer was likely overestimated with this method.
5.3 Other studies
Other studies have reported an association between missing data in stage and comorbidity. Missing data in prostate cancer stage in the English Cancer Registry were imputed using a combination of substitution (deterministic imputation) followed by MI. The authors observed an increase from 6% to 8% in the proportion of known metastatic prostate cancer between 2010 and 2013, which could potentially be due to changes in use of imaging as suggested by the large decrease over time in the proportion of men missing cancer stage, from 83.1% to 32.5%.
In an Australian cohort, the validity of MI for missing cancer stage at diagnosis was assessed by cross-linkage with data from health care records. The authors concluded that MI may be an appropriate method to handle missing data on cancer stage in a cancer registry, particularly when more clinical variables were available. However, any differences in clinical practice (e.g., diagnostic routines and use of imaging, which was not reported) and data registration (e.g., only summary stage was available and not separate TNM stage) make it difficult to assess whether their findings are applicable in our study.
5.4 Implications for data registration and coding
Our results indicate that it is essential to have access to data on use of imaging to determine which men had known M stage; otherwise one cannot assess the potential magnitude of the underestimation of incidence. Such data may not be available, for example, if the M classification applied does not include a category “Mx” that indicates that imaging was not performed and unknown M stage is coded as M0, and if there is no other variable indicating whether imaging was performed. It is also important to be able to distinguish whether M stage was determined by imaging or whether men were coded M0 because there were no obvious signs of metastasis [
Screening for prostate cancer decreases the risk of developing metastatic disease: findings from the European Randomized Study of Screening for Prostate Cancer (ERSPC).
]. An important strength was the availability of several auxiliary variables, most with negligible amount of missing data, which predict M stage and missingness in M stage. This increased the plausibility of the MAR assumption. By comparing results of the imputation methods for missing M stage we gained insights into how data availability and handling of missingness in M stage may affect incidence estimates.
Limitations of our study include the large proportion of missing data in M stage (66%) and missing data in predictors used for imputing M stage (e.g., 75.6% for N stage), which may affect the performance of MI [
] than bone scintigraphy and changes in use of imaging modalities over time can cause bias. We were unable to assess this potential bias due to lack of such data. Moreover any temporal changes in assessment and definition of the auxiliary variables may also be a source of bias. For example, the Gleason classification has been modified during the study period [
The 2014 international society of urological pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma: definition of grading patterns and proposal for a new grading system.
Our study focused on analyses of register data for epidemiological, population-level studies, and the concepts and implications of this article apply to statistical aspects of missing data and are not intended to be adopted for clinical practice. However the results can help guide instruction for coding in cancer registries and clinical databases.
6. Conclusions
The amount of missing data in metastatic status is often high even in clinical cancer registers with otherwise comprehensive data and the estimated age-standardized incidence of de novo metastatic prostate cancer is sensitive to how missing data in metastatic status is handled. Substituting missing M stage with M0 underestimates the incidence. The most convincing results were obtained from imputations of missing M stage using DI of missing M stage to M0 in men with low baseline risk of metastases combined with MI of missing M stage and other variables in all other men. These findings are also relevant for other cancers, if tailored to the context of interest, since the incidence of metastatic cancer is an important proxy for long term cancer-specific mortality in many cancer studies with short follow-up.
Acknowledgments
This project was made possible by the continuous work of the National Prostate Cancer Register of Sweden (NPCR) steering group: David Robinson (register holder), Ingela Franck Lissbrant (chair), Johan Styrke (cochair), Johan Stranne, Jon Kindblom, Camilla Thellenberg, Andreas Josefsson, Ingrida Verbiene, Hampus Nugin, Stefan Carlsson, Anna Kristiansen, Mats Andén, Thomas Jiborn, Olof Ståhl, Olof Akre, Per Fransson, Eva Johansson, Magnus Törnblom, Fredrik Jäderling, Marie Hjälm Eriksson, Lotta Renström, Jonas Hugosson, Ola Bratt, Maria Nyberg, Fredrik Sandin, Camilla Byström, Mia Brus, Mats Lambe, Anna Hedström, Nina Hageman, Christofer Lagerros, Hans Joelsson, and Gert Malmberg.
Conflicts of interest: The authors report no conflicts of interest.
Funding: This project was supported by the Swedish Cancer Society [19 00 30] and Region Uppsala and Uppsala University. Marcus Westerberg received financial support from the Center for Interdisciplinary Mathematics (CIM), Uppsala University. The sponsors had no involvement with the planning, execution, or completion of the study.
Disclaimer: Rolf Gedeborg is employed by the Medical Products Agency (MPA) in Sweden. The MPA is a Swedish Government Agency. The views expressed in this article may not represent the views of the MPA.
Author statement: M.W. contributed to conceptualization, methodology, software, formal analysis, investigation, project administration, visualization, validation, writing–original draft. K.B. contributed to conceptualization, writing–review and editing. R.G. contributed to conceptualization, methodology, writing–review and editing. S.I. contributed to conceptualization, writing–review and editing. L.H. contributed to conceptualization, writing–review and editing. H.G. contributed to conceptualization, methodology, project administration, data curation, writing–review and editing. Pär Stattin contributed to conceptualization, project administration, funding acquisition, supervision, resources, writing–review and editing.