
Choice of imputation method for missing metastatic status affected estimates of metastatic prostate cancer incidence

Open Access. Published: December 17, 2022. DOI: https://doi.org/10.1016/j.jclinepi.2022.12.008

      Highlights

      • The choice of imputation method for missing M stage affects the estimated incidence.
      • Setting all missing M stage to M0 results in a biased estimate of incidence.
      • Multiple imputation can be used to gain insights about this bias.

      Abstract

      Objectives

      To study how handling missing data on M stage in a clinical cancer register affects estimates of incidence of metastatic prostate cancer.

      Study Design and Setting

      Estimates of age-standardized incidence of metastatic prostate cancer were obtained using data from a population-based clinical cancer register in Sweden and four methods for imputing missing M stage. Adjusted survival was used to compare men with known and imputed M stage.

      Results

      The proportion of men with missing M stage was high (66%) and varied according to the risk group and over calendar time. The estimated incidence of metastatic disease varied depending on imputation method, with all methods indicating a decreasing incidence over time. A combination of deterministic imputation (DI) and multiple imputation (MI) produced adjusted survival curves for men with imputed M stage that best resembled the survival for men with known M stage.

      Conclusions

      Plausible estimates of incidence of metastatic prostate cancer in clinical cancer registers can be obtained by the use of a combination of DI of missing M stage and MI.


      What is new?

        Key findings

      • The estimated incidence of metastatic prostate cancer (M1) varied depending on how missing data on M stage were handled. Simply substituting M0 for all missing M stage underestimated the incidence of M1. Deterministic imputation of missing M stage to M0 among men with low risk of metastases, combined with multiple imputation, yielded similar survival for men with known and imputed M stage.

        What this adds to what is known?

      • Information on serum prostate-specific antigen levels and Gleason score, in addition to tumor node metastasis stage, is necessary to yield plausible estimates when imputing missing M stage.

        What is the implication and what should change now?

      • Missing M stage cannot by default be deterministically imputed to M0. Analyses of incidence trends of metastatic prostate cancer should be complemented with sensitivity analyses and information on how missing M stage was handled.

      1. Introduction

      The incidence of de novo metastatic cancer (i.e., metastatic cancer at diagnosis) is an early proxy for cancer-specific mortality when evaluating interventions such as screening and easier and earlier access to health care. Incidence of de novo metastatic cancer is unaffected by new treatments, and it does not require a long observation time. Missing data on tumor node metastasis (TNM) variables are common [Gurney J., Sarfati D., Stanley J., Dennett E., Johnson C., Koea J., et al. Unstaged cancer in a population-based registry: prevalence, predictors and patient prognosis; Parry M.G., Sujenthiran A., Cowling T.E., Charman S., Nossiter J., Aggarwal A., et al. Imputation of missing prostate cancer stage in English cancer registry data based on clinical assumptions], and temporal changes in the use of imaging can influence the pattern of missingness in M stage. For example, efforts to discourage inappropriate use of bone imaging in men with low-risk prostate cancer in Sweden reduced the proportion of these men who underwent bone imaging from 45% in 1998 to 3% in 2009 [Makarov D.V., Loeb S., Ulmert D., Drevin L., Lambe M., Stattin P. Prostate cancer imaging trends after a nationwide effort to discourage inappropriate prostate cancer imaging]. Missing data may also vary over time due to revised coding principles in cancer staging systems. An example is the removal of the category “Mx” for unknown metastatic status in the seventh edition of the TNM classification, with the result that men who have not undergone bone imaging are now classified as M0 [Sobin L.H., Compton C.C. TNM seventh edition: what's new, what's changed: communication from the International Union against Cancer and the American Joint Committee on Cancer; Sobin L.H.]. Trends in the incidence of de novo metastatic cancer may be biased unless missing M stage is handled appropriately, because the reasons for missing M stage vary over calendar time and across risk categories [Mohan K., Pearl J. Graphical models for processing missing data; Hughes R.A., Heron J., Sterne J.A.C., Tilling K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer; Hardt J., Herke M., Leonhart R. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research].

      1.1 Aim of the study

      The aim of the study was to assess statistical methods for estimating the age-standardized incidence of de novo metastatic prostate cancer when M stage is missing for a large proportion of men. The methods should account for missingness that varies over calendar time and is related to other measured and unmeasured clinical variables.

      2. Materials

      All men diagnosed with prostate cancer from 2000 to 2019 and registered in the National Prostate Cancer Register (NPCR) of Sweden were included [RATTEN. Interactive on line report from NPCR]. The NPCR includes data on diagnostic work-up, tumor characteristics, and primary treatment. In the Prostate Cancer data Base Sweden (PCBaSe), the NPCR was linked to the National Patient Register, the National Cancer Register, and the National Cause of Death Register by use of the unique Swedish personal identity number [Van Hemelrijck M., Wigertz A., Sandin F., Garmo H., Hellström K., Fransson P., et al. Cohort profile: the National prostate cancer register of Sweden and prostate cancer data Base Sweden 2.0]. The following variables were extracted from PCBaSe: age at diagnosis, year of diagnosis, serum level of prostate-specific antigen (PSA), clinical TNM stage, Gleason score (GS) of the diagnostic biopsy cores or World Health Organization (WHO) grade in fine needle biopsies, mode of detection (lower urinary tract symptoms, other symptoms, and asymptomatic), primary treatment, Charlson Comorbidity Index (CCI), survival time, and status (cause of death [prostate cancer or other causes] or censoring). Follow-up ended at death or on December 31, 2019, whichever came first. Primary treatment was categorized into radical treatment (radical prostatectomy or radiotherapy), androgen deprivation therapy (ADT) (gonadotropin-releasing hormone, antiandrogens [bicalutamide], or orchidectomy), deferred treatment (active surveillance or watchful waiting), and other or unknown treatment (other). The CCI was based on discharge diagnoses, excluding prostate cancer and metastases, from the National Patient Register up to 10 years prior to prostate cancer diagnosis. Data on all men aged 40 to 100 years alive in each year were obtained from Statistics Sweden (SCB) [Statistics Sweden].
      Men with prostate cancer were categorized according to the risk of metastatic disease at diagnosis:
      Low metastatic risk: PSA <20 ng/mL, T1-2, and GS ≤7 (or WHO grade 1–2 if GS is missing);
      High metastatic risk: PSA ≥20 ng/mL, T3-4, GS >7, or WHO grade 3 if GS is missing;
      Unknown metastatic risk: missing PSA, missing T stage, or missing both GS and WHO grade.
      The categorization was designed to closely match the current Swedish clinical guidelines for use of imaging in the diagnostic workup of men with prostate cancer [].
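      Read as a decision rule, the three categories can be sketched in Python as below. This is a hypothetical illustration, not code from the study: the function name, the coding of T stage as an integer 1–4, GS as the Gleason sum, WHO grade as 1–3, and None for a missing value are assumptions. The text does not state which rule takes precedence when a man lacks one variable but meets a high-risk criterion on another; the sketch assumes the literal reading that missing PSA, T stage, or grade information always gives unknown risk.

```python
from typing import Optional


def classify_metastatic_risk(
    psa: Optional[float],
    t_stage: Optional[int],
    gleason: Optional[int],
    who_grade: Optional[int],
) -> str:
    """Return 'low', 'high' or 'unknown' metastatic risk.

    Assumed coding (hypothetical): PSA in ng/mL, T stage 1-4, Gleason sum,
    WHO grade 1-3, and None for a missing value. 'Unknown' is checked first.
    """
    # Grade information is missing only if both GS and WHO grade are missing.
    grade_missing = gleason is None and who_grade is None
    if psa is None or t_stage is None or grade_missing:
        return "unknown"

    # Use the Gleason score when known, otherwise fall back to WHO grade.
    high_grade = gleason > 7 if gleason is not None else who_grade == 3

    # High risk if ANY high-risk criterion is met.
    if psa >= 20 or t_stage >= 3 or high_grade:
        return "high"
    # Otherwise PSA < 20, T1-2 and GS <= 7 (or WHO grade 1-2): low risk.
    return "low"


print(classify_metastatic_risk(8.0, 1, 6, None))    # low
print(classify_metastatic_risk(25.0, 2, 6, None))   # high
print(classify_metastatic_risk(8.0, 2, None, 3))    # high (WHO grade used)
print(classify_metastatic_risk(None, 3, 9, None))   # unknown
```

      Checking "unknown" first mirrors the wording of the list; reversing the order would move some men with incomplete data into the high-risk group.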

      3. Methods

      We estimated the age-standardized incidence of de novo metastatic prostate cancer by direct standardization to the age distribution in Sweden in 2000 [Fleiss J.L., Levin B., Paik M.C. Statistical methods for rates and proportions]. To obtain an annual estimate of the proportion of M1 among all men alive in each age stratum in the presence of missing data on M stage, we used four methods based on deterministic imputation (DI) and multiple imputation (MI) with the R package mice [Little R.J., Rubin D.B. Statistical analysis with missing data; Buuren S., Groothuis-Oudshoorn C. MICE: multivariate imputation by chained equations in R], as described below. The number of imputations was set to 128 [White I.R., Royston P., Wood A.M. Multiple imputation using chained equations: issues and guidance for practice]. The definition of M stage used prior to 2011 was recreated for the whole cohort; i.e., M stage was considered missing if the man had not undergone imaging to assess metastatic status. Adjusted survival curves stratified by M stage were used to compare known and imputed M stage among men with M0 and M1, respectively; these curves were obtained by weighting to account for potential differences in baseline characteristics [Toh S., García Rodríguez L.A., Hernán M.A. Analyzing partially missing confounder information in comparative effectiveness and safety research of therapeutics]. See Supplementary Materials for further details on the methods, specification of the imputation models, weight diagnostics, and sensitivity analyses.
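      Direct standardization computes, for one calendar year, ASR = 100,000 × Σ_a w_a · d_a / n_a, where d_a and n_a are the number of M1 cases and the number of men alive in age stratum a, and w_a is the share of stratum a in the standard population (here, Sweden in 2000). With MI, one natural approach, assumed in the sketch below, is to compute the ASR in each completed dataset and average the results, which is the point estimate under Rubin's rules; variance pooling is omitted. The function names and toy counts are hypothetical.

```python
import numpy as np


def age_standardized_rate(cases, population, standard_population, per=100_000):
    """Directly age-standardized rate per `per` persons for one calendar year.

    cases: number of M1 cases in each age stratum
    population: number of men alive in each age stratum
    standard_population: men in each age stratum of the standard population
    """
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    weights = np.asarray(standard_population, dtype=float)
    weights = weights / weights.sum()      # w_a: share of stratum a in the standard
    stratum_rates = cases / population     # d_a / n_a
    return per * float(np.sum(weights * stratum_rates))


def pooled_rate(cases_by_imputation, population, standard_population):
    """Average the standardized rate over completed datasets (Rubin's rules point estimate)."""
    rates = [age_standardized_rate(c, population, standard_population)
             for c in cases_by_imputation]
    return float(np.mean(rates))


# Toy example: two age strata and two imputed datasets (hypothetical counts).
rate = age_standardized_rate([1, 3], [1000, 1000], [1, 3])
pooled = pooled_rate([[1, 3], [3, 3]], [1000, 1000], [1, 3])
print(round(rate, 6), round(pooled, 6))  # 250.0 275.0
```

      Because only the M1 counts change between imputed datasets while the population denominators are fixed, averaging the standardized rates is equivalent to standardizing the averaged counts.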

      3.1 Deterministic imputation

      M stage was set to M0 for all men with missing M stage. This corresponds to a situation where only positive imaging results are registered and imaged men with M0 cannot be differentiated from nonimaged men, as in the current Union for International Cancer Control classification [Sobin L.H.].

      3.2 Partial deterministic imputation + multiple imputation

      For men with low-risk prostate cancer [Briganti A., Passoni N., Ferrari M., Capitanio U., Suardi N., Gallina A., et al. When to perform bone scan in patients with newly diagnosed prostate cancer: external validation of the currently available guidelines and proposal of a novel risk stratification tool], the National Swedish guidelines for prostate cancer recommend against imaging because the prevalence of M1 among these men is very low [Makarov D.V., Loeb S., Ulmert D., Drevin L., Lambe M., Stattin P. Prostate cancer imaging trends after a nationwide effort to discourage inappropriate prostate cancer imaging]. M stage was therefore first set to M0 for all men categorized as low metastatic risk with missing M stage; the remaining missing data in M stage and in all other variables (e.g., PSA and N stage) were then imputed using MI including all variables listed in the Materials section.
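      The two steps of PDI + MI can be sketched for a binary M stage as follows. This is a simplified illustration under stated assumptions, not the study's imputation model: only M stage is missing and the predictors in X are fully observed (so no chained equations are needed, whereas the study imputed all incomplete variables with mice), and M stage is imputed from a logistic model whose coefficients are drawn from their approximate normal posterior, similar in spirit to logistic-regression imputation in mice. The function names are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns coefficients and covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = expit(X @ beta)
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, (cov + cov.T) / 2.0  # symmetrize against rounding error


def pdi_mi(m_stage, low_risk, X, n_imputations, rng):
    """Partial deterministic imputation followed by multiple imputation.

    m_stage: 1.0 = M1, 0.0 = M0, np.nan = missing (not imaged)
    low_risk: boolean array, True for men categorized as low metastatic risk
    X: design matrix (intercept first) of fully observed predictors
    Returns a list of completed M-stage arrays, one per imputation.
    """
    m = np.asarray(m_stage, dtype=float).copy()
    # Step 1 (deterministic): missing M stage among low-risk men is set to M0.
    m[np.isnan(m) & low_risk] = 0.0
    observed = ~np.isnan(m)
    beta_hat, cov = fit_logistic(X[observed], m[observed])

    completed = []
    for _ in range(n_imputations):
        # Step 2 (stochastic): draw coefficients to propagate parameter uncertainty,
        beta = rng.multivariate_normal(beta_hat, cov)
        # then draw M stage for each remaining man with missing M stage.
        p = expit(X[~observed] @ beta)
        m_imp = m.copy()
        m_imp[~observed] = (rng.random(p.size) < p).astype(float)
        completed.append(m_imp)
    return completed


# Hypothetical example with one continuous risk score z.
rng = np.random.default_rng(0)
n = 3000
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z])
m_true = (rng.random(n) < expit(-2.0 + 1.5 * z)).astype(float)
m_obs = np.where(rng.random(n) < 0.5, np.nan, m_true)
imputations = pdi_mi(m_obs, z < -0.5, X, n_imputations=5, rng=rng)
print(np.mean([imp.mean() for imp in imputations]))
```

      Passing an all-True low_risk mask reduces the sketch to DI (every missing value is set to M0 and nothing is left to draw), while an all-False mask gives a standard MI of M stage without the deterministic step.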

      3.3 Standard MI

      All variables listed in the Materials section were included, and missing data were imputed using MI. This method corresponds to a standard implementation of MI without any prior deterministic imputation.

      3.4 Restricted MI

      Many registers contain only a limited number of the variables used in clinical practice; for example, the National Cancer Register in Sweden registers TNM stage but no other clinical variables or survival data. To simulate this scenario, only TNM stage, age, and year of diagnosis were included, and missing data were imputed using MI. Survival data were included in a sensitivity analysis (see Supplementary Materials).

      4. Results

      4.1 Baseline characteristics

      There were 190,420 men diagnosed with prostate cancer between 2000 and 2019 in NPCR; their baseline characteristics by M stage are summarized in Table 1. Of these, 126,102 men (66%) had missing M stage, and 15,526 men (8%) were M1, constituting 24% of all imaged men. Men with missing M stage had characteristics similar to those of men with M0 with respect to age at diagnosis, CCI, and mode of detection. However, PSA, T stage, and GS indicated more favorable disease characteristics in men with missing M stage. Thirty-six percent of men with M0 and 3% of men with M1 were categorized as low metastatic risk. The corresponding proportion for men with missing M stage was 70%; these men were set to M0 by the DI method and, prior to MI, by the partial deterministic imputation + MI (PDI + MI) method.
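      The percentages in the paragraph above follow directly from the counts in Table 1. A short Python check, assuming that "all imaged men" means men with known M stage (M0 + M1):

```python
# Counts from Table 1
total = 190_420
m0, m1, missing = 48_792, 15_526, 126_102
low_risk_missing = 88_029  # men with missing M stage categorized as low metastatic risk

assert m0 + m1 + missing == total  # the three M-stage groups partition the cohort

shares = {
    "missing M stage, of all men": missing / total,
    "M1, of all men": m1 / total,
    "M1, of imaged men (M0 + M1)": m1 / (m0 + m1),
    "low metastatic risk, of men with missing M stage": low_risk_missing / missing,
}
for label, share in shares.items():
    print(f"{label}: {share:.0%}")  # 66%, 8%, 24%, 70%
```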
      Table 1. Baseline characteristics of men diagnosed with prostate cancer in PCBaSe between 2000 and 2019, by M stage determined by imaging. Men who did not undergo bone imaging have missing M stage. Column-wise percentages are shown in parentheses () and row-wise percentages in brackets [].

      | Characteristic | All, n (%) | M0, n (%) [%] | M1, n (%) [%] | Missing M stage, n (%) [%] |
      |---|---|---|---|---|
      | N | 190,420 (100) | 48,792 (100) [26] | 15,526 (100) [8] | 126,102 (100) [66] |
      | Age at diagnosis, yr | | | | |
      | <60 | 23,851 (13) | 5,329 (11) [22] | 1,027 (7) [4] | 17,495 (14) [73] |
      | 60–69 | 70,888 (37) | 18,480 (38) [26] | 3,900 (25) [6] | 48,508 (38) [68] |
      | 70–74 | 36,729 (19) | 11,200 (23) [30] | 3,050 (20) [8] | 22,479 (18) [61] |
      | 75–80 | 28,945 (15) | 7,973 (16) [28] | 3,140 (20) [11] | 17,832 (14) [62] |
      | 80+ | 30,007 (16) | 5,810 (12) [19] | 4,409 (28) [15] | 19,788 (16) [66] |
      | Year of diagnosis | | | | |
      | 2000–2005 | 50,744 (27) | 14,894 (31) [29] | 4,955 (32) [10] | 30,895 (25) [61] |
      | 2006–2011 | 56,868 (30) | 10,005 (21) [18] | 3,527 (23) [6] | 43,336 (34) [76] |
      | 2012–2019 | 82,808 (43) | 23,893 (49) [29] | 7,044 (45) [9] | 51,871 (41) [63] |
      | Charlson Comorbidity Index | | | | |
      | 0 | 137,465 (72) | 35,967 (74) [26] | 9,705 (63) [7] | 91,793 (73) [67] |
      | 1 | 25,091 (13) | 6,498 (13) [26] | 2,640 (17) [11] | 15,953 (13) [64] |
      | 2 | 16,853 (9) | 3,933 (8) [23] | 1,769 (11) [10] | 11,151 (9) [66] |
      | 3+ | 11,011 (6) | 2,394 (5) [22] | 1,412 (9) [13] | 7,205 (6) [65] |
      | PSA (ng/mL) | | | | |
      | Median (Q1, Q3) | 10 (6–24) | 15 (8–30) | 138 (39–503) | 8 (5–14) |
      | 0–9 | 94,545 (50) | 16,362 (34) [17] | 1,038 (7) [1] | 77,145 (61) [82] |
      | 10–19 | 37,144 (20) | 12,975 (27) [35] | 1,140 (7) [3] | 23,029 (18) [62] |
      | 20–49 | 25,953 (14) | 12,019 (25) [46] | 2,315 (15) [9] | 11,619 (9) [45] |
      | 50–99 | 10,975 (6) | 4,140 (8) [38] | 2,128 (14) [19] | 4,707 (4) [43] |
      | 100–499 | 11,288 (6) | 2,554 (5) [23] | 4,705 (30) [42] | 4,029 (3) [36] |
      | 500+ | 5,974 (3) | 314 (1) [5] | 4,028 (26) [67] | 1,632 (1) [27] |
      | Missing | 4,541 (2) | 428 (1) [9] | 172 (1) [4] | 3,941 (3) [87] |
      | T stage | | | | |
      | 1 | 89,350 (47) | 16,343 (33) [18] | 1,261 (8) [1] | 71,746 (57) [80] |
      | 2 | 57,496 (30) | 19,043 (39) [33] | 3,290 (21) [6] | 35,163 (28) [61] |
      | 3 | 32,854 (17) | 11,725 (24) [36] | 7,497 (48) [23] | 13,632 (11) [41] |
      | 4 | 5,986 (3) | 879 (2) [15] | 2,859 (18) [48] | 2,248 (2) [38] |
      | Missing | 4,734 (2) | 802 (2) [17] | 619 (4) [13] | 3,313 (3) [70] |
      | N stage | | | | |
      | 0 | 39,849 (21) | 21,544 (44) [54] | 1,867 (12) [5] | 16,438 (13) [41] |
      | 1 | 6,522 (3) | 2,986 (6) [46] | 2,496 (16) [38] | 1,040 (1) [16] |
      | Missing | 144,049 (76) | 24,262 (50) [17] | 11,163 (72) [8] | 108,624 (86) [75] |
      | Gleason sum or WHO grade | | | | |
      | GS 6/WHO grade 1 | 76,341 (40) | 10,397 (21) [14] | 780 (5) [1] | 65,164 (52) [85] |
      | GS 7/WHO grade 2 | 69,224 (36) | 21,700 (44) [31] | 3,930 (25) [6] | 43,594 (35) [63] |
      | GS 8–10/WHO grade 3 | 40,496 (21) | 16,216 (33) [40] | 9,670 (62) [24] | 14,610 (12) [36] |
      | Missingᵃ | 4,359 (2) | 479 (1) [11] | 1,146 (7) [26] | 2,734 (2) [63] |
      | Metastatic risk | | | | |
      | Low metastatic risk | 105,952 (56) | 17,387 (36) [16] | 536 (3) [1] | 88,029 (70) [83] |
      | High metastatic risk | 73,378 (39) | 29,878 (61) [41] | 13,455 (87) [18] | 30,045 (24) [41] |
      | Unknown metastatic risk | 11,090 (6) | 1,527 (3) [14] | 1,535 (10) [14] | 8,028 (6) [72] |
      | Mode of detection | | | | |
      | Health check-up | 76,891 (40) | 20,186 (41) [26] | 2,381 (15) [3] | 54,324 (43) [71] |
      | Lower urinary tract symptoms | 56,613 (30) | 14,286 (29) [25] | 4,404 (28) [8] | 37,923 (30) [67] |
      | Other symptoms | 50,268 (26) | 12,754 (26) [25] | 8,346 (54) [17] | 29,168 (23) [58] |
      | Missing | 6,648 (3) | 1,566 (3) [24] | 395 (3) [6] | 4,687 (4) [71] |
      | Primary treatment | | | | |
      | Radical treatmentᵇ | 74,752 (39) | 28,828 (59) [39] | 364 (2) [0] | 45,560 (36) [61] |
      | Androgen deprivation therapy | 52,378 (28) | 13,076 (27) [25] | 14,392 (93) [27] | 24,910 (20) [48] |
      | Deferred treatmentᶜ | 54,809 (29) | 5,426 (11) [10] | 261 (2) [0] | 49,122 (39) [90] |
      | Other | 8,481 (4) | 1,462 (3) [17] | 509 (3) [6] | 6,510 (5) [77] |
      | Follow-up and status | | | | |
      | Median follow-up (Q1, Q3) | 6 (3–10) | 6 (3–10) | 2 (1–4) | 6 (3–10) |
      | Censored | 119,272 (63) | 32,068 (66) [27] | 3,807 (25) [3] | 83,397 (66) [70] |
      | Death by prostate cancer | 29,358 (15) | 6,708 (14) [23] | 9,101 (59) [31] | 13,549 (11) [46] |
      | Death by other causes | 41,790 (22) | 10,016 (21) [24] | 2,618 (17) [6] | 29,156 (23) [70] |

      PCBaSe, Prostate Cancer data Base Sweden; PSA, prostate-specific antigen; WHO, World Health Organization; GS, Gleason score.
      a If GS is missing, WHO grade is reported if known.
      b Radical treatment includes radical prostatectomy and radical radiotherapy.
      c Deferred treatment includes active surveillance and watchful waiting.
      The annual number of men diagnosed with prostate cancer increased during the study period, while the annual number of men categorized as high metastatic risk was stable in all age groups. Meanwhile, the proportion of imaged men (i.e., with known M stage) decreased from 48% in 2000 to 23% in 2008. This was followed by an increase to 37% in 2019, which was most pronounced among men aged 70 years or older categorized as high metastatic risk (Supplementary Figure 1).

      4.2 Baseline characteristics after imputation

      The proportions of men with imputed M1 among men with missing M stage were 7%, 10%, and 16% when applying PDI + MI, standard MI (SMI), and restricted MI (RMI), respectively. Among men with imputed M1, the proportion categorized as low metastatic risk varied substantially (1–40%) depending on the imputation method used (Supplementary Table 1), compared with 4% among men with known M1 (Table 1). When using PDI + MI, men with imputed M1 were older, had higher CCI, were less often detected through a health check-up, and more often received ADT as primary treatment than men with imputed M1 under the other imputation methods. The tumor characteristics among men with imputed M0 were similar across methods and tended toward more favorable disease characteristics than those of men with known M0 (Supplementary Table 1).
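      As a rough, back-of-the-envelope illustration (not a result reported in the study), the rounded proportions above can be combined with the Table 1 counts to show how much each method adds to the overall share of men with M1. The calculation assumes the stated percentages apply to all 126,102 men with missing M stage and ignores both rounding and variation across imputations.

```python
n_total, n_missing, n_known_m1 = 190_420, 126_102, 15_526        # Table 1
imputed_m1_share = {"PDI + MI": 0.07, "SMI": 0.10, "RMI": 0.16}  # among men with missing M stage

overall_share = {"known M1 only": n_known_m1 / n_total}
for method, share in imputed_m1_share.items():
    # Known M1 plus the expected number of imputed M1, as a share of all men.
    overall_share[method] = (n_known_m1 + share * n_missing) / n_total

for label, value in overall_share.items():
    print(f"{label}: {value:.1%} of all men")  # 8.2%, 12.8%, 14.8%, 18.7%
```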

      4.3 Incidence of metastatic prostate cancer

      The estimated age-standardized incidence of de novo metastatic prostate cancer varied markedly between the four methods (Figure 1). In 2000, the estimated incidences were 43, 70, 74, and 91 per 100,000 men for DI, PDI + MI, SMI, and RMI, respectively. Both the estimated incidences and the differences between methods decreased with time; in 2019, the estimated incidences were 32, 40, 50, and 57 per 100,000 men for DI, PDI + MI, SMI, and RMI, respectively. However, the estimated incidence curve for DI was U-shaped, with a minimum of 26 per 100,000 men in 2012.
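      The relative decline from 2000 to 2019 and the narrowing spread between methods can be computed from the incidences quoted above. This is simple arithmetic on the reported values, sketched in Python:

```python
def percent_change(start: float, end: float) -> float:
    return 100.0 * (end - start) / start


# Age-standardized incidence per 100,000 men (2000, 2019), as reported above.
incidence = {
    "DI": (43, 32),
    "PDI + MI": (70, 40),
    "SMI": (74, 50),
    "RMI": (91, 57),
}

changes = {method: percent_change(a, b) for method, (a, b) in incidence.items()}
first = [a for a, _ in incidence.values()]
last = [b for _, b in incidence.values()]
spread_2000 = max(first) - min(first)
spread_2019 = max(last) - min(last)

for method, change in changes.items():
    print(f"{method:<9} {change:+.0f}%")  # DI -26%, PDI + MI -43%, SMI -32%, RMI -37%
print(f"spread between methods: {spread_2000} (2000) vs {spread_2019} (2019) per 100,000")
```

      Endpoint comparisons hide the U-shape of the DI curve (minimum of 26 per 100,000 men in 2012), so they summarize Figure 1 rather than replace it.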
      Fig. 1. Age-standardized incidence of metastatic prostate cancer (standardized to the age distribution in 2000) by statistical method. M stage was imputed for men with missing M stage, i.e., men who did not undergo bone imaging. Data on use of bone imaging were not registered in NPCR for men diagnosed in 2011. Men registered as M1 in 2011 with no information on imaging were considered M1 when using DI. MI, multiple imputation; DI, deterministic imputation; PDI + MI, partial deterministic imputation + MI; SMI, standard MI; RMI, restricted MI; NPCR, National Prostate Cancer Register of Sweden.
      The estimated annual incidence of men with de novo metastatic prostate cancer categorized as low metastatic risk varied between methods (Supplementary Figure 2); SMI initially yielded a decrease followed by an increase over time, from 5 in 2000 to 11 per 100,000 men in 2019, and RMI yielded an increase over time, from 12 to 17 per 100,000 men, whereas DI and PDI + MI were stable around 2 per 100,000 men. The estimated annual incidence of men with de novo metastatic prostate cancer categorized as high metastatic risk was similar for all methods except DI (Supplementary Figure 3).

      4.4 Survival

      The adjusted 5-year overall survival curves for men with known M0 or M1, and for men with missing M stage imputed as M0 or M1, are shown in Figure 2. With PDI + MI and SMI, the survival curves for men with imputed M stage closely matched those for men with known M stage, both for all men and for men categorized as high metastatic risk. Among men categorized as low metastatic risk, PDI + MI imputed M1 for only a few men (n = 98), making any comparison of survival uncertain. With SMI, the adjusted survival curves for men with known and imputed M1 categorized as low metastatic risk separated immediately, and RMI yielded survival curves that did not match particularly well in any of the strata. The results were similar for prostate cancer-specific survival (Supplementary Figure 4) and in unadjusted analyses (Supplementary Figures 5 and 6).
      Fig. 2. Adjusted overall survival averaged over the multiple imputations, for all men and stratified by low and high metastatic risk. M stage was imputed for men with missing M stage, i.e., men who did not undergo bone imaging; the strata defined by M stage therefore vary across imputations and imputation methods. The survival estimates for known M stage were not based on a complete-case analysis because the weights were computed from both observed and imputed data. Numbers at risk, reported as averages over the multiple imputations, were computed in the weighted population and may therefore differ between imputation methods for men with known M stage. Some men had missing data on PSA or GS and could not be categorized as low or high metastatic risk even after imputation using the restricted MI (which omitted PSA and GS). PSA, prostate-specific antigen; GS, Gleason score.
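      Adjusted survival curves of this kind are computed in a weighted population. As a generic sketch, assuming the weights (e.g., inverse probability weights) have already been estimated (the study's weight model is described in its Supplementary Materials and is not reproduced here), a weighted Kaplan–Meier estimator replaces the counts of deaths and men at risk by sums of weights: S(t) = Π over event times t_j ≤ t of (1 − d_j / n_j), with d_j the summed weight of deaths at t_j and n_j the summed weight of men still at risk. The function name is hypothetical.

```python
import numpy as np


def weighted_kaplan_meier(time, event, weight):
    """Weighted Kaplan-Meier survival estimate.

    time: follow-up time for each man
    event: 1 if death was observed, 0 if censored
    weight: per-person weight (assumed precomputed, e.g. an inverse probability weight)
    Returns the distinct event times and the estimated survival just after each one.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    weight = np.asarray(weight, dtype=float)

    event_times = np.unique(time[event == 1])
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = weight[time >= t].sum()                  # weighted number at risk
        deaths = weight[(time == t) & (event == 1)].sum()  # weighted deaths at time t
        survival *= 1.0 - deaths / at_risk
        curve.append(survival)
    return event_times, np.array(curve)


times, surv = weighted_kaplan_meier(
    time=[1, 2, 2, 3, 4], event=[1, 1, 0, 1, 0], weight=[2, 1, 1, 1, 1]
)
print(times, surv)
```

      With unit weights the function reduces to the ordinary Kaplan–Meier estimator. With multiple imputation, the curve would be computed in each completed dataset and the survival probabilities averaged, as described in the caption of Fig. 2.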

      5. Discussion

      5.1 Summary of findings

      The estimated age-standardized incidence of de novo metastatic prostate cancer differed markedly between the methods used to handle missing data on metastatic status. Partial deterministic imputation + multiple imputation yielded both a small number of men with imputed M1 among men with low metastatic risk and survival among men with imputed M stage that best resembled that among men with observed M stage.

      5.2 Validity of different methods for imputation of M stage

      Deterministic imputation likely underestimates the incidence of M1, and the size of the underestimation depends mostly on the changing use of imaging over calendar time among men older than 70 years with high metastatic risk. Randomized clinical trials have shown that radical radiotherapy with neoadjuvant and adjuvant ADT increases survival in men with locally advanced prostate cancer [Widmark A., Klepp O., Solberg A., Damber J.E., Angelsen A., Fransson P., et al. Endocrine treatment, with or without radiotherapy, in locally advanced prostate cancer (SPCG-7/SFUO-3): an open randomised phase III trial; Warde P., Mason M., Ding K., Kirkbride P., Brundage M., Cowan R., et al. Combined androgen deprivation therapy and radiation therapy for locally advanced prostate cancer: a randomised, phase 3 trial], which has likely led to a more comprehensive workup of men with high metastatic risk in more recent years. This likely explains both the increase in imaging among these men after 2008 and the U-shape of the DI incidence curve.
      The validity of the MI methods relies on the plausibility of the missing at random (MAR) assumption [Little R.J., Rubin D.B. Statistical analysis with missing data]. It is recommended to include as many auxiliary variables as possible in the analysis to increase the plausibility of MAR [Sterne J.A., White I.R., Carlin J.B., Spratt M., Royston P., Kenward M.G., et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls; Buuren S.V. Flexible imputation of missing data], since such variables may explain systematic differences between those with observed and missing data. When such variables are unavailable or omitted, the data may no longer be MAR and may instead be missing not at random (MNAR) [Little R.J., Rubin D.B. Statistical analysis with missing data]. In this study, missing information on variables that predict the risk of metastases and the probability of undergoing imaging was considered the primary reason why data could be MNAR. If data are MNAR, estimates obtained after MI under the MAR assumption can be severely biased.
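      A small simulation can illustrate this point. In the hypothetical setup below (illustrative numbers, not NPCR data), both the chance of metastases and the chance of imaging depend on a binary risk group, so M stage is MAR given risk group but not if risk group is left out of the imputation model. For brevity, the sketch uses a single stochastic imputation from group-specific observed proportions rather than proper MI. Imputing within risk groups recovers the true M1 proportion (about 0.132), omitting risk group overestimates it (about 0.25) because imaged men are mostly high risk, which is similar in spirit to the overestimation seen with RMI, and setting missing values to M0 as in DI underestimates it (about 0.085).

```python
import numpy as np

rng = np.random.default_rng(2022)
n = 200_000

# Hypothetical data-generating process; all probabilities are illustrative assumptions.
high_risk = rng.random(n) < 0.4
m1 = rng.random(n) < np.where(high_risk, 0.30, 0.02)      # true metastatic status
imaged = rng.random(n) < np.where(high_risk, 0.70, 0.10)  # imaging depends on risk group only


def impute_by_group(observed_m1, imaged, groups, rng):
    """Single stochastic imputation: draw missing M stage from the observed
    M1 proportion among imaged men in the same group."""
    completed = observed_m1.copy()
    for g in np.unique(groups):
        donors = imaged & (groups == g)
        targets = ~imaged & (groups == g)
        p = observed_m1[donors].mean()
        completed[targets] = rng.random(targets.sum()) < p
    return completed


# Non-imaged men have unknown M stage; storing them as False is exactly what DI assumes.
observed = imaged & m1
no_groups = np.zeros(n, dtype=bool)

estimates = {
    "true": m1.mean(),
    "DI (missing -> M0)": observed.mean(),
    "imputed, with risk group": impute_by_group(observed, imaged, high_risk, rng).mean(),
    "imputed, without risk group": impute_by_group(observed, imaged, no_groups, rng).mean(),
}
for name, value in estimates.items():
    print(f"{name:<28} {value:.3f}")
```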
      Using subject matter knowledge is crucial when data are frequently missing and missingness may be MNAR. Based on recommendations in guidelines on the use of imaging, we hypothesized that men with baseline cancer characteristics indicating a low risk of metastatic disease who did not undergo imaging were unlikely to have metastases. Substituting M0 for missing M stage in these men likely results in a negligible underestimation of M1 disease. We did not expect systematic differences in the risk of metastases between imaged and nonimaged men with high metastatic risk. This motivated the use of the PDI + MI method. Among the methods considered, PDI + MI produced the most convincing imputations, based on the low number of men with imputed M1 and low metastatic risk and on the similarity of the survival curves. However, the validity of the incidence estimated with this method depends on how well it approximates the truth, which is unknown, and we were unable to test the above assumptions. Therefore, the findings do not prove that the method is valid. Ideally, a validation study should be performed in which a random sample of cases with missing M stage undergoes patient record review to try to determine M stage and/or the reason for missingness.
      Restricted MI did not include survival time or cause of death in the imputation model, and it did not produce similar adjusted survival curves for men with known and imputed M stage. It was thus unable to adequately impute M stage, particularly among men with low metastatic risk. Consequently, the annual incidence of metastatic prostate cancer was likely overestimated with this method.

      5.3 Other studies

      Other studies have reported an association between missing data on stage and comorbidity [Klassen A.C., Curriero F., Kulldorff M., Alberg A.J., Platz E.A., Neloms S.T. Missing stage and grade in Maryland prostate cancer surveillance data, 1992-1997] and age [Merrill R.M., Sloan A., Anderson A.E., Ryker K. Unstaged cancer in the United States: a population-based study; Luo Q., Yu X.Q., Cooke-Yarborough C., Smith D.P., O'Connell D.L. Characteristics of cases with unknown stage prostate cancer in a population-based cancer registry; Elliott S.P., Johnson D.P., Jarosek S.L., Konety B.R., Adejoro O.O., Virnig B.A. Bias due to missing SEER data in D'Amico risk stratification of prostate cancer]. Missing data on prostate cancer stage in the English Cancer Registry were imputed using a combination of substitution (deterministic imputation) followed by MI [Parry M.G., Sujenthiran A., Cowling T.E., Charman S., Nossiter J., Aggarwal A., et al. Imputation of missing prostate cancer stage in English cancer registry data based on clinical assumptions]. The authors observed an increase from 6% to 8% in the proportion of known metastatic prostate cancer between 2010 and 2013, which could potentially be due to changes in the use of imaging, as suggested by the large decrease over time in the proportion of men with missing cancer stage, from 83.1% to 32.5%.
      In an Australian cohort, the validity of MI for missing cancer stage at diagnosis was assessed by cross-linkage with data from health care records [Luo Q., Egger S., Yu X.Q., Smith D.P., O'Connell D.L. Validity of using multiple imputation for "unknown" stage at diagnosis in population-based cancer registry data]. The authors concluded that MI may be an appropriate method to handle missing data on cancer stage in a cancer registry, particularly when more clinical variables are available. However, differences in clinical practice (e.g., diagnostic routines and use of imaging, which were not reported) and in data registration (e.g., only summary stage was available, not separate TNM stage) make it difficult to assess whether their findings apply to our study.

      5.4 Implications for data registration and coding

      Our results indicate that access to data on the use of imaging is essential to determine which men had known M stage; otherwise, the potential magnitude of the underestimation of incidence cannot be assessed. Such data may be unavailable if, for example, the M classification applied has no category “Mx” indicating that imaging was not performed, so that unknown M stage is coded as M0, and no other variable indicates whether imaging was performed. It is also important to be able to distinguish whether M stage was determined by imaging or assigned by coding conventions, such as coding men as M0 if there are no obvious signs of metastasis [Sobin L.H., Compton C.C. TNM seventh edition: what's new, what's changed: communication from the International Union against Cancer and the American Joint Committee on Cancer] and as M1 if PSA >100 ng/mL when imaging results were not reported [Schröder F.H., Hugosson J., Carlsson S., Tammela T., Määttänen L., Auvinen A., et al. Screening for prostate cancer decreases the risk of developing metastatic disease: findings from the European Randomized Study of Screening for Prostate Cancer (ERSPC); Tomic K., Westerberg M., Robinson D., Garmo H., Stattin P. Proportion and characteristics of men with unknown risk category in the National Prostate Cancer Register of Sweden].

      5.5 Strengths and limitations

      Data quality in NPCR has been shown to be high [Tomic K., Sandin F., Wigertz A., Robinson D., Lambe M., Stattin P. Evaluation of data quality in the National prostate cancer register of Sweden]. An important strength was the availability of several auxiliary variables, most with a negligible amount of missing data, that predict M stage and missingness in M stage. This increased the plausibility of the MAR assumption. By comparing the results of the imputation methods for missing M stage, we gained insights into how data availability and the handling of missingness in M stage may affect incidence estimates.
      Limitations of our study include the large proportion of missing data in M stage (66%) and the missing data in predictors used for imputing M stage (e.g., 75.6% for N stage), which may affect the performance of MI [
      • Marshall A.
      • Altman D.G.
      • Holder R.L.
      Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study.
      ]. Imaging methods such as single-photon emission computed tomography and positron emission tomography have higher sensitivity and specificity [
      • Tateishi U.
      • Morita S.
      • Taguri M.
      • Shizukuishi K.
      • Minamimoto R.
      • Kawaguchi M.
      • et al.
      A meta-analysis of (18)F-Fluoride positron emission tomography for assessment of metastatic bone tumor.
      ,
      • Palmedo H.
      • Marx C.
      • Ebert A.
      • Kreft B.
      • Ko Y.
      • Türler A.
      • et al.
      Whole-body SPECT/CT for bone scintigraphy: diagnostic value and effect on patient management in oncological patients.
      ] than bone scintigraphy, and changes in the use of imaging modalities over time can therefore cause bias. We were unable to assess this potential bias because such data were lacking. Moreover, temporal changes in the assessment and definition of the auxiliary variables may be a source of bias. For example, the Gleason classification was modified during the study period [
      • Orrason A.W.
      • Westerberg M.
      • Garmo H.
      • Lissbrant I.F.
      • Robinson D.
      • Stattin P.
      Changes in treatment and mortality in men with locally advanced prostate cancer between 2000 and 2016: a nationwide, population-based study in Sweden.
      ,
      • Westerberg M.
      • Franck Lissbrant I.
      • Damber J.E.
      • Robinson D.
      • Garmo H.
      • Stattin P.
      Temporal changes in survival in men with de novo metastatic prostate cancer: nationwide population-based study.
      ,
      • Cazzaniga W.
      • Garmo H.
      • Robinson D.
      • Holmberg L.
      • Bill-Axelson A.
      • Stattin P.
      Mortality after radical prostatectomy in a matched contemporary cohort in Sweden compared to the Scandinavian Prostate Cancer Group 4 (SPCG-4) study.
      ,
      • Epstein J.I.
      • Egevad L.
      • Amin M.B.
      • Delahunt B.
      • Srigley J.R.
      • Humphrey P.A.
      The 2014 international society of urological pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma: definition of grading patterns and proposal for a new grading system.
      ].
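The imaging-related bias can be made concrete with a short calculation: among imaged men, the expected proportion classified as M1 is sensitivity × true prevalence + (1 − specificity) × (1 − true prevalence). The prevalence, sensitivity, and specificity values below are hypothetical and are not taken from the cited studies.

```python
def apparent_m1(p_true: float, sensitivity: float, specificity: float) -> float:
    """Expected proportion classified as M1 among imaged men."""
    true_pos = sensitivity * p_true
    false_pos = (1 - specificity) * (1 - p_true)
    return true_pos + false_pos


# Same true prevalence of metastases (10%) under two hypothetical imaging modalities
older = apparent_m1(0.10, sensitivity=0.80, specificity=0.90)  # about 0.17
newer = apparent_m1(0.10, sensitivity=0.95, specificity=0.98)  # about 0.113
```

The apparent M1 proportion shifts although the true prevalence is unchanged; the direction of the shift depends on how both sensitivity and specificity change between modalities.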
      Our study focused on the analysis of register data for epidemiological, population-level studies; the concepts and implications of this article concern statistical aspects of missing data and are not intended for adoption in clinical practice. However, the results can help guide coding instructions in cancer registers and clinical databases.

      6. Conclusions

      The amount of missing data on metastatic status is often high, even in clinical cancer registers with otherwise comprehensive data, and the estimated age-standardized incidence of de novo metastatic prostate cancer is sensitive to how missing data on metastatic status are handled. Substituting M0 for missing M stage underestimates the incidence. The most convincing results were obtained from DI of missing M stage to M0 in men with a low baseline risk of metastases combined with MI of missing M stage and other variables in all other men. These findings are also relevant for other cancers, if tailored to the context of interest, since the incidence of metastatic cancer is an important proxy for long-term cancer-specific mortality in many cancer studies with short follow-up.
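The combined strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: men in a hypothetical "low" risk group with missing M stage are set to M0 (DI); for all other men, missing M stage is drawn from a Beta-Bernoulli model within age-group × risk-group cells (a stand-in for the chained-equations MI, which also imputed other variables); each completed dataset yields a directly age-standardized M1 incidence with a Poisson variance; and the results are pooled with Rubin's rules. The case counts, person-years, and standard-population weights are invented.

```python
import math
import random
from collections import Counter


def impute_once(cases, rng):
    """One completed dataset: DI to M0 for low-risk men, Bayesian draw for all others."""
    cells = {(age, risk) for age, risk, _ in cases}
    m1_obs = Counter((age, risk) for age, risk, m in cases if m == "M1")
    m0_obs = Counter((age, risk) for age, risk, m in cases if m == "M0")
    # Proper MI step: draw an M1 probability per (age group, risk group) cell from its Beta posterior
    p_m1 = {c: rng.betavariate(1 + m1_obs[c], 1 + m0_obs[c]) for c in cells}
    completed = []
    for age, risk, m in cases:
        if m is None:
            if risk == "low":
                m = "M0"  # deterministic imputation (DI)
            else:
                m = "M1" if rng.random() < p_m1[(age, risk)] else "M0"  # multiple imputation (MI)
        completed.append((age, risk, m))
    return completed


def standardized_rate(cases, person_years, weights, per=100_000):
    """Directly age-standardized M1 incidence and its Poisson variance."""
    d = Counter(age for age, _, m in cases if m == "M1")
    rate = sum(w * d[a] / person_years[a] for a, w in weights.items()) * per
    var = sum(w ** 2 * d[a] / person_years[a] ** 2 for a, w in weights.items()) * per ** 2
    return rate, var


def rubin_pool(results):
    """Pool (estimate, variance) pairs with Rubin's rules; return (estimate, SE)."""
    m = len(results)
    qbar = sum(q for q, _ in results) / m
    within = sum(v for _, v in results) / m
    between = sum((q - qbar) ** 2 for q, _ in results) / (m - 1)
    return qbar, math.sqrt(within + (1 + 1 / m) * between)


# Hypothetical register extract: (age group, risk group, M stage or None if missing)
cases = (
    [("50-69", "low", None)] * 300
    + [("50-69", "high", "M1")] * 30 + [("50-69", "high", "M0")] * 70
    + [("50-69", "high", None)] * 100
    + [("70+", "high", "M1")] * 60 + [("70+", "high", "M0")] * 40
    + [("70+", "high", None)] * 100
)
person_years = {"50-69": 1_000_000, "70+": 500_000}
weights = {"50-69": 0.6, "70+": 0.4}  # hypothetical standard population

rng = random.Random(2022)
naive_rate, _ = standardized_rate(cases, person_years, weights)  # missing treated as M0
pooled_rate, pooled_se = rubin_pool(
    [standardized_rate(impute_once(cases, rng), person_years, weights) for _ in range(20)]
)
```

With these invented inputs, treating missing M stage as M0 gives about 6.6 per 100,000 person-years, whereas the pooled DI + MI estimate is roughly twice as high, mirroring the direction (not the size) of the underestimation described above. Including age group in the imputation cells keeps the imputation model compatible with the age-standardized analysis.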

      Acknowledgments

      This project was made possible by the continuous work of the National Prostate Cancer Register of Sweden (NPCR) steering group: David Robinson (register holder), Ingela Franck Lissbrant (chair), Johan Styrke (cochair), Johan Stranne, Jon Kindblom, Camilla Thellenberg, Andreas Josefsson, Ingrida Verbiene, Hampus Nugin, Stefan Carlsson, Anna Kristiansen, Mats Andén, Thomas Jiborn, Olof Ståhl, Olof Akre, Per Fransson, Eva Johansson, Magnus Törnblom, Fredrik Jäderling, Marie Hjälm Eriksson, Lotta Renström, Jonas Hugosson, Ola Bratt, Maria Nyberg, Fredrik Sandin, Camilla Byström, Mia Brus, Mats Lambe, Anna Hedström, Nina Hageman, Christofer Lagerros, Hans Joelsson, and Gert Malmberg.


      References

        • Gurney J.
        • Sarfati D.
        • Stanley J.
        • Dennett E.
        • Johnson C.
        • Koea J.
        • et al.
        Unstaged cancer in a population-based registry: prevalence, predictors and patient prognosis.
        Cancer Epidemiol. 2013; 37: 498-504
        • Parry M.G.
        • Sujenthiran A.
        • Cowling T.E.
        • Charman S.
        • Nossiter J.
        • Aggarwal A.
        • et al.
        Imputation of missing prostate cancer stage in English cancer registry data based on clinical assumptions.
        Cancer Epidemiol. 2019; 58: 44-51
        • Makarov D.V.
        • Loeb S.
        • Ulmert D.
        • Drevin L.
        • Lambe M.
        • Stattin P.
        Prostate cancer imaging trends after a nationwide effort to discourage inappropriate prostate cancer imaging.
        J Natl Cancer Inst. 2013; 105: 1306-1313
        • Sobin L.H.
        • Compton C.C.
        TNM seventh edition: what's new, what's changed: communication from the International Union against Cancer and the American Joint Committee on Cancer.
        Cancer. 2010; 116: 5336-5339
        • Sobin L.H.
        • Gospodarowicz M.K.
        • Wittekind Ch.
        TNM classification of malignant tumours.
        7th ed. 2009
        • Mohan K.
        • Pearl J.
        Graphical models for processing missing data.
        J Am Stat Assoc. 2021; 116: 1023-1037
        • Hughes R.A.
        • Heron J.
        • Sterne J.A.C.
        • Tilling K.
        Accounting for missing data in statistical analyses: multiple imputation is not always the answer.
        Int J Epidemiol. 2019; 48: 1294-1304
        • Hardt J.
        • Herke M.
        • Leonhart R.
        Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research.
        BMC Med Res Methodol. 2012; 12: 184
        • RATTEN
        Interactive on line report from NPCR.
        (Available at)
        https://statistik.incanet.se/npcr/
        Date: 2022
        Date accessed: February 28, 2022
        • Van Hemelrijck M.
        • Wigertz A.
        • Sandin F.
        • Garmo H.
        • Hellström K.
        • Fransson P.
        • et al.
        Cohort profile: the National prostate cancer register of Sweden and prostate cancer data Base Sweden 2.0.
        Int J Epidemiol. 2013; 42: 956-967
        • Statistics Sweden.
        (Available at)
        https://www.statistikdatabasen.scb.se
        Date: 2021
        Date accessed: February 28, 2022
        • Regionala cancercentrum i samverkan.
        (Available at)
        • Fleiss J.L.
        • Levin B.
        • Paik M.C.
        Statistical methods for rates and proportions.
        John Wiley & Sons Inc, Hoboken, NJ, 2003
        • Little R.J.
        • Rubin D.B.
        Statistical analysis with missing data.
        John Wiley & Sons, Hoboken, NJ, 2019: 793
        • van Buuren S.
        • Groothuis-Oudshoorn C.
        MICE: multivariate imputation by chained equations in R.
        J Stat Softw. 2011; 45: 1-67
        • White I.R.
        • Royston P.
        • Wood A.M.
        Multiple imputation using chained equations: issues and guidance for practice.
        Stat Med. 2011; 30: 377-399
        • Toh S.
        • García Rodríguez L.A.
        • Hernán M.A.
        Analyzing partially missing confounder information in comparative effectiveness and safety research of therapeutics.
        Pharmacoepidemiol Drug Saf. 2012; 21: 13-20
        • Briganti A.
        • Passoni N.
        • Ferrari M.
        • Capitanio U.
        • Suardi N.
        • Gallina A.
        • et al.
        When to perform bone scan in patients with newly diagnosed prostate cancer: external validation of the currently available guidelines and proposal of a novel risk stratification tool.
        Eur Urol. 2010; 57: 551-558
        • Widmark A.
        • Klepp O.
        • Solberg A.
        • Damber J.E.
        • Angelsen A.
        • Fransson P.
        • et al.
        Endocrine treatment, with or without radiotherapy, in locally advanced prostate cancer (SPCG-7/SFUO-3): an open randomised phase III trial.
        Lancet. 2009; 373: 301-308
        • Warde P.
        • Mason M.
        • Ding K.
        • Kirkbride P.
        • Brundage M.
        • Cowan R.
        • et al.
        Combined androgen deprivation therapy and radiation therapy for locally advanced prostate cancer: a randomised, phase 3 trial.
        Lancet. 2011; 378: 2104-2111
        • Sterne J.A.
        • White I.R.
        • Carlin J.B.
        • Spratt M.
        • Royston P.
        • Kenward M.G.
        • et al.
        Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls.
        BMJ. 2009; 338: b2393
        • van Buuren S.
        Flexible imputation of missing data.
        Chapman & Hall/CRC Interdisciplinary Statistics, Boca Raton, FL, 2018
        • Klassen A.C.
        • Curriero F.
        • Kulldorff M.
        • Alberg A.J.
        • Platz E.A.
        • Neloms S.T.
        Missing stage and grade in Maryland prostate cancer surveillance data, 1992-1997.
        Am J Prev Med. 2006; 30: S77-S87
        • Merrill R.M.
        • Sloan A.
        • Anderson A.E.
        • Ryker K.
        Unstaged cancer in the United States: a population-based study.
        BMC Cancer. 2011; 11: 402
        • Luo Q.
        • Yu X.Q.
        • Cooke-Yarborough C.
        • Smith D.P.
        • O'Connell D.L.
        Characteristics of cases with unknown stage prostate cancer in a population-based cancer registry.
        Cancer Epidemiol. 2013; 37: 813-819
        • Elliott S.P.
        • Johnson D.P.
        • Jarosek S.L.
        • Konety B.R.
        • Adejoro O.O.
        • Virnig B.A.
        Bias due to missing SEER data in D'Amico risk stratification of prostate cancer.
        J Urol. 2012; 187: 2026-2031
        • Luo Q.
        • Egger S.
        • Yu X.Q.
        • Smith D.P.
        • O'Connell D.L.
        Validity of using multiple imputation for "unknown" stage at diagnosis in population-based cancer registry data.
        PLoS One. 2017; 12: e0180033
        • Schröder F.H.
        • Hugosson J.
        • Carlsson S.
        • Tammela T.
        • Määttänen L.
        • Auvinen A.
        • et al.
        Screening for prostate cancer decreases the risk of developing metastatic disease: findings from the European Randomized Study of Screening for Prostate Cancer (ERSPC).
        Eur Urol. 2012; 62: 745-752
        • Tomic K.
        • Westerberg M.
        • Robinson D.
        • Garmo H.
        • Stattin P.
        Proportion and characteristics of men with unknown risk category in the National Prostate Cancer Register of Sweden.
        Acta Oncologica. 2016; 55: 1461-1466
        • Tomic K.
        • Sandin F.
        • Wigertz A.
        • Robinson D.
        • Lambe M.
        • Stattin P.
        Evaluation of data quality in the National prostate cancer register of Sweden.
        Eur J Cancer. 2015; 51: 101-111
        • Marshall A.
        • Altman D.G.
        • Holder R.L.
        Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study.
        BMC Med Res Methodol. 2010; 10: 1-10
        • Tateishi U.
        • Morita S.
        • Taguri M.
        • Shizukuishi K.
        • Minamimoto R.
        • Kawaguchi M.
        • et al.
        A meta-analysis of (18)F-Fluoride positron emission tomography for assessment of metastatic bone tumor.
        Ann Nucl Med. 2010; 24: 523-531
        • Palmedo H.
        • Marx C.
        • Ebert A.
        • Kreft B.
        • Ko Y.
        • Türler A.
        • et al.
        Whole-body SPECT/CT for bone scintigraphy: diagnostic value and effect on patient management in oncological patients.
        Eur J Nucl Med Mol Imaging. 2014; 41: 59-67
        • Orrason A.W.
        • Westerberg M.
        • Garmo H.
        • Lissbrant I.F.
        • Robinson D.
        • Stattin P.
        Changes in treatment and mortality in men with locally advanced prostate cancer between 2000 and 2016: a nationwide, population-based study in Sweden.
        BJU Int. 2020; 126: 142-151
        • Westerberg M.
        • Franck Lissbrant I.
        • Damber J.E.
        • Robinson D.
        • Garmo H.
        • Stattin P.
        Temporal changes in survival in men with de novo metastatic prostate cancer: nationwide population-based study.
        Acta Oncologica. 2020; 59: 106-111
        • Cazzaniga W.
        • Garmo H.
        • Robinson D.
        • Holmberg L.
        • Bill-Axelson A.
        • Stattin P.
        Mortality after radical prostatectomy in a matched contemporary cohort in Sweden compared to the Scandinavian Prostate Cancer Group 4 (SPCG-4) study.
        BJU Int. 2019; 123: 421-428
        • Epstein J.I.
        • Egevad L.
        • Amin M.B.
        • Delahunt B.
        • Srigley J.R.
        • Humphrey P.A.
        The 2014 international society of urological pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma: definition of grading patterns and proposal for a new grading system.
        Am J Surg Pathol. 2016; 40: 244-252