Brier score summarizes model calibration and discrimination - Reply
Thank you for the opportunity to respond to the comments of Dr Rufibach. In the article by Lix et al. [1], the c-statistic (equal to the area under the receiver operating characteristic curve for a binary outcome variable) and the Brier score were used to evaluate algorithms for classifying osteoporosis cases and noncases identified from a bone mineral density database. The algorithms were constructed from variables defined in hospital, physician, and prescription administrative databases. The Brier score measures the agreement between the observed binary outcome (i.e., case vs. noncase) and the predicted probability of that outcome. It is the sum of a calibration component and a discrimination (or refinement) component [2], [3], and lower scores indicate better model accuracy.
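The quantities in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes outcomes coded 1 (case) and 0 (noncase) with one predicted probability per subject; the function names and the eight-subject dataset are hypothetical and are not taken from Lix et al. [1]. The c-statistic is computed as the proportion of case/noncase pairs in which the case has the higher predicted probability (ties count one half). The decomposition shown is the calibration-refinement form obtained by pooling subjects who share the same predicted probability, which is exact in that setting; with continuous predictions the probabilities are usually binned first and the split becomes approximate. This form is assumed here for illustration, not confirmed as the one used in the article.

```python
from collections import defaultdict


def brier_score(outcomes, probs):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    return sum((o - p) ** 2 for o, p in zip(outcomes, probs)) / len(outcomes)


def c_statistic(outcomes, probs):
    """Share of case/noncase pairs where the case has the higher prediction
    (ties count 1/2); equal to the area under the ROC curve."""
    cases = [p for o, p in zip(outcomes, probs) if o == 1]
    noncases = [p for o, p in zip(outcomes, probs) if o == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    concordant = 0.0
    for pc in cases:
        for pn in noncases:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


def calibration_refinement(outcomes, probs):
    """Split the Brier score into (calibration, refinement) by grouping
    subjects that share the same predicted probability."""
    groups = defaultdict(list)
    for o, p in zip(outcomes, probs):
        groups[p].append(o)
    n = len(outcomes)
    calibration = 0.0
    refinement = 0.0
    for p, obs in groups.items():
        n_k = len(obs)
        obar = sum(obs) / n_k  # observed proportion of cases in this group
        calibration += n_k * (p - obar) ** 2 / n  # prediction vs observed rate
        refinement += n_k * obar * (1 - obar) / n  # spread of outcomes within group
    return calibration, refinement


# Hypothetical data: 1 = osteoporosis case, 0 = noncase
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]
probs = [0.8, 0.8, 0.8, 0.4, 0.4, 0.2, 0.2, 0.2]

cal, ref = calibration_refinement(outcomes, probs)
print(f"Brier score: {brier_score(outcomes, probs):.3f}")
print(f"c-statistic: {c_statistic(outcomes, probs):.3f}")
print(f"calibration: {cal:.4f}, refinement: {ref:.4f}")
```

Because the two components are computed separately, their sum reproduces the Brier score exactly in this grouped setting, which is a convenient check on any implementation.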
Spiegelhalter's z-test [4] is used to evaluate the calibration component of the Brier score. The use of this test was not clearly described by Lix et al. [1]. The note to Table 2 should have indicated that Brier score values marked with an asterisk (*) were associated with a statistically significant value of Spiegelhalter's z-test (evaluated at α = 0.05), indicating poor calibration. I appreciate Dr Rufibach's clarification of the interpretation of the study results.
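A minimal sketch of Spiegelhalter's z-test may help readers reproduce the kind of flag described for Table 2. Under the null hypothesis of perfect calibration, the statistic is z = Σ(o_i − p_i)(1 − 2p_i) / sqrt(Σ(1 − 2p_i)² p_i(1 − p_i)), where o_i is the 0/1 outcome and p_i the predicted probability; its numerator equals N times the difference between the observed Brier score and the Brier score expected under calibration. The sketch assumes a two-sided test at α = 0.05 and predictions that are not all 0, 0.5, or 1 (which would make the variance zero). The function names and data are hypothetical.

```python
import math


def spiegelhalter_z(outcomes, probs):
    """Spiegelhalter's z statistic for the calibration of binary predictions.

    Under perfect calibration, sum((o - p) * (1 - 2p)) has mean 0 and
    variance sum((1 - 2p)**2 * p * (1 - p)).
    """
    num = sum((o - p) * (1 - 2 * p) for o, p in zip(outcomes, probs))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probs)
    if var == 0:
        raise ValueError("variance is zero (all predictions are 0, 0.5, or 1)")
    return num / math.sqrt(var)


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def flag_poor_calibration(outcomes, probs, alpha=0.05):
    """Return (z, p, flag); flag is True when p < alpha, i.e. poor calibration."""
    z = spiegelhalter_z(outcomes, probs)
    p = two_sided_p(z)
    return z, p, p < alpha


# Hypothetical data: 1 = case, 0 = noncase
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]
probs = [0.8, 0.8, 0.8, 0.4, 0.4, 0.2, 0.2, 0.2]
z, p, poor = flag_poor_calibration(outcomes, probs)
print(f"z = {z:.3f}, p = {p:.3f}, poorly calibrated: {poor}")
```

A model could be marked with an asterisk whenever the returned flag is True; a one-sided alternative would change only the p-value calculation.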
References
- [1] Using multiple data features improved the validity of osteoporosis case ascertainment from administrative data. J Clin Epidemiol. 2008;61:1250–1260.
- [2] Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78:1–3.
- [3] Separating the Brier score into calibration and refinement components: a graphical exposition. Am Stat. 1985;39:26–32.
- [4] Probabilistic prediction in patient management and clinical trials. Stat Med. 1986;5:421–433.
PII: S0895-4356(09)00362-X
doi:10.1016/j.jclinepi.2009.11.008
© 2010 Elsevier Inc. All rights reserved.
Refers to article:
- Use of Brier score to assess binary predictions, 02 March 2010
