From: Calibration: the Achilles heel of predictive analytics
Why calibration matters
- Decisions are often based on risk estimates, so predicted risks should be reliable
- Poor calibration can make a prediction model clinically useless or even harmful

Causes of poor calibration
- Statistical overfitting and measurement error
- Heterogeneity between populations in patient characteristics, disease incidence or prevalence, patient management, and treatment policies

Assessment of calibration in practice
- Perfect calibration, where predicted risks are correct for every covariate pattern, is utopian; we should not aim for it
- At model development, include nonlinear effects and interaction terms only if the sample size is sufficiently large; small samples call for simpler modeling strategies, or for not developing a model at all
- Avoid the Hosmer–Lemeshow test to assess or prove calibration
- At internal validation, focus on the calibration slope as part of the assessment of statistical overfitting
- At external validation, focus on the calibration curve, intercept, and slope
- Consider model updating in case of poor calibration; re-estimating the model entirely requires sufficient data
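The calibration intercept and slope mentioned above can be estimated by regressing the observed outcomes on the logit of the predicted risks. The sketch below is a minimal numpy illustration under common conventions (slope from a joint logistic fit; intercept, i.e. calibration-in-the-large, with the slope fixed at 1); function names are illustrative, not from the article.

```python
import numpy as np

def _logit(p):
    # Clip to avoid infinite logits at exactly 0 or 1
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.log(p / (1 - p))

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def calibration_slope(y, p_hat, n_iter=25):
    """Fit logistic regression y ~ a + b * logit(p_hat) by Newton-Raphson
    and return (a, b). A slope b < 1 suggests overfitting (predictions
    too extreme); b > 1 suggests predictions that are too modest."""
    lp = _logit(p_hat)
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = _sigmoid(X @ beta)
        w = mu * (1 - mu)
        grad = X.T @ (y - mu)          # score vector
        hess = (X * w[:, None]).T @ X  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1]

def calibration_intercept(y, p_hat, n_iter=25):
    """Calibration-in-the-large: fit y ~ a + offset(logit(p_hat)), i.e. a
    logistic model with the slope fixed at 1. a > 0 means risks are
    underestimated on average; a < 0 means they are overestimated."""
    lp = _logit(p_hat)
    a = 0.0
    for _ in range(n_iter):
        mu = _sigmoid(a + lp)
        a = a + np.sum(y - mu) / np.sum(mu * (1 - mu))
    return a
```

For well-calibrated predictions the intercept is near 0 and the slope near 1; overfitted predictions, whose logits are too spread out, yield a slope below 1 on validation data.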
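One common form of the model updating mentioned in the last point is logistic recalibration: keep the model's risk ordering but map each predicted risk through an intercept and slope estimated on data from the new setting. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def recalibrate(p_hat, intercept, slope):
    """Logistic recalibration: new_logit = intercept + slope * logit(p_hat).
    The intercept and slope would typically be estimated on validation data
    from the target population; this updates calibration without
    re-estimating the full model."""
    p = np.clip(np.asarray(p_hat, dtype=float), 1e-12, 1 - 1e-12)
    lp = np.log(p / (1 - p))
    return 1.0 / (1.0 + np.exp(-(intercept + slope * lp)))
```

For example, with intercept 0 and slope 0.5 (an overfitted model), `recalibrate(0.9, 0.0, 0.5)` shrinks an overconfident 90% risk to 75%, pulling extreme predictions back toward the average.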