# Table 4 Model selection and discrimination in the derivation and validation cohorts

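| | Colorectal cancer, both sexes | Colorectal cancer, men | Colorectal cancer, women | Colon cancer, both sexes | Colon cancer, men | Colon cancer, women | Rectal cancer, both sexes | Rectal cancer, men | Rectal cancer, women |
|---|---|---|---|---|---|---|---|---|---|
| **Selected predictors** | | | | | | | | | |
| Age at recruitment, per 10 years | | | | | | | | | |
| Waist circumference, per 10 cm | | | | | | | | | |
| Height, per 10 cm | | | | | | | | | |
| Daily alcohol consumption, high | | | | | | | | | |
| Ever smoker, yes | | | | | | | | | |
| Physically active, yes | | | | | | | | | |
| Vegetables, per 100 g/day | | | | | | | | | |
| Fruits, per 100 g/day | | | | | | | | | |
| Dairy products, per 100 g/day | | | | | | | | | |
| Red meat, per 50 g/day | | | | | | | | | |
| Poultry, per 50 g/day | | | | | | | | | |
| Processed meat, per 50 g/day | | | | | | | | | |
| Fish, per 50 g/day | | | | | | | | | |
| Sugar and confectionery, per 50 g/day | | | | | | | | | |
| Soft drinks, per 100 g/day | | | | | | | | | |
| **Harrell’s C-index** | | | | | | | | | |
| *Full model* | | | | | | | | | |
| Derivation cohort | 0.710 | 0.700 | 0.702 | 0.718 | 0.708 | 0.718 | 0.705 | 0.705 | 0.677 |
| Optimism corrected\* | 0.708 | 0.697 | 0.700 | 0.716 | 0.707 | 0.715 | 0.704 | 0.703 | 0.668 |
| Validation cohort | 0.715 | 0.707 | 0.700 | 0.708 | 0.727 | 0.700 | 0.730 | 0.689 | 0.693 |
| *Reduced model* | | | | | | | | | |
| Derivation cohort | 0.710 | 0.699 | 0.700 | 0.717 | 0.705 | 0.717 | 0.703 | 0.700 | 0.668 |
| Optimism corrected\* | 0.709 | 0.698 | 0.699 | 0.716 | 0.704 | 0.715 | 0.701 | 0.698 | 0.667 |
| Validation cohort | 0.714 | 0.708 | 0.699 | 0.708 | 0.727 | 0.698 | 0.728 | 0.687 | 0.696 |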
\*Harrell's C-index for the derivation cohort, corrected for optimism by bootstrapping with 1000 replications. For each bootstrap sample, a new model is fitted and its C-index is calculated in both the bootstrap sample and the original derivation cohort. The difference between these two C-indices is averaged over all bootstrap replications, and this average (the optimism) is subtracted from the original C-index.
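
The optimism correction described in the footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: `harrell_c`, `toy_fit`, `make_toy_cohort`, and `optimism_corrected_c` are hypothetical names, the cohort is simulated, and `toy_fit` is a deliberately trivial stand-in for the regression model that would be refitted in each bootstrap sample.

```python
import math
import random


def harrell_c(times, events, scores):
    """Harrell's C: share of usable pairs in which the subject with the
    shorter follow-up time (who must have had an event) has the higher score.
    Tied scores count as 0.5; pairs with tied times are skipped."""
    usable = 0
    concordant = 0.0
    for t_i, e_i, s_i in zip(times, events, scores):
        if not e_i:
            continue  # the earlier subject of a usable pair must have an event
        for t_j, s_j in zip(times, scores):
            if t_i < t_j:
                usable += 1
                if s_i > s_j:
                    concordant += 1.0
                elif s_i == s_j:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs (no events before another time)")
    return concordant / usable


def toy_fit(times, events, xs):
    """Hypothetical stand-in for model fitting: choose the sign of a single
    predictor that gives the higher C-index in the training data."""
    sign = 1.0 if harrell_c(times, events, xs) >= 0.5 else -1.0
    return lambda x: sign * x


def make_toy_cohort(n, seed):
    """Simulated cohort (an assumption for illustration, not study data)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Higher x means higher hazard, hence shorter expected time to event.
    times = [rng.expovariate(math.exp(0.8 * x)) for x in xs]
    events = [rng.random() < 0.7 for _ in range(n)]
    return times, events, xs


def optimism_corrected_c(times, events, xs, fit, n_boot=1000, seed=0):
    """Return (apparent C, optimism-corrected C) following the footnote."""
    rng = random.Random(seed)
    n = len(times)
    model = fit(times, events, xs)
    apparent = harrell_c(times, events, [model(x) for x in xs])
    total_optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bt = [times[i] for i in idx]
        be = [events[i] for i in idx]
        bx = [xs[i] for i in idx]
        boot_model = fit(bt, be, bx)  # refit on the bootstrap sample
        c_boot = harrell_c(bt, be, [boot_model(x) for x in bx])
        c_orig = harrell_c(times, events, [boot_model(x) for x in xs])
        total_optimism += c_boot - c_orig
    return apparent, apparent - total_optimism / n_boot


if __name__ == "__main__":
    t, e, x = make_toy_cohort(n=80, seed=1)
    apparent, corrected = optimism_corrected_c(t, e, x, toy_fit, n_boot=200, seed=2)
    print(f"apparent C = {apparent:.3f}, optimism-corrected C = {corrected:.3f}")
```

In a real analysis, `fit` would be replaced by the actual model-fitting step, including any predictor selection, so that each bootstrap replication repeats the whole modelling procedure.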