
Table 1 The bootstrap process to examine instability of model predictions in a chosen target population, as adapted from Riley and Collins [12]

From: Clinical prediction models and the multiverse of madness

Using the model development dataset of \(n\) participants from the chosen target population, we recommend the following process (an illustrative code sketch of these steps is given after the table):

• Step 1: Use the developed model to make predictions \({\widehat{p}}_{i}\) for each individual participant (\(i=1\) to \(n\)) in the development dataset

• Step 2: Generate a bootstrap sample of size \(n\) by sampling with replacement

• Step 3: Develop a bootstrap prediction model in the bootstrap sample, replicating exactly (or as far as practically possible) the same model development approach and set of candidate predictors as used originally

• Step 4: Use the bootstrap model developed in step 3 to make predictions for each individual (\(i\)) in the original dataset. We refer to these predictions as \({\widehat{p}}_{bi}\), where \(b\) indicates which bootstrap sample the model was generated in (\(b=1\) to \(B\))

• Step 5: Repeat steps 2 to 4 a total of \(B-1\) times, giving \(B\) bootstrap models in total; we suggest \(B\) is at least 200

• Step 6: Store all the predictions from the \(B\) iterations of steps 2 to 4 together in a single dataset, containing for each individual a prediction \({\widehat{p}}_{i}\) from the original model and \(B\) predictions (\({\widehat{p}}_{1i}, {\widehat{p}}_{2i}, \dots, {\widehat{p}}_{Bi}\)) from the bootstrap models

• Step 7: Summarise the instability in the predictions. In particular, quantify the mean absolute prediction ‘error’ (MAPE) for each individual, \(\text{MAPE}_{i}=\frac{1}{B}\sum_{b=1}^{B}\left|{\widehat{p}}_{bi}-{\widehat{p}}_{i}\right|\), summarise this across individuals, and display a prediction instability plot (a scatter of the \(B\) predicted values for each individual against their original predicted value). Other instability plots (e.g. for classification, clinical utility) and measures may also be useful, as shown elsewhere [12].
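The following is a minimal sketch of steps 1 to 7 in Python. It assumes the predictors and outcome are NumPy arrays and uses an ordinary scikit-learn logistic regression as a stand-in for the model development approach; the names `bootstrap_prediction_instability` and `develop_model` are illustrative and not from the original article. In practice, `develop_model` should replicate the actual development procedure (candidate predictors, variable selection, penalisation, and so on) as closely as possible, per step 3.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def bootstrap_prediction_instability(X, y, develop_model, B=200, seed=0):
    """Quantify instability of individual predictions via bootstrapping.

    X : (n, p) array of predictors; y : (n,) binary outcome.
    develop_model : callable that refits the full development approach
    on a (bootstrap) dataset and returns a fitted model with predict_proba.
    Returns the original predictions, the (B, n) bootstrap predictions,
    and the per-individual mean absolute prediction 'error' (MAPE).
    """
    rng = np.random.default_rng(seed)
    n = len(y)

    # Step 1: develop the original model and predict for every participant
    original_model = develop_model(X, y)
    p_orig = original_model.predict_proba(X)[:, 1]

    # Steps 2-5: draw B bootstrap samples of size n (with replacement),
    # refit the model in each, and predict for the original n participants
    boot_preds = np.empty((B, n))
    for b in range(B):
        idx = rng.integers(0, n, size=n)            # bootstrap sample (step 2)
        boot_model = develop_model(X[idx], y[idx])  # bootstrap model (step 3)
        boot_preds[b] = boot_model.predict_proba(X)[:, 1]  # step 4

    # Steps 6-7: per-individual MAPE = mean |p_hat_bi - p_hat_i| over b = 1..B
    mape_i = np.abs(boot_preds - p_orig).mean(axis=0)
    return p_orig, boot_preds, mape_i


# Example usage (illustrative development approach: plain logistic regression)
# X, y = ...  # development dataset
# p_orig, boot_preds, mape_i = bootstrap_prediction_instability(
#     X, y, develop_model=lambda X, y: LogisticRegression(max_iter=1000).fit(X, y))
# print(f"MAPE: median {np.median(mape_i):.3f}, max {mape_i.max():.3f}")
#
# Prediction instability plot (step 7): bootstrap vs original predictions
# import matplotlib.pyplot as plt
# plt.scatter(np.tile(p_orig, len(boot_preds)), boot_preds.ravel(), s=2, alpha=0.1)
# plt.plot([0, 1], [0, 1], color="red")
# plt.xlabel("Prediction from original model")
# plt.ylabel("Predictions from bootstrap models")
# plt.show()
```

Note that the sketch summarises instability only through MAPE and the instability plot; analogous summaries for classification or clinical utility, as mentioned in step 7, would be computed from the same stored `boot_preds` array.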