Evidence of questionable research practices in clinical prediction models
- Selectively choosing data sets (from those available to the researcher) to build and evaluate a model
- Collecting more data until a desirable AUC value is reached [25, 26]
- Fitting many models, potentially hundreds, based on subsets of the candidate predictors [26]
- Trialling different cut-points when dichotomising continuous predictors until a “good” AUC is achieved [27]
- Including predictors that are proxies of the outcome or that act via reverse causality, for example blood tests taken after the outcome has occurred
- Changing the outcome variable, for example to a proxy of the original diagnostic outcome [26]
- Trialling alternative methods for imputing missing data [26]
- Trialling different modelling approaches, e.g. logistic regression models and classification trees [28]
- Rounding up an AUC value to pass a threshold, for example reporting 0.79 as 0.8 [25]
- Choosing the “best” random seed for split-sample validation or for a model’s hyper-parameters [29] (see the first code sketch after this list)
- Not using internal validation, so that model performance is evaluated on the same data used to develop the model (see the second code sketch after this list)
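
The seed-shopping item lends itself to a short illustration. The following is a minimal sketch, not taken from the article, using simulated data and scikit-learn: the split-sample “validation” is simply re-drawn under different random seeds until the most flattering AUC is found. The same loop structure applies to trialling cut-points or hyper-parameter settings until a “good” AUC appears.

```python
# Minimal sketch (hypothetical data and model) of seed-shopping:
# re-draw the split-sample validation until the AUC looks good.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

best_auc, best_seed = 0.0, None
for seed in range(200):                       # try many random splits ...
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    if auc > best_auc:                        # ... and keep only the most flattering one
        best_auc, best_seed = auc, seed

print(f"Reported AUC: {best_auc:.2f} (seed {best_seed})")  # optimistic by construction
```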
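
The last item, skipping internal validation, can be sketched in the same hypothetical setting: the apparent AUC, computed on the development data, is compared with an out-of-fold estimate from cross-validation, which is typically lower.

```python
# Minimal sketch (hypothetical data) of apparent vs internally validated performance.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=1)
model = LogisticRegression(max_iter=1000)

# Apparent performance: fit and evaluate on the same data (the practice in the last row).
apparent_auc = roc_auc_score(y, model.fit(X, y).predict_proba(X)[:, 1])

# Internal validation: out-of-fold predictions from 10-fold cross-validation.
oof_pred = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
cv_auc = roc_auc_score(y, oof_pred)

print(f"Apparent AUC:        {apparent_auc:.2f}")
print(f"Cross-validated AUC: {cv_auc:.2f}")   # usually noticeably lower
```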