Evidence of questionable research practices in clinical prediction models
- Selectively choosing data sets (from those available to the researcher) to build and evaluate a model
- Collecting more data until a desirable AUC value is reached [25, 26]
- Fitting many models, potentially hundreds, based on subsets of the candidate predictors [26]
- Trialling different cut-points when dichotomising continuous predictors until a “good” AUC is achieved [27]
- Including predictors that are proxies of the outcome or that act via reverse causality, for example blood tests taken after the outcome has occurred
- Changing the outcome variable, for example to a proxy of the original diagnostic outcome [26]
- Trialling alternative methods for imputing missing data [26]
- Trialling different modelling approaches, e.g. logistic regression models and classification trees [28]
- Rounding up an AUC value to pass a threshold, for example reporting 0.79 as 0.8 [25]
- Choosing the “best” random seed for split-sample validation or for a model’s hyper-parameters [29] (see the first code sketch after this list)
- Not using internal validation, so that model performance is evaluated on the same data used to develop the model (see the second code sketch after this list)
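
The seed-shopping item lends itself to a short illustration. The following is a minimal sketch, not taken from the article, using simulated data and scikit-learn: the split-sample “validation” is simply re-drawn under different random seeds until the most flattering AUC is found. The same loop structure applies to trialling cut-points or hyper-parameter settings until a “good” AUC appears.

```python
# Minimal sketch (hypothetical data and model) of seed-shopping:
# re-draw the split-sample validation until the AUC looks good.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

best_auc, best_seed = 0.0, None
for seed in range(200):                       # try many random splits ...
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    if auc > best_auc:                        # ... and keep only the most flattering one
        best_auc, best_seed = auc, seed

print(f"Reported AUC: {best_auc:.2f} (seed {best_seed})")  # optimistic by construction
```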
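
The last item, skipping internal validation, can be sketched in the same hypothetical setting: the apparent AUC, computed on the development data, is compared with an out-of-fold estimate from cross-validation, which is typically lower.

```python
# Minimal sketch (hypothetical data) of apparent vs internally validated performance.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=1)
model = LogisticRegression(max_iter=1000)

# Apparent performance: fit and evaluate on the same data (the practice in the last row).
apparent_auc = roc_auc_score(y, model.fit(X, y).predict_proba(X)[:, 1])

# Internal validation: out-of-fold predictions from 10-fold cross-validation.
oof_pred = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
cv_auc = roc_auc_score(y, oof_pred)

print(f"Apparent AUC:        {apparent_auc:.2f}")
print(f"Cross-validated AUC: {cv_auc:.2f}")   # usually noticeably lower
```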