Table 1 Overview of the structure of the paper, as a list of the sections with corresponding analytical goals and common approaches

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges


| Analytical goals | Common approaches | |
| --- | --- | --- |
| Initial data analysis and preprocessing | | |
| Identify inconsistent, suspicious or unexpected values | Visual inspection of univariate and multivariate distributions | Histograms, boxplots, scatterplots, correlograms, heatmaps |
| Describe distributions of variables, and identify missing values and systematic effects due to data acquisition | Descriptive statistics, tabulation, analysis of control values, graphical displays | Measures for location and scale, bivariate measures, RLE plots, MA plots, calibration curve, PCA, Biplot |
| Preprocess the data | Normalization, batch correction | Background correction, baseline correction, centering and scaling, quantile normalization, ComBat, SVA |
| Simplify data and refine/update analysis plan if required | Recoding, variable filtering and exclusion of uninformative variables, construction of new variables, removal of variables or observations due to missing values, imputation | Collapsing categories, variable filtering, discretizing continuous variables, multiple imputation |
| Exploratory data analysis | | |
| Identify interesting data characteristics | Graphical displays, descriptive univariate and multivariate statistics | PCA, Biplot, multidimensional scaling, t-SNE, UMAP, neural networks |
| Gain insight into the data structure | Cluster analysis, prototypical samples | Hierarchical clustering, k-means, PAM, scree plot, silhouette values |
| Identification of informative variables and multiple testing | | |
| Identify variables informative for an outcome | Test statistics, modelling approaches | t-test, permutation test, limma, edgeR, DESeq2 |
| Perform multiple testing | Multiple tests, control for false discoveries | Bonferroni correction, Holm’s procedure, multivariate permutation tests, Benjamini-Hochberg (BH), q-values |
| Identify informative groups of variables | Tests for groups of variables | Gene set enrichment analysis, over-representation analysis, global test, topGO |
| Prediction | | |
| Construct prediction models | Variable transformations, variable selection, dimension reduction, statistical modelling, algorithms, integrating multiple sources of information | Log-transform, standardization, superPC, ridge regression, lasso regression, elastic net, boosting, SVM, trees, random forest, neural networks, deep learning |
| Assess performance and validate prediction models | Choice of performance measures, internal and external validation, identification of influential points | MSE, MAE, ROC curves, AUC, misclassification rate, Brier score, calibration plots, deviance, subsampling, cross-validation, bootstrap, use of external datasets |
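
As an illustration of one entry in the table, the contrast between the Bonferroni correction and the Benjamini-Hochberg (BH) procedure can be sketched in a few lines. This is a minimal sketch, not taken from the paper: the function names `bonferroni` and `benjamini_hochberg` and the p-values are invented for illustration, and NumPy is assumed to be available. In practice an established implementation would normally be preferred over hand-written code.

```python
import numpy as np


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the same order as the input."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    ranks = np.arange(1, m + 1)
    scaled = p[order] * m / ranks              # p_(i) * m / i
    # Enforce monotonicity: take the running minimum from the largest p downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


# Hypothetical p-values for five variables (made up for this sketch).
pvals = [0.041, 0.001, 0.27, 0.008, 0.02]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

alpha = 0.05
print("Bonferroni rejections:", int((bonf <= alpha).sum()))
print("BH rejections:", int((bh <= alpha).sum()))
```

With these made-up values, Bonferroni flags two variables at the 0.05 level and BH flags three. Bonferroni controls the family-wise error rate, whereas BH controls the false discovery rate, so BH is typically less conservative when many variables are tested.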