
Table 12 Methods for graphical displays: Multidimensional scaling, t-SNE, UMAP, neural networks

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Multidimensional scaling (MDS)

 Multidimensional scaling takes as input a distance matrix whose elements are the distances between all pairs of observations, calculated in the original (high-dimensional) space, together with the dimension of the lower-dimensional space (often two dimensions) onto which the data should be projected. A representation of the data points in the lower-dimensional space, called an embedding, is constructed such that the distances between pairs of observations are preserved as well as possible. Functions that quantify the level of agreement between pairwise distances before and after dimension reduction are called stress functions. MDS implements mathematical algorithms to minimize the specified stress function
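 As an illustration (not part of the original text), a metric MDS embedding could be computed from a precomputed distance matrix as sketched below, assuming the SciPy and scikit-learn packages; the toy data and parameter values are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# toy data: n = 100 observations measured on p = 50 variables (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))

# pairwise distances between all observations in the original space
D = squareform(pdist(X))

# metric MDS: find 2-D coordinates that minimize a stress function
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
embedding = mds.fit_transform(D)      # (100, 2) coordinates for plotting
print(mds.stress_)                    # final value of the stress function
```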

 Classical multidimensional scaling was first introduced by Torgerson [63]. Mathematically, it uses an eigenvalue decomposition of a transformed distance matrix to find an embedding. Torgerson [63] set out the foundations for this work, but further developments of the technique, associated with the name principal coordinates analysis, are attributed to Gower [64]. While classical multidimensional scaling uses an eigenvalue decomposition to embed the data, non-metric multidimensional scaling (nMDS) [65] uses iterative optimization methods
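 For concreteness, a minimal NumPy sketch of classical (Torgerson) scaling is given below; the double-centering step is standard, but the function name and dimensions are illustrative rather than taken from the article.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) scaling of a distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalue decomposition
    order = np.argsort(eigvals)[::-1][:k]         # k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # guard against tiny negatives
    return eigvecs[:, order] * scale              # n x k embedding
```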

t-distributed stochastic neighbor embedding (t-SNE)

 Some newer approaches to deriving lower-dimensional representations of data avoid the restriction of PCA that the new coordinates must be linear transformations of the original variables. One popular approach is t-SNE [66], a variation of Stochastic Neighbor Embedding (SNE) [67], and the most commonly used technique in single-cell RNA-Seq analysis. t-SNE explicitly optimizes a loss function by minimizing the Kullback–Leibler divergence between the distributions of pairwise similarities between observations (subjects) in the original space and in the low-dimensional space. PCA plots, which are typically based on the first two or three principal component scores, focus on preserving the distances between data points that are widely separated in high-dimensional space, whereas t-SNE aims to provide representations that preserve the distances between nearby data points. This means that t-SNE reduces the dimensionality of the data mainly based on its local properties. t-SNE requires the specification of a tunable parameter known as “perplexity”, which can be interpreted as a guess at the effective number of neighbors (the number of neighbors considered close). Figure 10 shows the result of t-SNE on a dataset with eight classes
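 As a hedged illustration, t-SNE is available, for example, in scikit-learn; the data below are simulated and the perplexity value is only a common starting point that should be tuned to the data at hand.

```python
import numpy as np
from sklearn.manifold import TSNE

# toy data: 200 observations on 50 variables (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# perplexity ~ assumed number of effective neighbors; tune to the data
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)     # (200, 2) coordinates for plotting
```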

Uniform manifold approximation and projection (UMAP)

 t-SNE has been shown to efficiently reveal local data structure and has been widely used for identifying subgroups of populations in cytometry and transcriptomic data. However, it has some limitations. It does not preserve the global structure of the data well, i.e., relations between observations that are far apart are not captured well by the low-dimensional representation. A further drawback is its long computation time for HDD, especially for very large sample sizes n. A newer approach called uniform manifold approximation and projection (UMAP) [68] overcomes some of these limitations by using a different probability distribution in high dimensions. In particular, the construction of the initial neighborhood graph is more sophisticated, e.g., by incorporating weights that reflect uncertainty. In addition, UMAP directly uses the number of nearest neighbors, rather than the perplexity, as its tuning parameter, making tuning more transparent. On real data, UMAP has been shown to preserve as much of the local structure as t-SNE and more of the global structure, with more reproducible results and shorter run time [69]
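 One possible way to run UMAP is via the umap-learn package (an assumption, not specified in the article); here n_neighbors plays the role of the tuning parameter mentioned above, and all values shown are illustrative.

```python
import numpy as np
import umap   # umap-learn package (assumed to be installed)

# toy data: 200 observations on 50 variables (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# n_neighbors directly controls the size of the local neighborhood:
# smaller values emphasize local structure, larger values global structure
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=0)
embedding = reducer.fit_transform(X)  # (200, 2) coordinates for plotting
```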

Neural networks

 Neural networks provide another way to identify non-linear transformations to obtain lower-dimensional representations of HDD, which in many cases outperform simple linear transformations [70]. The concept is briefly described in section “PRED1.5: Algorithms” in the context of reducing the number of variables in preparation for development of prediction models or algorithms. Yet, research is ongoing to determine how best to develop low-dimensional representations and corresponding derived variables, and which of those derived variables might be most suitable depending on their subsequent use for statistical modelling or other purposes
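 As one hedged example of this idea, an autoencoder (only one of several possible neural-network architectures for dimension reduction, not singled out in the article) can be sketched in PyTorch as follows; the layer sizes, toy data, and training settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Compress p input variables to k derived variables and reconstruct them."""
    def __init__(self, p=50, k=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(p, 16), nn.ReLU(), nn.Linear(16, k))
        self.decoder = nn.Sequential(nn.Linear(k, 16), nn.ReLU(), nn.Linear(16, p))

    def forward(self, x):
        return self.decoder(self.encoder(x))   # reconstruction of the input

X = torch.randn(100, 50)                       # toy data: 100 observations, 50 variables
model = AutoEncoder(p=50, k=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                           # minimize reconstruction error
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)
    loss.backward()
    optimizer.step()

Z = model.encoder(X).detach()                  # low-dimensional representation (100 x 2)
```

The encoder output Z would then serve as the derived variables for plotting or downstream modelling, analogous to the scores produced by the other methods in this table.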