Table 21 Methods for variable transformations: Log-transform, standardization

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Log-transform

 Variables with nonnegative values are frequently encountered in practice and typically have a right-skewed distribution. A logarithmic transformation may help to make the distribution of the data more symmetric. In that case, the derived variable log(X) is used instead of X as input for prediction modelling [131]. An example in a high-dimensional context is gene expression microarray data, which typically enter a prediction model after being log2-transformed (see, e.g., [43]). Transformations other than the logarithm are, of course, also possible, but are used less often.
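
 As a minimal sketch of the log2 transform described above, the following Python snippet uses NumPy on simulated right-skewed values. The simulated "intensities", the helper names `log2_transform` and `sample_skewness`, and the optional `pseudocount` argument are illustrative assumptions, not part of the cited methods. Note that the logarithm is only defined for strictly positive values, so zeros have to be handled explicitly (here, by an optional pseudocount or by raising an error).

```python
import numpy as np

def log2_transform(x, pseudocount=0.0):
    """Return log2(x + pseudocount); hypothetical helper for illustration."""
    x = np.asarray(x, dtype=float)
    shifted = x + pseudocount
    if np.any(shifted <= 0):
        # log is undefined for zero and negative values
        raise ValueError("log2 needs strictly positive input; consider a pseudocount")
    return np.log2(shifted)

def sample_skewness(x):
    """Moment-based sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5

# Simulated right-skewed, strictly positive values (an assumption for this example)
rng = np.random.default_rng(0)
intensities = rng.lognormal(mean=6.0, sigma=1.0, size=5000)

log_intensities = log2_transform(intensities)

print("skewness before:", sample_skewness(intensities))
print("skewness after: ", sample_skewness(log_intensities))
```

 On data of this kind, the skewness after the transform should be much closer to zero than before, which is the symmetry gain the text refers to.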

Standardization

 Another variable transformation often performed in high-dimensional contexts is standardization. Here, the variable is centered (the mean of the variable is subtracted from each of its values) and scaled (each centered value is divided by the standard deviation of the variable). This procedure has advantages for interpretation. For example, the intercept of a linear model including age would represent a person of average age instead of a hypothetical person of age 0. Further, standardization is crucial for the correct implementation of many regularized methods (e.g., lasso and ridge regression, see section “PRED1.4: Statistical modelling”). Note that standardization can cause problems when a prediction model is applied to a new dataset. In this case, one either has to use the centering and scaling values (means and standard deviations) calculated from the original dataset or re-compute them on the new dataset. The latter is problematic because individual predictions then depend on the other observations that happened to be included in the new dataset. Standardization is not mutually exclusive with other transformations, e.g., the logarithmic transformation described above, so it is often performed in addition (i.e., after the logarithmic transformation).
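
 The first option described above, reusing the centering and scaling values from the original dataset when standardizing new data, can be sketched as follows. This is an illustrative example only: the function names `fit_standardizer` and `apply_standardizer`, the toy data (age in the first column, an arbitrary second variable), and the choice of the sample standard deviation (`ddof=1`) are assumptions made here, not prescriptions from the source.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-variable mean and standard deviation on the original data."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)  # sample SD; ddof=1 is an assumption
    if np.any(sd == 0):
        raise ValueError("constant variable: standard deviation is zero")
    return mean, sd

def apply_standardizer(X, mean, sd):
    """Center and scale X with previously computed values."""
    return (np.asarray(X, dtype=float) - mean) / sd

# Toy original dataset: rows are persons, columns are variables (age, marker)
X_train = np.array([[30.0, 1.0],
                    [40.0, 2.0],
                    [50.0, 3.0],
                    [60.0, 6.0]])

# A single new observation to which the model is to be applied
X_new = np.array([[45.0, 3.0]])

mean, sd = fit_standardizer(X_train)
Z_train = apply_standardizer(X_train, mean, sd)

# Reuse the original mean and SD, so the standardized value of the new
# observation does not depend on other observations in the new dataset
Z_new = apply_standardizer(X_new, mean, sd)

print(Z_train)
print(Z_new)
```

 Because the new observation is transformed with the stored values, its standardized value stays the same regardless of which other observations accompany it. When a log transform is also used, it would be applied before `fit_standardizer`, in line with the order given in the text.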