
Table 23 Methods for statistical modelling with constraints on regression coefficients: Ridge regression, lasso regression, elastic net, boosting

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Ridge regression, lasso regression, and the elastic net

Two of the most commonly used constrained regression methods are ridge regression and the lasso. Interestingly, the problem of minimizing a loss function under particular constraints can be mathematically rewritten as the minimization of the same loss function with an additional penalty term. Consequently, ridge regression estimates the regression coefficients by minimizing the negative log-likelihood (which in linear regression corresponds to the sum of squared errors) plus a penalty term defined as the sum of the squared values of the coefficients. For the lasso, the penalty term is instead the sum of the absolute values of the coefficients. In both cases, the amount of penalization is controlled by a tuning parameter, which must be chosen either by the user or as part of the algorithm (usually by cross-validation).
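For the linear regression case, the two penalized criteria can be written as follows (a sketch in our own notation, not taken from the table: y_i denotes the response, x_i the vector of predictor values, beta the coefficients, and lambda >= 0 the tuning parameter):

$$\hat{\beta}^{\text{ridge}} = \operatorname*{arg\,min}_{\beta} \Big\{ \sum_{i=1}^{n} \big(y_i - \beta_0 - x_i^\top \beta\big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Big\},$$

$$\hat{\beta}^{\text{lasso}} = \operatorname*{arg\,min}_{\beta} \Big\{ \sum_{i=1}^{n} \big(y_i - \beta_0 - x_i^\top \beta\big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\}.$$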

A nice property of the lasso penalty is that it forces many regression coefficients to be exactly 0, providing implicit variable selection (predictor variables whose coefficients are estimated as 0 are removed from the model). However, the lasso handles correlations among predictor variables less well than ridge regression. To take advantage of the strengths of both methods, a solution that combines the two penalties has been proposed under the name of elastic net [141]. A further tuning parameter (in addition to the one that controls the strength of the penalty) must be chosen to define the balance between the two types of penalty. For extreme values of this parameter, namely 0 and 1, the elastic net reduces to ridge regression and the lasso, respectively.
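As an illustration only (not part of the original table), the three methods can be fitted with cross-validated tuning parameters using scikit-learn; the simulated data, variable names, and parameter grids below are our own assumptions:

```python
# Minimal sketch: ridge, lasso, and elastic net with cross-validated tuning
# parameters in a high-dimensional setting (p >> n). Simulated data only.
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                     # only 5 truly informative predictors
y = X @ beta_true + rng.standard_normal(n)

ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)
lasso = LassoCV(cv=5).fit(X, y)
# l1_ratio balances the two penalties: 1.0 corresponds to the lasso,
# values close to 0 make the fit behave more like ridge regression.
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8, 1.0], cv=5).fit(X, y)

print("nonzero lasso coefficients:", np.sum(lasso.coef_ != 0))
print("nonzero elastic-net coefficients:", np.sum(enet.coef_ != 0))
```

Note that the ridge fit leaves all coefficients nonzero, whereas the lasso and elastic net fits set most of them exactly to zero, illustrating the implicit variable selection described above.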

Boosting

An alternative to adding constraints to solve the dimensionality problem for HDD is to pursue a stagewise approach. Starting from the simplest model (e.g., in regression, the null model), new predictor variables are added to the model step by step, gradually improving it [142, 145]. The basic idea of boosting (combining several small partial improvements into a good final model) works particularly well when the individual improvements are small. Therefore, at each step, a regularized solution of the univariate problem is computed. For example, in a regression problem, rather than giving only a single opportunity to add each predictor variable and produce its coefficient estimate, boosting allows a regression coefficient to be updated several times. At each step, the method selects the variable whose regression coefficient is to be updated based on the minimization of the loss function.
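The following minimal sketch (our illustration, not the authors' implementation) shows one common variant, componentwise L2 boosting for linear regression: at each step the single predictor that most reduces the residual sum of squares is selected, and its coefficient is updated by a small fraction nu of the univariate least-squares solution. Function and variable names are hypothetical.

```python
# Componentwise L2 boosting sketch; assumes predictors are (approximately) centered.
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()                 # start from the null model
    residuals = y - intercept
    for _ in range(n_steps):
        # univariate least-squares coefficient of each predictor on the residuals
        denom = np.sum(X ** 2, axis=0)
        coefs = X.T @ residuals / denom
        # residual sum of squares that a full univariate update would achieve;
        # used only to select the variable to update at this step
        losses = np.sum(residuals ** 2) - denom * coefs ** 2
        j = np.argmin(losses)
        beta[j] += nu * coefs[j]         # small, regularized update step
        residuals -= nu * coefs[j] * X[:, j]
    return intercept, beta
```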

Valuable properties already mentioned for the lasso, such as shrinkage and intrinsic variable selection, are also achieved by boosting. Shrinkage results from the use of a loss function that incorporates a penalty to constrain the parameter estimates. The stagewise nature of the procedure makes it possible to stop before all predictors have been added to the model, effectively setting the regression coefficients of the remaining predictor variables to zero. Deciding when to stop updating the model, so as to avoid excessive complexity and, consequently, overfitting, is crucial; several criteria have been proposed for this purpose, see, e.g., Mayr et al. [146].
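One widely used criterion, sketched below under our own assumptions (the candidate grid and helper names are hypothetical, and the componentwise_l2_boost function from the previous sketch is reused), is to choose the number of boosting steps by cross-validation; see Mayr et al. [146] for a broader discussion of stopping criteria.

```python
# Sketch: choose the number of boosting steps by K-fold cross-validation.
import numpy as np
from sklearn.model_selection import KFold

def cv_choose_n_steps(X, y, candidate_steps=(25, 50, 100, 200, 400), n_folds=5):
    errors = {m: [] for m in candidate_steps}
    splitter = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train, test in splitter.split(X):
        for m in candidate_steps:
            intercept, beta = componentwise_l2_boost(X[train], y[train], n_steps=m)
            pred = intercept + X[test] @ beta
            errors[m].append(np.mean((y[test] - pred) ** 2))
    # return the candidate with the smallest average out-of-fold prediction error
    return min(candidate_steps, key=lambda m: np.mean(errors[m]))
```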