Table 10 Method for construction of new variables: Discretizing continuous variables

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Discretizing continuous variables

Discretization of a variable refers to the process of converting or partitioning a continuous variable into a nominal or ordinal categorical variable. Often, the variable is discretized into partitions of equal width (e.g., when constructing a histogram) or of equal frequencies (e.g., quartiles). Alternatively, the categorization may be based on historical context, for example if it is known that age above a certain threshold is a risk factor for a specific outcome. However, categorization introduces several problems and is often criticized in LDD [56, 57], especially in its most extreme form, dichotomization into only two groups. This simplification of the data structure often leads to a considerable loss of power, and the use of a data-driven optimal cutpoint for dichotomization of a variable leads to serious bias in prediction models that include the variable.
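The three partitioning schemes described above (equal width, equal frequency, and a fixed threshold) can be illustrated with a minimal sketch. It uses only the Python standard library; the function names, the simulated age variable, and the threshold of 65 are hypothetical choices made for illustration and do not come from the source.

```python
import random
import statistics


def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    # statistics.quantiles returns the k - 1 interior cut points.
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    # The bin index is the number of cut points lying below the value.
    return [sum(v > c for c in cuts) for v in values]


def dichotomize(values, threshold):
    """Split values into two groups: 1 if above the threshold, else 0."""
    return [int(v > threshold) for v in values]


# Simulated continuous variable (hypothetical ages, for illustration only).
rng = random.Random(0)
ages = [rng.gauss(55, 12) for _ in range(1000)]

width_bins = equal_width_bins(ages, 4)     # histogram-style partition
freq_bins = equal_frequency_bins(ages, 4)  # quartile groups
high_risk = dichotomize(ages, 65)          # assumed, pre-specified threshold

print("equal width counts:    ", [width_bins.count(b) for b in range(4)])
print("equal frequency counts:", [freq_bins.count(b) for b in range(4)])
print("above threshold:       ", sum(high_risk))
```

For a roughly bell-shaped variable, the equal-width bins tend to hold very unequal numbers of observations, whereas the quartile groups are balanced by construction. Note that the threshold here is fixed in advance; choosing it by optimizing against the outcome is the data-driven practice the text warns about.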