Table 8 Method for recoding: Collapsing categories

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Collapsing categories

When a categorical variable is strongly imbalanced across its categories, especially when only a few observations fall into a particular category, analyses can become unstable. Models that include such a variable can be strongly influenced by the few observations in the rare category. To avoid this undue influence, it may be necessary to accept some loss of information and collapse the variable, i.e., merge the rare category with another category that is similar in content but more frequent.
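
As an illustration, collapsing can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed procedure: the function name `collapse_rare_categories`, the `min_count` threshold of 5, and the tumor-stage data are all hypothetical. Which category a rare one is merged into is a content-based decision, so the mapping `merge_into` is supplied by the analyst rather than derived from the data.

```python
from collections import Counter


def collapse_rare_categories(values, merge_into, min_count=5):
    """Merge rare categories into analyst-chosen, more frequent ones.

    values     : list of category labels, one per observation
    merge_into : dict mapping a category to the similar category it should
                 be merged with if it turns out to be rare (assumption:
                 chosen on subject-matter grounds)
    min_count  : categories with fewer observations count as rare
                 (illustrative threshold, not a recommendation)
    """
    counts = Counter(values)
    rare = {cat for cat, n in counts.items() if n < min_count}
    # Recode only rare categories that have a target; keep all others as-is.
    return [merge_into.get(v, v) if v in rare else v for v in values]


# Hypothetical data: tumor stage with very few stage IV observations.
stage = ["I"] * 40 + ["II"] * 35 + ["III"] * 20 + ["IV"] * 2
collapsed = collapse_rare_categories(stage, merge_into={"IV": "III"}, min_count=5)

print(Counter(stage))      # frequencies before collapsing
print(Counter(collapsed))  # frequencies after collapsing
```

In this sketch the merged level keeps the label of the more frequent category; in practice it may be clearer to relabel it (for example as a combined "III/IV" level) so that the information loss stays visible in later results.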