Collapsing categories |
When a categorical variable has substantial imbalance in its distribution across categories, especially when relatively few observations are assigned to a certain category, it can cause instability in analyses. Models incorporating categorical variables with substantial imbalance can be strongly influenced by them. To avoid the undue influence of a rare category on the analysis, it may be necessary to accept the information loss by collapsing the variable, i.e., merging the rare category with another category that is similar in terms of content but more frequent. |