Categorical Variables in Regression Analysis: A Comparison of Dummy and Effect Coding
Abstract
Using categorical variables in regression requires coding them numerically.
The purpose of this paper is to describe how categorical independent variables can be
incorporated into regression through two coding methods: dummy and effect coding. The
paper discusses the uses, interpretations, and underlying assumptions of each method. In
general, the overall results of the regression are unaffected by the method used to code the
categorical independent variables. Under either method, the analysis tests whether group
membership is related to the dependent variable, and both methods yield identical R² and F
values. However, the interpretations of the intercept and regression coefficients depend on which
coding method is applied and on whether the groups have equal sample sizes.