Multivariate Linear Regression with Dichotomous Categorical Variables

In the social sciences, we often use variables measured at the nominal or ordinal level. We can also add these to the linear regression model as independent variables. We just need to understand the additional effect we are calculating. To do so, we create dummy variables, which indicate the difference between one category of the variable and another category (the reference category). This is easy to implement in the lm() function.
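To make the idea of a dummy variable concrete, here is a minimal sketch. The data frame `toy` is made up for illustration and is not the pss data; `model.matrix()` displays the design matrix that lm() would build from the same formula.

```r
# Hypothetical toy data, not the pss data set used in this tutorial
toy <- data.frame(
  gndr   = factor(c("female", "male", "male", "female")),
  stfeco = c(3, 5, 7, 4)
)

# model.matrix() returns the design matrix lm() would construct for this formula
X <- model.matrix(~ 1 + stfeco + gndr, data = toy)
print(X)

# The column "gndrmale" is the dummy variable:
# 1 for male respondents, 0 for female respondents (the reference category)
```

The factor itself stays untouched; R derives the 0/1 column only when the model is set up, which is why no manual recoding is needed.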

We want to include the variable gndr in our model (female/male). What theoretical assumption can we make about the effect?

What do we equalize in the regression equation?

As we already know, a dichotomous variable does not have a linear relationship with a metric variable. Therefore, we need dummy variables. We estimate a model that contains the additional effect of one category compared to the other category. This effect is constant: it shifts the intercept for that category and leaves the slopes of the other variables unchanged.
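Written out, a model with one dummy variable could look as follows. This is a sketch in assumed notation: \(D_i\) is a hypothetical symbol for the dummy, with \(D_i = 1\) for male and \(D_i = 0\) for female respondents, and the coefficient indices follow the order of the variables in the model.

```latex
\[
\widehat{\mathit{stfdem}}_i = \beta_0 + \beta_1\,\mathit{stfeco}_i + \beta_2\,\mathit{trstlgl}_i + \beta_3\,D_i
\]
\[
\text{female } (D_i = 0):\quad \widehat{\mathit{stfdem}}_i = \beta_0 + \beta_1\,\mathit{stfeco}_i + \beta_2\,\mathit{trstlgl}_i
\]
\[
\text{male } (D_i = 1):\quad \widehat{\mathit{stfdem}}_i = (\beta_0 + \beta_3) + \beta_1\,\mathit{stfeco}_i + \beta_2\,\mathit{trstlgl}_i
\]
```

Both groups share the same slopes \(\beta_1\) and \(\beta_2\); only the intercept differs, by \(\beta_3\).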

The variable gndr has the following categories:

  • female

  • male
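Before estimating the model, it can help to check the categories and their order in R. The sketch below uses a made-up vector in place of pss$gndr (how gndr is stored in pss is not shown here, so this is an assumption).

```r
# Hypothetical vector standing in for pss$gndr
gndr <- c("male", "female", "female", "male", NA)

# Convert to a factor; by default the levels are sorted alphabetically
gndr_f <- factor(gndr)

# The first level is the one lm() uses as the reference category
levels(gndr_f)

# Frequency table, including missing values if there are any
table(gndr_f, useNA = "ifany")
```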

Calculating the Model

Here, we only need to add the variable gndr according to the extended equation:

olsModel3 <- lm(
  stfdem ~ 1 + stfeco + trstlgl + gndr,   # gndr enters the model as a dummy variable
  data = pss
)

summary(olsModel3)
## 
## Call:
## lm(formula = stfdem ~ 1 + stfeco + trstlgl + gndr, data = pss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7490 -1.0846  0.0411  1.1642  5.7898 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.70975    0.09539   7.440 1.18e-13 ***
## stfeco       0.87435    0.01356  64.496  < 2e-16 ***
## trstlgl     -0.04137    0.01319  -3.136  0.00173 ** 
## gndrmale    -0.08020    0.04957  -1.618  0.10573    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.732 on 4889 degrees of freedom
##   (107 observations deleted due to missingness)
## Multiple R-squared:  0.4601,	Adjusted R-squared:  0.4598 
## F-statistic:  1389 on 3 and 4889 DF,  p-value: < 2.2e-16

What is the reference category?

What effect are we calculating with the variable gndrmale?

The reference category is female, meaning female respondents.
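If a different reference category is wanted, one option is `relevel()`. The following is a minimal sketch on simulated data; `toy`, `gndr_m` and the simulated coefficients are hypothetical and unrelated to the pss results.

```r
# Hypothetical simulated data, not the pss data
set.seed(1)
toy <- data.frame(
  gndr   = factor(rep(c("female", "male"), each = 10)),
  stfeco = runif(20, 0, 10)
)
toy$stfdem <- 1 + 0.8 * toy$stfeco - 0.5 * (toy$gndr == "male") + rnorm(20)

# Default: "female" is the first level and therefore the reference category
m_female_ref <- lm(stfdem ~ 1 + stfeco + gndr, data = toy)

# Make "male" the reference category instead
toy$gndr_m <- relevel(toy$gndr, ref = "male")
m_male_ref <- lm(stfdem ~ 1 + stfeco + gndr_m, data = toy)

coef(m_female_ref)["gndrmale"]     # effect of male compared to female
coef(m_male_ref)["gndr_mfemale"]   # effect of female compared to male
```

The two coefficients have the same magnitude and opposite signs: changing the reference category changes how the effect is expressed, not the fit of the model.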

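The additional effect is calculated for individuals who are male (compared to female individuals). Holding stfeco and trstlgl constant, the predicted satisfaction of male individuals is \(0.08020\) scale points lower than that of female individuals (coefficient \(-0.08020\)). Note, however, that this difference is not statistically significant (\(p = 0.106\)).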
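The constant shift can be checked by hand with the coefficients from the summary() output. In this sketch the helper `pred()` and the respondent profile (stfeco = 5, trstlgl = 5) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary() output of olsModel3
b <- c(intercept = 0.70975, stfeco = 0.87435, trstlgl = -0.04137, gndrmale = -0.08020)

# Hypothetical helper: predicted stfdem for given values; male is 0 or 1
pred <- function(stfeco, trstlgl, male) {
  unname(b["intercept"] + b["stfeco"] * stfeco +
           b["trstlgl"] * trstlgl + b["gndrmale"] * male)
}

pred_female <- pred(stfeco = 5, trstlgl = 5, male = 0)
pred_male   <- pred(stfeco = 5, trstlgl = 5, male = 1)

# The difference equals the gndrmale coefficient, whatever profile is chosen
pred_male - pred_female
```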

How do we interpret the model?

Write a few lines in the script!

The model explains \(45.98\,\%\) of the variance in the variable stfdem (adjusted \(R^2 = 0.4598\)). Satisfaction with economic performance (stfeco) and trust in the legal system (trstlgl) have a significant effect on satisfaction with democracy (stfdem). The effect of stfeco is positive (\(\beta_1 = 0.87435\)), while the effect of trust in the legal system is negative (\(\beta_2 = -0.04137\)). The coefficient for male individuals is also negative (\(\beta_3 = -0.08020\)), but it is not statistically significant at the 5 % level (\(p = 0.106\)). Holding the other variables constant, individuals with higher trust in the legal system have slightly lower satisfaction with democracy; for gender, the model provides no reliable evidence of a difference.
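Whether the gender effect is distinguishable from zero can also be judged from a confidence interval. The sketch below recomputes it from the rounded estimate, standard error and degrees of freedom in the summary() output, so small rounding differences from `confint(olsModel3)` are to be expected.

```r
# Values copied from the summary() output (row gndrmale)
est <- -0.08020   # estimate
se  <-  0.04957   # standard error
df  <-  4889      # residual degrees of freedom

# 95 % confidence interval based on the t distribution
ci <- est + c(-1, 1) * qt(0.975, df) * se
ci

# Two-sided p-value recomputed from the t statistic
p <- 2 * pt(-abs(est / se), df)
p
```

The interval includes zero, which matches the non-significant p-value reported for gndrmale.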

If you compare the formula above with the task again, something should catch your attention! What needs to be changed in the formula to actually fit the regression model?

You will see why this is relevant on the next page! You have now learned how to include dichotomous categorical variables in the regression model, but how does it work with polytomous categorical variables?