In Statistics I, you already learned about z-standardization as a way to compare the distributions of different variables. Z-standardization rescales a variable so that it has a mean of 0 and a standard deviation of 1; variables measured in this common unit become directly comparable.
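As a reminder: for a variable \(x\) with mean \(\bar{x}\) and standard deviation \(s_x\), the z-standardized value of observation \(x_i\) is its deviation from the mean in units of the standard deviation:
\[
z_i = \frac{x_i - \bar{x}}{s_x}
\]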
In R, unstandardized variables yield unstandardized regression coefficients. You have seen this when interpreting regression results: the coefficients were always interpreted in the units of the variables. The disadvantage of unstandardized variables is that the strengths of their effects cannot be compared with each other; this is only possible with standardized variables.
In some (often more complex) models, we want to assess how strong the effects of the individual independent variables are. Since the variables do not share the same unit, as explained above, this is not directly possible. We can, however, standardize the variables so that they all have the same unit (standard deviations). The easiest way is to apply the base R function scale() inside mutate() from the tidyverse (dplyr). In more complex regression models and advanced models (such as multilevel models), variables are usually standardized before the model is estimated.
Let's take this step now for the variables in olsModel2:
pss <- pss %>%
  mutate(
    stfdemZ = scale(stfdem),
    stfecoZ = scale(stfeco),
    trstlglZ = scale(trstlgl)
  )
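Note that scale() returns a one-column matrix rather than a plain numeric vector. This does not matter for lm(), but if you prefer ordinary numeric columns in the data frame, you can wrap the call in as.numeric(); a minimal variant of the step above:

pss <- pss %>%
  mutate(
    stfdemZ  = as.numeric(scale(stfdem)),   # z-standardize and drop the matrix attributes
    stfecoZ  = as.numeric(scale(stfeco)),
    trstlglZ = as.numeric(scale(trstlgl))
  )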
Next, we calculate the model again with the new variables:
olsModel2Z <- lm(
  stfdemZ ~ 1 + stfecoZ + trstlglZ,
  data = pss
)
How do we interpret the result?
summary(olsModel2Z)
##
## Call:
## lm(formula = stfdemZ ~ 1 + stfecoZ + trstlglZ, data = pss)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.42296 -0.46136  0.01681  0.49498  2.47445 
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.004028   0.010511   0.383  0.70155    
## stfecoZ      0.697244   0.010815  64.468  < 2e-16 ***
## trstlglZ    -0.033592   0.010517  -3.194  0.00141 ** 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7352 on 4890 degrees of freedom
## (107 observations deleted due to missingness)
## Multiple R-squared: 0.4598, Adjusted R-squared: 0.4596
## F-statistic: 2081 on 2 and 4890 DF, p-value: < 2.2e-16
Result: With each increase of one standard deviation in stfeco, stfdem increases by \(0.697244\) standard deviations. As you can see, the interpretation is somewhat cumbersome, but the individual effects of the metric variables can now be compared. It becomes apparent that the effect of stfeco is stronger than that of trstlgl (\(0.697244 > |-0.033592|\)).
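The standardized coefficients can also be recovered from the unstandardized model: a standardized coefficient equals the unstandardized coefficient multiplied by the ratio of the standard deviations of the predictor and the outcome (\(\beta^{std} = b \cdot s_x / s_y\)). A minimal sketch, assuming olsModel2 was fit as lm(stfdem ~ 1 + stfeco + trstlgl, data = pss):

# standardized coefficient = b * sd(x) / sd(y)
bStd <- coef(olsModel2)[c("stfeco", "trstlgl")] *
  c(sd(pss$stfeco, na.rm = TRUE), sd(pss$trstlgl, na.rm = TRUE)) /
  sd(pss$stfdem, na.rm = TRUE)

# rank the effects by absolute strength
sort(abs(bStd), decreasing = TRUE)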