Regressions

In this example, the goal is to visually represent the variable stfdem in all steps. Finally, you want to calculate a regression model on stfdem that includes the effects of stfeco, district, gndr, trstlgl, trstprl, and agea.

First, you will now calculate the model before you can start with the graphical representations:

model1 <- lm(
  stfdem ~ 1 + stfeco + district + gndr + trstlgl + trstprl + agea,
  pss
)

summary(model1)
## 
## Call:
## lm(formula = stfdem ~ 1 + stfeco + district + gndr + trstlgl + 
##     trstprl + agea, data = pss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7191 -1.0880  0.0338  1.1554  5.7564 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.582230   0.179506   3.244 0.001189 ** 
## stfeco               0.878646   0.014100  62.314  < 2e-16 ***
## districtDistrikt 5   0.013212   0.082147   0.161 0.872229    
## districtDistrikt 7   0.067992   0.084835   0.801 0.422900    
## districtDistrikt 10  0.073204   0.089332   0.819 0.412567    
## districtDistrikt 12  0.175263   0.097254   1.802 0.071593 .  
## gndrmale            -0.089675   0.050657  -1.770 0.076754 .  
## trstlgl             -0.048787   0.013905  -3.509 0.000455 ***
## trstprl             -0.001430   0.012900  -0.111 0.911713    
## agea                 0.001942   0.002268   0.856 0.391970    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.734 on 4695 degrees of freedom
##   (295 observations deleted due to missingness)
## Multiple R-squared:  0.4593,	Adjusted R-squared:  0.4583 
## F-statistic: 443.2 on 9 and 4695 DF,  p-value: < 2.2e-16

In the model, you can see significant effects of stfeco, gndrmale, and trstlgl. We will now visualize these effects. But first, you will repeat the plot of all regression coefficients from Learning Block 4 using dwplot().

Try it yourself first and then check here!

Often, the goal is not only to present regression models in tables but also to graphically represent specific effects, such as the significant effects. Here we present a method: manually through custom predictions.

While there are libraries like ggraphExtra, they have limited plotting capabilities.

However, once we have a multivariate model, this is no longer directly possible. We need to hold the other effects constant to correctly represent the visualization. But it is still easy to implement once you understand what holding constant means.

We want to plot the effect of stfeco on stfdem. This effect should always be interpreted in conjunction with the other effects. To plot the effect, we hold the other effect(s) constant. For this, we use the sjPlot library and utilize the plot_model() function. You also need to load ggplot.

install.packages("sjPlot")
library("sjPlot")
library("tidyverse")

In the plot_model() function, you first call the object of the regression model (here model1), specify in the second argument that it is a marginal plot containing predicted values (type = pred), and then determine in the terms argument which effect to plot. The function automatically holds all metric variables constant at their mean and categorical variables at the reference category.

plot_model(
  model1, 
  type = "pred",
  terms = "stfeco"
)

So in the graph, you see the effect of stfeco on stfdem, while holding constant (mean) trstlgl, trstprl, and agea for female (reference level gndr) individuals from District 1 (reference level district).

If you want to add the effect for different groups of categorical variables, simply specify the additional variable in terms:

plot_model(
  model1, 
  type = "pred",
  terms = c(
    "stfeco",
    "gndr"
  )
)

Similarly, you could easily add the next categorical variable:

plot_model(
  model1, 
  type = "pred",
  terms = c(
    "stfeco",
    "gndr",
    "district"
  )
)

You can also limit the output to specific levels (values of a variable) by placing the levels in [..] brackets after the variable name (but within the "..."):

# levels von district
levels(pss$district)
## [1] "Distrikt 1"  "Distrikt 5"  "Distrikt 7"  "Distrikt 10" "Distrikt 12"
plot_model(
  model1, 
  type = "pred",
  terms = c(
    "stfeco",
    "gndr",
    "district[Distrikt 1, Distrikt 12]")
)

Modifications of a ggplot work here as well: We change the axes and adjust the colors!

plot_model(
  model1, 
  type = "pred",
  terms = c(
    "stfeco",
    "gndr",
    "district"
  )
) +
  scale_x_continuous(
    breaks = seq(
      0, 
      10, 
      1
    )
  ) +
  scale_y_continuous(
    breaks = seq(
      0, 
      10, 
      1
    )
  ) +
  scale_color_manual(values = beyonce_palette(72))

Now, represent the relationship between trstlgl and stfdem. Each district should have its own line, and the plots should be separated by gender. Try it out yourself before looking at the solution.

plot_model(
model1,
type = “pred”,
terms = c(
“trstlgl”,
“district”,
“gndr”
)
) +
scale_x_continuous(
breaks = seq(
0,
10,
1
)
) +
scale_y_continuous(
breaks = seq(
0,
10,
1
)
) +
scale_color_manual(values = beyonce_palette(18))

That’s it! In this chapter, you have practiced creating graphics for individual methods and now have mastered the first steps with ggplot!