2 Combining Datasets

Often, we have micro-level datasets (e.g., the European Social Survey) that we want to combine with macro-level datasets (e.g., country-level economic data). For this purpose, dplyr offers four join functions: left_join(), right_join(), inner_join(), and full_join(). In most cases, left_join() is the one we need.
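To see how the four functions differ, a small sketch on made-up data can help. The tibbles micro and macro, their column names, and all values are invented for illustration and have nothing to do with the PSS data.

```r
library(dplyr)

# Hypothetical micro data: one row per respondent
micro <- tibble(
  id      = 1:4,
  country = c("AT", "AT", "DE", "FR")
)

# Hypothetical macro data: one row per country (values are made up)
macro <- tibble(
  country = c("AT", "DE", "IT"),
  gdp     = c(50, 48, 35)
)

# Keep every row of micro; gdp is NA where no country matches (FR)
joined_left <- left_join(micro, macro, by = "country")

# Keep every row of macro; id is NA where no respondent matches (IT)
joined_right <- right_join(micro, macro, by = "country")

# Keep only rows with a match in both datasets
joined_inner <- inner_join(micro, macro, by = "country")

# Keep all rows from both datasets
joined_full <- full_join(micro, macro, by = "country")
```

Because left_join() never drops rows from the first (micro) dataset, it is usually the safest choice when macro variables are attached to survey data.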

Let’s consider the following example: during our research, we have found a second dataset to complement our PSS dataset. For each district, it lists the percentage of residents affected by poverty (poverty) and the percentage of the population experiencing malnutrition (nutrition).

Table 1: Macro data per district
district      poverty   nutrition
Distrikt 1        0.5           0
Distrikt 5        4.3           5
Distrikt 7        6.7         8.4
Distrikt 10      15.3        23.1
Distrikt 12      32.7        47.5
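If the macro data are not available as a file, they can be entered by hand. The following sketch builds Table 1 as a tibble. It assumes that the object is called pssMacro and that district is stored as a character column with exactly these labels; the labels must match the spelling used in pss, otherwise the join will not find the districts.

```r
library(dplyr)

# Macro data from Table 1, typed in by hand
# (assumption: district labels are spelled exactly as in pss)
pssMacro <- tibble(
  district  = c("Distrikt 1", "Distrikt 5", "Distrikt 7",
                "Distrikt 10", "Distrikt 12"),
  poverty   = c(0.5, 4.3, 6.7, 15.3, 32.7),
  nutrition = c(0, 5, 8.4, 23.1, 47.5)
)

pssMacro
```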

Suppose we now want to use a multilevel model to examine how these district-level factors affect satisfaction with democracy. To do this, the two variables poverty and nutrition must be added to the pss dataset. We will use left_join() for this purpose, matching the rows on the shared district column:

# Keep every row of pss and attach the matching district-level columns
pssMerged <- pss %>%
  left_join(
    pssMacro,
    by = "district"
  )
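After a join, it is worth checking that nothing unexpected happened. The sketch below uses a small, invented stand-in for pss (the columns id and stfdem and the district "Distrikt 2" are hypothetical) and a shortened pssMacro to show two simple checks: the number of rows should be unchanged, and respondents from districts without macro data receive NA.

```r
library(dplyr)

# Hypothetical stand-in for pss (invented columns and values)
pss <- tibble(
  id       = 1:4,
  district = c("Distrikt 1", "Distrikt 1", "Distrikt 5", "Distrikt 2"),
  stfdem   = c(7, 3, 5, 8)
)

# Shortened macro data; "Distrikt 2" is deliberately missing
pssMacro <- tibble(
  district  = c("Distrikt 1", "Distrikt 5"),
  poverty   = c(0.5, 4.3),
  nutrition = c(0, 5)
)

pssMerged <- pss %>%
  left_join(pssMacro, by = "district")

# Check 1: a left join on a unique key must not change the number of rows
same_rows <- nrow(pssMerged) == nrow(pss)

# Check 2: respondents whose district has no macro data
unmatched <- pss %>%
  anti_join(pssMacro, by = "district")
```

If same_rows is FALSE, the key is usually duplicated in the macro data. If unmatched contains rows, the district labels probably differ between the two datasets.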

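Alternatively, you could use right_join() with the two datasets swapped. This also keeps every row of pss, although the order of the columns (and possibly of the rows) in the result differs: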

# Start from the macro data, but keep every row of pss
pssMerged2 <- pssMacro %>%
  right_join(
    pss,
    by = "district"
  )
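That both variants carry the same information can be checked on a small example. The data below are again invented stand-ins for pss and pssMacro; only the comparison logic matters.

```r
library(dplyr)

# Hypothetical stand-ins for the two datasets
pss <- tibble(
  id       = 1:3,
  district = c("Distrikt 1", "Distrikt 5", "Distrikt 1")
)
pssMacro <- tibble(
  district = c("Distrikt 1", "Distrikt 5"),
  poverty  = c(0.5, 4.3)
)

pssMerged  <- pss %>% left_join(pssMacro, by = "district")
pssMerged2 <- pssMacro %>% right_join(pss, by = "district")

# Column order follows the dataset that comes first in the pipe
names(pssMerged)
names(pssMerged2)

# Bring both results into the same column and row order, then compare
a <- pssMerged %>% arrange(id)
b <- pssMerged2 %>% select(all_of(names(pssMerged))) %>% arrange(id)
same <- isTRUE(all.equal(as.data.frame(a), as.data.frame(b)))
```

Since the content is the same, the choice between the two is mostly a matter of readability. Starting the pipe with the dataset we want to keep, as in the left_join() version, tends to be easier to follow.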

With the merged dataset, multilevel models can then estimate the effects of individual-level factors recorded in pss alongside the effects of the district-level factors poverty and nutrition.