Often, we have micro-level datasets (e.g., the European Social Survey) that we want to combine with macro-level datasets (e.g., country-level economic data). For this purpose, `dplyr` offers four join functions (`inner_join()`, `left_join()`, `right_join()`, and `full_join()`), with `left_join()` being used in most cases.
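To make the difference between the four joins concrete, here is a minimal sketch on toy data. The data frames `micro` and `macro` and their columns are hypothetical and only serve as an illustration; they are not part of the PSS data.

```r
library(dplyr)

# Hypothetical toy data: three respondents and three districts with macro data
micro <- tibble(id = 1:3, district = c("A", "B", "C"))
macro <- tibble(district = c("A", "B", "D"), gdp = c(10, 20, 40))

# Only districts that appear in both datasets (A, B)
inner <- inner_join(micro, macro, by = "district")

# All rows of micro; gdp is NA for district C
left <- left_join(micro, macro, by = "district")

# All rows of macro; id is NA for district D
right <- right_join(micro, macro, by = "district")

# All rows of both datasets (A, B, C, D)
full <- full_join(micro, macro, by = "district")
```

`left_join()` is usually the natural choice for micro-macro merges because it keeps every respondent, even those whose unit has no macro data.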
Let’s consider the following example: during our research, we found a second dataset (called `pssMacro` below) to complement our PSS dataset. It lists the percentage of residents in each district affected by poverty (`poverty`) and the percentage of the population experiencing malnutrition (`nutrition`).
district | poverty | nutrition |
---|---|---|
Distrikt 1 | 0.5 | 0 |
Distrikt 5 | 4.3 | 5 |
Distrikt 7 | 6.7 | 8.4 |
Distrikt 10 | 15.3 | 23.1 |
Distrikt 12 | 32.7 | 47.5 |
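If the macro data is not available as a file, the table above can be entered by hand. The following is a sketch using `tribble()`; it assumes that the `district` column in `pss` uses exactly the same labels (e.g., "Distrikt 1"), since the join matches on identical values.

```r
library(dplyr)

# District-level data, entered row by row from the table above
pssMacro <- tribble(
  ~district,     ~poverty, ~nutrition,
  "Distrikt 1",       0.5,        0.0,
  "Distrikt 5",       4.3,        5.0,
  "Distrikt 7",       6.7,        8.4,
  "Distrikt 10",     15.3,       23.1,
  "Distrikt 12",     32.7,       47.5
)
```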
Suppose we now want to include in a multilevel model how these district-level factors affect satisfaction with democracy. To do this, the two variables `poverty` and `nutrition` must be added to the `pss` dataset. We will use `left_join()` for this purpose:
pssMerged <- pss %>%
  left_join(
    pssMacro,
    by = "district"
  )
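After a merge it is worth checking that no respondents were lost or duplicated and which districts found no match. The sketch below uses small hypothetical stand-ins for `pss` and `pssMacro` (the `id` column and the district "Distrikt 2" are assumptions made for illustration), so it runs on its own.

```r
library(dplyr)

# Hypothetical stand-ins; the real datasets contain more rows and columns
pss <- tibble(
  id       = 1:4,
  district = c("Distrikt 1", "Distrikt 1", "Distrikt 5", "Distrikt 2")
)
pssMacro <- tibble(
  district = c("Distrikt 1", "Distrikt 5"),
  poverty  = c(0.5, 4.3)
)

pssMerged <- pss %>%
  left_join(pssMacro, by = "district")

# With one macro row per district, the number of respondents stays the same
sameRows <- nrow(pssMerged) == nrow(pss)

# Respondents whose district has no macro data get NA in the new columns
missingMacro <- pssMerged %>% filter(is.na(poverty))

# Districts that occur in pss but not in pssMacro
unmatched <- pss %>%
  anti_join(pssMacro, by = "district") %>%
  distinct(district)
```

If `pssMacro` contained the same district more than once, the join would duplicate respondents, so the row-count check is a cheap safeguard.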
Alternatively, you could use `right_join()` here, with the order of the two datasets swapped:
pssMerged2 <- pssMacro %>%
  right_join(
    pss,
    by = "district"
  )
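Both variants keep every row of `pss`, but the column order (and possibly the row order) of the result differs. The following sketch illustrates this with hypothetical stand-in data; the `id` column is an assumption made for the example.

```r
library(dplyr)

# Hypothetical stand-ins for the two datasets
pss <- tibble(id = 1:3, district = c("Distrikt 5", "Distrikt 1", "Distrikt 5"))
pssMacro <- tibble(district = c("Distrikt 1", "Distrikt 5"), poverty = c(0.5, 4.3))

pssMerged  <- pss %>% left_join(pssMacro, by = "district")
pssMerged2 <- pssMacro %>% right_join(pss, by = "district")

# The columns of the first dataset in the pipe come first
names(pssMerged)   # id, district, poverty
names(pssMerged2)  # district, poverty, id

# Bring columns and rows into a common order before comparing
a <- pssMerged %>% arrange(id)
b <- pssMerged2 %>% select(all_of(names(pssMerged))) %>% arrange(id)
sameContent <- isTRUE(all.equal(as.data.frame(a), as.data.frame(b)))
```

Starting the pipe with `pss` and using `left_join()` is often the more readable choice, because the respondent-level columns stay at the front of the merged dataset.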
For example, in multilevel models, we could calculate the effects of individual factors such as field of study and high school grades, as well as the effects of supervision ratio and seminar size.