Merging Datasets (Adding New Variables)

You can also use full_join() to add new variables to a dataset. For example, suppose we have another dataset from Panem that contains a newly introduced social score for each person (the variable socialpoints), and we want to add it to the pss dataset. The scores are stored separately in the sp dataset, which has an ID variable whose values match the ID variable in pss. Let’s first take a look at the two datasets:

head(sp)
##   socialpoints    id
## 1     88.00907 10000
## 2     90.38817 10001
## 3     88.43383 10002
## 4     92.27890 10003
## 5     93.31521 10004
## 6     89.57225 10005
head(pss)
##    idno   district   gndr agea         edu wkhtot     income stfdem stfeco
## 1 10000 Distrikt 1   male   41 ES-ISCED IV     34 7th decile      7      6
## 2 10001 Distrikt 1   male   65 ES-ISCED II     20 6th decile      8      7
## 3 10002 Distrikt 1   male   48 ES-ISCED IV     27 7th decile      6      6
## 4 10003 Distrikt 1 female   49  ES-ISCED V     30 6th decile      5      4
## 5 10004 Distrikt 1 female   48 ES-ISCED IV     29 5th decile      4      5
## 6 10005 Distrikt 1 female   64  ES-ISCED V     30 6th decile      6      6
##   trstprl trstprt trstplt trstlgl lrscale
## 1       3       5       4       6       4
## 2       5       5       5       4       3
## 3       4       4       6       5       6
## 4       2       7       4       3       6
## 5       6       6       6       6       2
## 6       1       3       2       4       7

Although both datasets have an ID variable, the column names differ: id in sp and idno in pss. As we did above, we could handle this in the by argument. This time, let’s instead rename the column in one of the datasets before merging. We can use rename() for this task; its syntax is new_name = old_name.

sp <- sp %>% 
  rename(idno = id)

head(sp)
##   socialpoints  idno
## 1     88.00907 10000
## 2     90.38817 10001
## 3     88.43383 10002
## 4     92.27890 10003
## 5     93.31521 10004
## 6     89.57225 10005
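
For comparison, here is a minimal sketch of the by-argument route mentioned above, which leaves the column names untouched. The data frames pss_toy and sp_toy are small hypothetical stand-ins with made-up values, not the real Panem data.

```r
library(dplyr)

# Hypothetical toy stand-ins for pss and sp (values are made up)
pss_toy <- data.frame(idno = c(10000, 10001, 10002),
                      agea = c(41, 65, 48))
sp_toy  <- data.frame(socialpoints = c(88.0, 90.4, 88.4),
                      id = c(10000, 10001, 10002))

# Match idno (left dataset) against id (right dataset) without renaming
merged_toy <- pss_toy %>%
  full_join(sp_toy, by = c("idno" = "id"))

merged_toy
```

With this form, the merged data keeps the key column name from the left dataset (idno). Renaming first, as we do here, is mostly a matter of taste, and it can be handy if the two datasets will be combined more than once.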

Now the column names are the same, and we can merge the datasets.

pss <- pss %>% 
  full_join(
    sp, 
    by = "idno"
  )

head(pss)
##    idno   district   gndr agea         edu wkhtot     income stfdem stfeco
## 1 10000 Distrikt 1   male   41 ES-ISCED IV     34 7th decile      7      6
## 2 10001 Distrikt 1   male   65 ES-ISCED II     20 6th decile      8      7
## 3 10002 Distrikt 1   male   48 ES-ISCED IV     27 7th decile      6      6
## 4 10003 Distrikt 1 female   49  ES-ISCED V     30 6th decile      5      4
## 5 10004 Distrikt 1 female   48 ES-ISCED IV     29 5th decile      4      5
## 6 10005 Distrikt 1 female   64  ES-ISCED V     30 6th decile      6      6
##   trstprl trstprt trstplt trstlgl lrscale socialpoints
## 1       3       5       4       6       4     88.00907
## 2       5       5       5       4       3     90.38817
## 3       4       4       6       5       6     88.43383
## 4       2       7       4       3       6     92.27890
## 5       6       6       6       6       2     93.31521
## 6       1       3       2       4       7     89.57225
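
After a merge, it can be worth checking whether every ID was found in both datasets, because full_join() keeps unmatched rows from either side and fills the missing values with NA. The sketch below illustrates this with hypothetical toy data (pss_toy and sp_toy are made up for this example), in which one ID is missing from each dataset.

```r
library(dplyr)

# Hypothetical toy data: ID 10000 has no social points,
# and ID 10003 appears only in the social points data
pss_toy <- data.frame(idno = c(10000, 10001, 10002),
                      agea = c(41, 65, 48))
sp_toy  <- data.frame(idno = c(10001, 10002, 10003),
                      socialpoints = c(90.4, 88.4, 92.3))

full_toy <- pss_toy %>%
  full_join(sp_toy, by = "idno")

# Number of rows after the merge (one per distinct ID across both datasets)
nrow(full_toy)

# Number of people without a social points value
sum(is.na(full_toy$socialpoints))
```

If the row count of the merged data equals that of the original dataset and the new variable has no unexpected NA values, every ID was likely matched.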

And you’ve mastered that too! Now let’s move on to the next package in the tidyverse: tidyr!