The table shows each person's grade in each exam. Think briefly about which variables this sentence implies!
statclass
##     name stat1 stat2  r spss
## 1   momo    12     5  6    9
## 2    kim    14    10 13   15
## 3 sascha     7     4  4    1
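To follow along, the table can be rebuilt from the printed values. This is only a minimal sketch: the source does not show how statclass was created, so the use of a plain data.frame() call (and stringsAsFactors = FALSE) is an assumption.

```r
# Assumption: statclass is an ordinary data.frame; the values are copied
# from the printed table above.
statclass <- data.frame(
  name  = c("momo", "kim", "sascha"),
  stat1 = c(12, 14, 7),
  stat2 = c(5, 10, 4),
  r     = c(6, 13, 4),
  spss  = c(9, 15, 1),
  stringsAsFactors = FALSE  # keep name as character, also on older R versions
)

statclass
```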
name: momo, sascha, kim
course: stat1, stat2, r, spss
grade: the value determined by the combination of the two above.
So the columns stat1, stat2, r, and spss each carry two pieces of information: the type of exam (implicitly, through the column name) and the grade. In other words, the column names here are themselves values (the type of exam) and not just names, which violates the rules of a tidy dataset. In a tidy format, we need both pieces of information explicitly!
To clean this up, we use pivot_longer(). First, we specify which columns should be rearranged (in our case stat1 to spss), then the new variables in which the names and values should be stored. With names_to, we name the new variable that distinguishes the exams, and with values_to, we name the variable that contains the grades.
statclassTidy <- statclass %>%
  pivot_longer(
    stat1:spss,
    names_to = "course",
    values_to = "grade"
  ) %>%
  arrange(
    name,
    course
  )
statclassTidy
## # A tibble: 12 × 3
##    name   course grade
##    <chr>  <chr>  <dbl>
##  1 kim    r         13
##  2 kim    spss      15
##  3 kim    stat1     14
##  4 kim    stat2     10
##  5 momo   r          6
##  6 momo   spss       9
##  7 momo   stat1     12
##  8 momo   stat2      5
##  9 sascha r          4
## 10 sascha spss       1
## 11 sascha stat1      7
## 12 sascha stat2      4
Now we have a long format, which often makes further processing easier (e.g., plotting with ggplot2). But be careful: you can no longer simply calculate the mean of grade, because that would pool the grades of all courses. In the long format, you have to filter or group by course first.
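The caution about averaging grade can be made concrete with a grouped summary. The following is a minimal sketch, assuming dplyr and tidyr are installed and that statclass is a plain data.frame holding the values printed earlier. The names courseMeans, rMean, and meanGrade are chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

# Assumption: the example data, rebuilt from the printed table.
statclass <- data.frame(
  name  = c("momo", "kim", "sascha"),
  stat1 = c(12, 14, 7),
  stat2 = c(5, 10, 4),
  r     = c(6, 13, 4),
  spss  = c(9, 15, 1),
  stringsAsFactors = FALSE
)

statclassTidy <- statclass %>%
  pivot_longer(stat1:spss, names_to = "course", values_to = "grade")

# One mean per course instead of a single mean over all exams.
courseMeans <- statclassTidy %>%
  group_by(course) %>%
  summarise(meanGrade = mean(grade))

# Alternatively, restrict the data to a single course first.
rMean <- statclassTidy %>%
  filter(course == "r") %>%
  summarise(meanGrade = mean(grade))

courseMeans
rMean
```

Grouping keeps all courses in one result table, while filtering is handy when only one exam is of interest.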
To reverse this process, you can use the pivot_wider() function:
statclassRe <- statclassTidy %>%
  pivot_wider(
    names_from = course,
    values_from = grade
  )
statclassRe
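One detail worth checking after the round trip: by default, pivot_wider() creates the new columns in the order in which the course values first appear in the long data. Because statclassTidy was sorted by name and course, the columns and rows do not come back in the original order. The following is a minimal sketch of how one might restore it, assuming dplyr and tidyr are installed and statclass is a plain data.frame with the values printed earlier. The name statclassOrig is hypothetical.

```r
library(dplyr)
library(tidyr)

# Assumption: the example data, rebuilt from the printed table.
statclass <- data.frame(
  name  = c("momo", "kim", "sascha"),
  stat1 = c(12, 14, 7),
  stat2 = c(5, 10, 4),
  r     = c(6, 13, 4),
  spss  = c(9, 15, 1),
  stringsAsFactors = FALSE
)

statclassTidy <- statclass %>%
  pivot_longer(stat1:spss, names_to = "course", values_to = "grade") %>%
  arrange(name, course)

statclassRe <- statclassTidy %>%
  pivot_wider(names_from = course, values_from = grade)

# Column order follows the sorted long data, not the original table.
names(statclassRe)

# Restore the original column and row order explicitly.
statclassOrig <- statclassRe %>%
  select(name, stat1, stat2, r, spss) %>%
  arrange(match(name, statclass$name))

statclassOrig
```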