
Tidy Up Statclass

The table shows the grade each person achieved in each exam. Think briefly about which variables this sentence implies!

statclass

##     name stat1 stat2  r spss
## 1   momo    12     5  6    9
## 2    kim    14    10 13   15
## 3 sascha     7     4  4    1
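The section prints statclass but does not show how it was created. To follow along, you can build a data frame with the same values by hand. This sketch simply retypes the printed table; the original course material may load the data differently:

```r
# Hypothetical reconstruction of the printed example data;
# the course material may load statclass from a file instead.
statclass <- data.frame(
  name  = c("momo", "kim", "sascha"),
  stat1 = c(12, 14, 7),
  stat2 = c(5, 10, 4),
  r     = c(6, 13, 4),
  spss  = c(9, 15, 1),
  stringsAsFactors = FALSE  # default since R 4.0; explicit for older versions
)

statclass
```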

name: momo, kim, sascha

course: stat1, stat2, r, spss

grade: the value that depends on the two variables above (which person took which exam).

So each of the columns stat1, stat2, r, and spss holds two pieces of information: the type of exam (given only implicitly by the column name) and the grade (the cell value). In other words, values are stored as column names here, which violates the rules of a tidy dataset. In a tidy format, both pieces of information must be explicit variables, because these column names are values (the type of exam) and not merely names.

To clean this up, we use pivot_longer(). First, we specify which columns should be reshaped (in our case stat1 through spss, written as stat1:spss). Then we specify the new variables in which the former column names and the cell values should be stored: names_to names the new variable that identifies the exam, and values_to names the variable that contains the grades. Finally, arrange() sorts the result by name and course.

library(dplyr)  # arrange() and the %>% pipe
library(tidyr)  # pivot_longer() and pivot_wider()

statclassTidy <- statclass %>% 
  pivot_longer(
    stat1:spss, 
    names_to = "course", 
    values_to = "grade"
  ) %>% 
  arrange(
    name,
    course
  )

statclassTidy

## # A tibble: 12 × 3
##    name   course grade
##    <chr>  <chr>  <dbl>
##  1 kim    r         13
##  2 kim    spss      15
##  3 kim    stat1     14
##  4 kim    stat2     10
##  5 momo   r          6
##  6 momo   spss       9
##  7 momo   stat1     12
##  8 momo   stat2      5
##  9 sascha r          4
## 10 sascha spss       1
## 11 sascha stat1      7
## 12 sascha stat2      4
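Selecting the columns with the range stat1:spss relies on their position in the table. Below is a sketch of an equivalent call that instead selects every column except name (tidyselect's -name). It should give the same 12 rows and keeps working if further exam columns are added. The data frame is retyped from the printed table so the example runs on its own:

```r
library(dplyr)  # arrange() and the %>% pipe
library(tidyr)  # pivot_longer()

statclass <- data.frame(
  name  = c("momo", "kim", "sascha"),
  stat1 = c(12, 14, 7),
  stat2 = c(5, 10, 4),
  r     = c(6, 13, 4),
  spss  = c(9, 15, 1),
  stringsAsFactors = FALSE
)

# Reshape every column except `name` instead of the range stat1:spss
statclassTidy2 <- statclass %>%
  pivot_longer(
    -name,
    names_to = "course",
    values_to = "grade"
  ) %>%
  arrange(name, course)

statclassTidy2
```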

Now the data are in long format, which often makes further processing easier (e.g., plotting with ggplot2). But be careful: you can no longer simply calculate the mean of grade, because this column mixes grades from different courses. In long format, you have to condition on the course, for example by filtering for a single course or by grouping by course.
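A minimal sketch of both options, assuming dplyr's group_by(), summarise(), and filter() (the original section does not use them). Grouping by course gives one mean per course, while filtering restricts the calculation to a single course. The data are retyped from the printed table so the example is self-contained:

```r
library(dplyr)  # group_by(), summarise(), filter() and the %>% pipe
library(tidyr)  # pivot_longer()

statclass <- data.frame(
  name  = c("momo", "kim", "sascha"),
  stat1 = c(12, 14, 7),
  stat2 = c(5, 10, 4),
  r     = c(6, 13, 4),
  spss  = c(9, 15, 1),
  stringsAsFactors = FALSE
)

statclassTidy <- statclass %>%
  pivot_longer(stat1:spss, names_to = "course", values_to = "grade")

# One mean per course: group by course before summarising
courseMeans <- statclassTidy %>%
  group_by(course) %>%
  summarise(mean_grade = mean(grade))

courseMeans

# Mean for a single course: filter first, then summarise
statclassTidy %>%
  filter(course == "stat1") %>%
  summarise(mean_grade = mean(grade))
```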

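To reverse this process, use the pivot_wider() function: names_from specifies the column whose values become the new column names, and values_from the column that fills the cells.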

statclassRe <- statclassTidy %>% 
  pivot_wider(
    names_from = course, 
    values_from = grade
  )

statclassRe
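The section does not show the printed result of statclassRe. Going by how pivot_wider() works, it should contain the original values but not in the original layout: the new columns are ordered by the first appearance of each course value (here alphabetical, because of the earlier arrange()), and the rows come out sorted by name. Below is a sketch that restores the original column order with select() and checks that the values survived the round trip. The data are retyped from the printed table:

```r
library(dplyr)  # arrange(), select() and the %>% pipe
library(tidyr)  # pivot_longer() and pivot_wider()

statclass <- data.frame(
  name  = c("momo", "kim", "sascha"),
  stat1 = c(12, 14, 7),
  stat2 = c(5, 10, 4),
  r     = c(6, 13, 4),
  spss  = c(9, 15, 1),
  stringsAsFactors = FALSE
)

statclassTidy <- statclass %>%
  pivot_longer(stat1:spss, names_to = "course", values_to = "grade") %>%
  arrange(name, course)

# Back to wide format; columns appear in the order r, spss, stat1, stat2
statclassRe <- statclassTidy %>%
  pivot_wider(names_from = course, values_from = grade) %>%
  select(name, stat1, stat2, r, spss)  # restore the original column order

# Compare with the original sorted by name; check.attributes = FALSE
# ignores row names and the tibble vs. data.frame difference
all.equal(
  as.data.frame(statclassRe),
  statclass[order(statclass$name), ],
  check.attributes = FALSE
)
```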