tidy up statclass 2

Where is the problem?

statclass2

##    test momo kim sascha  exam
## 1 stat1   12  13      4 exam1
## 2 stat1   NA  NA      8 exam2
## 3 stat2    5  10      5 exam1
## 4 stat2   NA  NA     NA exam2
## 5     r    6  13      3 exam1
## 6     r   NA  NA      9 exam2
## 7  spss    9   4      7 exam1
## 8  spss   NA   7     NA exam2

The column names momo, kim, and sascha are not variable names but values of a variable: the students' names!
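To follow along, the printed data can be rebuilt as a plain data frame. This is only a minimal sketch: the values are copied from the output above, and building it with data.frame() is an assumption, since the course may load statclass2 in a different way.

```r
# Rebuild the printed data frame; all values are copied from the output above.
statclass2 <- data.frame(
  test   = rep(c("stat1", "stat2", "r", "spss"), each = 2),
  momo   = c(12, NA, 5, NA, 6, NA, 9, NA),
  kim    = c(13, NA, 10, NA, 13, NA, 4, 7),
  sascha = c(4, 8, 5, NA, 3, 9, 7, NA),
  exam   = rep(c("exam1", "exam2"), times = 4)
)

# The student names sit in the column names, not in a column of their own.
names(statclass2)
```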

The solution: we convert the data to the long format again with pivot_longer()!

statclass2Tidy <- statclass2 %>%
  pivot_longer(
    momo:sascha, 
    names_to = "names", 
    values_to = "grade"
  )

statclass2Tidy

## # A tibble: 24 × 4
##    test  exam  names  grade
##    <chr> <chr> <chr>  <dbl>
##  1 stat1 exam1 momo      12
##  2 stat1 exam1 kim       13
##  3 stat1 exam1 sascha     4
##  4 stat1 exam2 momo      NA
##  5 stat1 exam2 kim       NA
##  6 stat1 exam2 sascha     8
##  7 stat2 exam1 momo       5
##  8 stat2 exam1 kim       10
##  9 stat2 exam1 sascha     5
## 10 stat2 exam2 momo      NA
## # ℹ 14 more rows

Are there possibly more problems?

The column exam does not contain values but the names of variables, namely exam1 and exam2! These variables hold the grade in the respective exam, and their values are currently stored in grade. We therefore use pivot_wider() to make the data tidy:

statclass2Tidy <- statclass2Tidy %>% 
  pivot_wider(
    names_from = exam, 
    values_from = grade
  ) %>% 
  relocate(names) %>% 
  arrange(
    names,
    test
  )

statclass2Tidy
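The result is not printed here, so the following sketch runs both steps in one pipeline and inspects the shape of the tidy data. It assumes that dplyr and tidyr are installed and rebuilds statclass2 from the printed values at the top of this page.

```r
library(dplyr)
library(tidyr)

# Assumption: statclass2 rebuilt by hand from the printed output.
statclass2 <- data.frame(
  test   = rep(c("stat1", "stat2", "r", "spss"), each = 2),
  momo   = c(12, NA, 5, NA, 6, NA, 9, NA),
  kim    = c(13, NA, 10, NA, 13, NA, 4, 7),
  sascha = c(4, 8, 5, NA, 3, 9, 7, NA),
  exam   = rep(c("exam1", "exam2"), times = 4)
)

statclass2Tidy <- statclass2 %>%
  # Step 1: student columns become rows (names, grade)
  pivot_longer(momo:sascha, names_to = "names", values_to = "grade") %>%
  # Step 2: the entries of exam become the columns exam1 and exam2
  pivot_wider(names_from = exam, values_from = grade) %>%
  relocate(names) %>%
  arrange(names, test)

dim(statclass2Tidy)    # 12 rows (3 students x 4 tests), 4 columns
names(statclass2Tidy)  # names, test, exam1, exam2
```

Each row is now one observation (one student in one test), and each column is one variable.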

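Just for practice, you could also spread the tidy data further with pivot_wider(), so that each student occupies a single row and every combination of exam and test gets its own column (such as exam1_stat1). Note that this is an even wider layout, not the layout of the original statclass2: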

statclass2re <- statclass2Tidy %>% 
  pivot_wider(
    names_from = test,
    values_from = c(
      exam1, 
      exam2
    )
  )
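The layout of statclass2 itself, with one column per student and an exam column, can be recovered from the tidy data in two steps: first collect exam1 and exam2 into rows, then spread the student names into columns. The following is a sketch under assumptions: statclass2 is rebuilt from the printed values, and the name statclass2back is hypothetical and does not come from the course material.

```r
library(dplyr)
library(tidyr)

# Assumption: statclass2 rebuilt by hand from the printed output.
statclass2 <- data.frame(
  test   = rep(c("stat1", "stat2", "r", "spss"), each = 2),
  momo   = c(12, NA, 5, NA, 6, NA, 9, NA),
  kim    = c(13, NA, 10, NA, 13, NA, 4, 7),
  sascha = c(4, 8, 5, NA, 3, 9, 7, NA),
  exam   = rep(c("exam1", "exam2"), times = 4)
)

# Tidy version, as built in the steps above.
statclass2Tidy <- statclass2 %>%
  pivot_longer(momo:sascha, names_to = "names", values_to = "grade") %>%
  pivot_wider(names_from = exam, values_from = grade) %>%
  relocate(names) %>%
  arrange(names, test)

# Round trip (statclass2back is a hypothetical name):
statclass2back <- statclass2Tidy %>%
  # exam1 / exam2 go back into one exam column plus a grade column
  pivot_longer(exam1:exam2, names_to = "exam", values_to = "grade") %>%
  # student names become columns again
  pivot_wider(names_from = names, values_from = grade) %>%
  # restore the original column order
  select(test, momo, kim, sascha, exam)

# The row order differs from the original, so sort both before comparing.
all.equal(
  as.data.frame(arrange(statclass2back, test, exam)),
  as.data.frame(arrange(statclass2, test, exam)),
  check.attributes = FALSE
)
```

The design point is that pivot_wider() alone cannot undo the tidying, because the exam columns first have to be turned back into values of a variable.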