The Structure of Datasets with tidyr

Data processing in R runs more smoothly when datasets are tidy. Many functions expect data in a specific layout, and data that already follow that layout need far less troubleshooting.

Recap: What is a Dataset?

A dataset is a collection of values, usually numbers or strings. Every value is organized in two ways: it belongs to a variable and to an observation. A variable contains all values that measure the same attribute across observations (for example, the height of every person). An observation contains all values measured on the same unit across variables (for example, one person's height, weight, and age).
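As a minimal sketch of this idea, the base R snippet below builds a small data frame and pulls out one variable, one observation, and one single value. The data frame `people`, its column names, and its values are invented purely for illustration.

```r
# Hypothetical toy data: three people (observations), two measured variables.
people <- data.frame(
  name   = c("Ana", "Ben", "Cleo"),
  height = c(165, 180, 172),
  stringsAsFactors = FALSE
)

# A variable is a column: all heights, one per observation.
heights <- people$height

# An observation is a row: every value recorded for one person.
ben <- people[people$name == "Ben", ]

# A single value sits at the intersection of one row and one column.
ben_height <- people[2, "height"]

print(heights)
print(ben)
print(ben_height)
```

Here each row describes one person and each column holds one kind of measurement, which is the layout the rest of this section builds on.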

For data to be used effectively with functions in R, especially those of the tidyverse, they should be in a tidy format (for many datasets this corresponds to what is often called long format). A dataset is considered tidy when …

  • … each variable is a column,
  • … each observation is a row,
  • … and each type of observational unit forms a table.
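
The three conditions above can be illustrated with a small sketch that reshapes a table using `pivot_longer()` from tidyr. The table `wide`, its column names (`score_2021`, `score_2022`), and its values are assumptions made up for this example. In `wide`, the variable "year" is hidden in the column names, so the table is not tidy. After reshaping, every variable (`name`, `year`, `score`) has its own column and every row is one observation.

```r
library(tidyr)

# Hypothetical wide table: one row per person, one score column per year.
wide <- data.frame(
  name       = c("Ana", "Ben"),
  score_2021 = c(10, 7),
  score_2022 = c(12, 9),
  stringsAsFactors = FALSE
)

# Move the year out of the column names into its own column.
long <- pivot_longer(
  wide,
  cols         = c(score_2021, score_2022),  # columns to stack
  names_to     = "year",                     # new column holding the old column names
  names_prefix = "score_",                   # strip this prefix from those names
  values_to    = "score"                     # new column holding the values
)

print(long)
```

Note that `year` comes out as a character column in this sketch. Converting it to a number (for example with `as.integer()`) is a separate step.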