From Data Collection to Dataset

In R, you usually work with data that has already been collected. To help you understand the data frame object, this page walks you through the steps from data collection to a finished dataset.

Later on, we will use a fictional training dataset called the Panem Social Survey (PSS), inspired by the European Social Survey. The advantage is that you can practice with a much smaller dataset before working with real (larger) datasets on your own. The dataset and the codebook for the PSS are already included in the RStudio project. If you want to save them to your computer manually, you can find them here:

You will also find a PDF named from-survey-to-data.pdf in the attachments. Download it and take a look!

The file contains an excerpt of four questions from the fictional survey; each question corresponds to one variable in the dataset. Tip: The codebook gives more detailed information about how each variable is measured.

Now take a moment to consider the scale level of each question (variable) and which type of vector (numeric, integer, character, logical) you would use to represent it in R. Note that R calls boolean (TRUE/FALSE) values "logical".
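To make the four vector types concrete, here is a minimal sketch. The variable names and values are hypothetical and are not taken from the PSS or the PDF; they only illustrate how answers of different scale levels might be stored.

```r
# Hypothetical answers from five respondents (illustration only, not PSS data).
age      <- c(34L, 51L, 27L, 45L, 19L)            # integer: whole years
hours_tv <- c(1.5, 0, 2.25, 3, 0.5)               # numeric: values with decimals
district <- c("D1", "D4", "D12", "D4", "D7")      # character: nominal labels
voted    <- c(TRUE, FALSE, TRUE, TRUE, FALSE)     # logical: yes/no answers

# class() reports how R treats each vector.
class(age)       # "integer"
class(hours_tv)  # "numeric"
class(district)  # "character"
class(voted)     # "logical"
```

The `L` suffix tells R to store a number as an integer; without it, R stores numbers as numeric (double) by default.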

As you have probably noticed, a dataset is nothing more than a collection of several variables, recorded for the surveyed individuals and processed together. Datasets are therefore stored and read in tabular form. These data tables have two dimensions: rows and columns.
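As a brief preview, the following sketch combines a few vectors into one table using R's data.frame() function. The variable names and values are hypothetical and do not come from the PSS.

```r
# Hypothetical mini-dataset: four respondents, three variables (not PSS data).
survey <- data.frame(
  age      = c(34L, 51L, 27L, 45L),
  district = c("D1", "D4", "D12", "D4"),
  voted    = c(TRUE, FALSE, TRUE, TRUE)
)

nrow(survey)  # number of rows: 4
ncol(survey)  # number of columns: 3
dim(survey)   # both dimensions at once: 4 3
```

All vectors passed to data.frame() here have the same length, so every respondent has a value for every variable.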

You will learn about the structure of a data table on the next page.