Data frames

In addition to vectors and factors, there is another important object type for us: the data frame. A data frame combines multiple vectors (variables) of the same length into a table. It resembles a matrix, but each column may hold a different type of data (for example, numbers in one column and a factor in another). In the conventional format (wide format), the variables are found in the columns and the respondents in the rows.

  • Columns: Vectors, factors (variables)
  • Rows: Cases (individual observation units, e.g., respondents)
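As a minimal sketch of this idea, the following R code combines three vectors of equal length into a data frame using the base function data.frame(). The names id, gender, age, and df, as well as the values, are made up for illustration and do not come from the course dataset.

```r
# Three hypothetical vectors of the same length (one value per case)
id     <- c(1, 2, 3)
gender <- factor(c("female", "male", "female"))
age    <- c(34, 51, 27)

# Combine them column by column into a data frame
df <- data.frame(id, gender, age)

df        # print the table: one row per case, one column per variable
nrow(df)  # number of cases (rows)
ncol(df)  # number of variables (columns)
```

If the vectors differ in length and the shorter ones cannot be recycled evenly, data.frame() stops with an error, which is why all variables must have the same length.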

Let’s illustrate this with the example of the dataset we will use during the course: Panem Social Survey (pss). This is a training dataset based on the European Social Survey, but with significantly fewer variables/cases (only 10 cases and 4 variables):

idno   district    gndr    agea
10000  Distrikt 1  male    41
10001  Distrikt 1  male    65
10002  Distrikt 1  male    48
10003  Distrikt 1  female  49
10004  Distrikt 1  female  48
10005  Distrikt 1  female  64
10006  Distrikt 1  male    63
10007  Distrikt 1  female  70
10008  Distrikt 1  female  80
10009  Distrikt 1  male    57

In this example dataset, we have four variables: idno, district, gndr, and agea. These are self-explanatory: idno is the unique ID, district is the district of the respondent, gndr is the gender, and agea is the age. In practice, variable names are often less intuitive, so you may need to consult a codebook. Handling larger datasets will be covered in the next learning block.
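To see how these four variables fit together in R, here is a hedged sketch that rebuilds the ten cases shown above by hand and inspects them. It assumes that district and gndr are stored as factors and that the object is called pss; the real course dataset may be loaded and typed differently.

```r
# Rebuild the ten example cases by hand (assumption: the course file may
# store these variables with different types or labels)
pss <- data.frame(
  idno     = 10000:10009,
  district = factor(rep("Distrikt 1", 10)),
  gndr     = factor(c("male", "male", "male", "female", "female",
                      "female", "male", "female", "female", "male")),
  agea     = c(41, 65, 48, 49, 48, 64, 63, 70, 80, 57)
)

names(pss)       # the variable names (columns)
str(pss)         # compact overview: type and first values of each variable
pss$agea         # access a single variable as a vector with $
table(pss$gndr)  # frequency of each gender category
mean(pss$agea)   # average age of the ten respondents
```

The $ operator returns one column as an ordinary vector or factor, so everything that works on vectors also works on the variables of a data frame.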

Let’s go to the final challenge!