
Dataframes

In addition to vectors and factors, there is another important object type for us: the data frame. A data frame combines several vectors (variables) of the same length into a table. Unlike a matrix, its columns may have different types, for example a numeric vector next to a factor. In the conventional layout (wide format), the variables are in the columns and the respondents are in the rows.

Columns: Vectors, factors (variables)
Rows: Cases (individual observation units, e.g., respondents)
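
As a minimal sketch of this structure, the following combines three vectors of equal length into a data frame with `data.frame()`. The object names (`id`, `gender`, `age`, `toy`) and their values are made up for illustration and are not part of the course dataset.

```r
# Hypothetical toy vectors, each of length 3 (not taken from the pss data)
id     <- c(1, 2, 3)
gender <- factor(c("male", "female", "female"))
age    <- c(41, 65, 48)

# Combine the vectors column-wise: each vector becomes one variable
toy <- data.frame(id, gender, age)

nrow(toy)  # number of cases (rows)
ncol(toy)  # number of variables (columns)
toy$age    # extracting a single column returns a vector again
```

Each column keeps its own type, which is why a factor and a numeric vector can sit side by side in the same data frame.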

Let’s illustrate this with the dataset we will use throughout the course: the Panem Social Survey (pss). It is a training dataset based on the European Social Survey, but with significantly fewer variables and cases (here, only 10 cases and 4 variables):

idno	district	gndr	agea
10000	Distrikt 1	male	41
10001	Distrikt 1	male	65
10002	Distrikt 1	male	48
10003	Distrikt 1	female	49
10004	Distrikt 1	female	48
10005	Distrikt 1	female	64
10006	Distrikt 1	male	63
10007	Distrikt 1	female	70
10008	Distrikt 1	female	80
10009	Distrikt 1	male	57

In this example dataset, we have four variables: idno, district, gndr, and agea. Here the names are fairly self-explanatory: idno is the unique ID of each respondent, district is the respondent's district, gndr is the gender, and agea is the age. In many datasets, however, variable names are not intuitive, so you may need to consult a codebook. Handling larger datasets will be covered in the next learning block.
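
To explore the example in R, one option is to type the ten cases in by hand and inspect the result. This is only a sketch: the object name `pss` and the choice to store gndr as a factor are assumptions made for illustration, and the course itself may load the dataset differently.

```r
# Rebuild the 10-case example by hand (values copied from the table above).
# Assumption: gndr is stored as a factor and district as a character vector.
pss <- data.frame(
  idno     = 10000:10009,
  district = rep("Distrikt 1", 10),
  gndr     = factor(c("male", "male", "male", "female", "female",
                      "female", "male", "female", "female", "male")),
  agea     = c(41, 65, 48, 49, 48, 64, 63, 70, 80, 57)
)

dim(pss)      # number of rows (cases) and columns (variables)
names(pss)    # variable names
str(pss)      # type of each variable plus the first few values
head(pss, 3)  # first three cases
pss$agea      # the age variable as a plain vector
```

Functions such as `str()` and `names()` give a quick overview of a data frame, but they do not explain what a variable measures. For that, the codebook remains the reference.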

Let’s go to the final challenge!