Descriptive Statistics

Structure and Information of a Dataset

First, let’s apply what we learned earlier and load the dataset pss into the environment. In RStudio Cloud, the dataset is already in your data folder. If you are working locally with RStudio, you can find the dataset and the codebook here:

  • PSS_Codebook.pdf (94 KB)
  • pss.rds (63 KB)

    Load the dataset into the environment!

    pss <- readRDS("./data/pss.rds")
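To see what readRDS() does without the course files, here is a minimal round-trip sketch. The data frame toy and the temporary file are made up for illustration; only saveRDS(), readRDS() and tempfile() are base R functions.

```r
# Hypothetical toy data, not part of the pss dataset
toy <- data.frame(
  id    = 1:3,
  group = factor(c("a", "b", "a"))
)

# Write the object to a temporary .rds file and read it back
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)
restored <- readRDS(path)

# An .rds file stores a single R object including its attributes,
# so column types such as factors survive the round trip
str(restored)
```

Because readRDS() returns the stored object rather than creating it under a fixed name, you choose the name yourself when assigning the result, as with pss above.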

    As a reminder: the head() function gives a first look at the data (by default, the first 6 cases):

    head(pss)
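If six rows are too many or too few, head() accepts an n argument, and tail() does the same for the end of the data. The sketch below uses a hypothetical data frame toy in place of pss.

```r
# Hypothetical toy data with 10 rows, standing in for pss
toy <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

first_default <- head(toy)         # first 6 rows (the default)
first_three   <- head(toy, n = 3)  # first 3 rows only
last_two      <- tail(toy, n = 2)  # last 2 rows

print(first_three)
```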

    Variables and Scale Level

    The pss dataset contains several variables. They are documented in the codebook, which is stored in RStudio Cloud or can be downloaded via the link above. For each of the following variables, briefly consider which scale level it has and which data type in R would suit it.

    Variables in the dataset pss:

    • wkhtot

    • gndr

    • stfdem

    • trstprl

    The variables have the following scale levels:

    • wkhtot: metric

    • gndr: nominal

    • stfdem: (pseudo-)metric / ordinal

    • trstprl: (pseudo-)metric / ordinal
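One common way to map these scale levels to R data types is sketched below: numeric for metric variables, factor for nominal ones, and either an ordered factor or numeric for ordinal variables that are treated as pseudo-metric. All values are invented and do not come from pss; how pss actually stores each variable is what str() reveals.

```r
# Invented example values, not taken from the pss dataset
hours  <- c(40, 38.5, 20)                        # metric -> numeric
gender <- factor(c("female", "male", "female"))  # nominal -> factor
satisfaction <- factor(
  c("low", "high", "medium"),
  levels  = c("low", "medium", "high"),
  ordered = TRUE                                 # ordinal -> ordered factor
)
trust <- c(0, 7, 10)                             # ordinal, treated as pseudo-metric -> numeric

print(class(hours))
print(class(gender))
print(class(satisfaction))

# Ordered factors support comparisons between levels
print(satisfaction[1] < satisfaction[2])
```

Storing an ordinal variable as numeric makes means and similar statistics available, at the cost of assuming equal distances between the categories.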

    
    With str(), we can check which data type each variable actually has in the dataset:

    str(pss$wkhtot)

    str(pss$gndr)

    str(pss$stfdem)

    str(pss$trstprl)

    All variables have the appropriate data type for their scale level.

    Length of a Vector

    The length() function returns the number of elements in a vector. Applied to a variable (a column of the data frame), it gives the number of observations in that variable, with missing values (NA) counted as well.

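    length(pss$edu)

    Can we infer the length of the other vectors, or do we need to display them again? Since all columns of a data frame have the same length, every other variable in pss has the same length as pss$edu.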
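The question above can be explored on a small, made-up data frame (toy is hypothetical, not the pss data). Note that length() applied to the whole data frame counts columns rather than rows.

```r
# Hypothetical toy data with one missing value
toy <- data.frame(
  age = c(23, 41, NA, 35),
  edu = c(2, 3, 1, 3)
)

print(length(toy$age))  # number of observations, NA included
print(length(toy$edu))  # same length: all columns of a data frame are equally long
print(length(toy))      # a data frame is a list of columns, so this counts columns

# Number of non-missing observations in a variable
print(sum(!is.na(toy$age)))
```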

    Alternatively, you can determine the number of cases with the dim() function. In this case, dim() must be called on the data frame itself rather than on a single variable.

    dim(pss)
    ## [1] 5000   14

    The first dimension represents the rows (cases), and the second dimension represents the columns (variables).
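As a small sketch on hypothetical data, nrow() and ncol() return the two dimensions separately, and dim() on a single column returns NULL, which is why the data frame itself has to be passed.

```r
# Hypothetical toy data: 4 cases, 3 variables
toy <- data.frame(id = 1:4, age = c(23, 41, 37, 35), edu = c(2, 3, 1, 3))

d <- dim(toy)    # integer vector: rows first, then columns
print(d[1])      # number of cases, same as nrow(toy)
print(d[2])      # number of variables, same as ncol(toy)

print(nrow(toy))
print(ncol(toy))

# A plain vector has no dim attribute
print(is.null(dim(toy$age)))
```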

    Let’s continue and display measures of central tendency!