Before you start calculating measures of association, you will first create tables. Tables are sometimes also relevant for graphical representation. A common choice is a frequency table, which shows how often each value of a variable occurs. For a simple frequency table, we call the table() function:
table(pss$stfdem)
##
##   0   1   2   3   4   5   6   7   8   9  10
## 226 268 436 618 754 850 631 522 338 179  83
The first row contains the code values, and the second row contains the frequencies. Listed here are the valid cases, i.e., each value that is not set to NA.
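To make the default behaviour concrete, here is a minimal sketch using a small hypothetical vector `x` (an assumption for illustration, not part of the pss data). It shows that table() drops NA unless told otherwise, and where the code values and frequencies are stored:

```r
# Hypothetical toy vector; not taken from the pss data set
x <- c(0, 1, 1, 2, NA, 2, 2, NA)

# By default, table() counts only the valid (non-NA) values
tab <- table(x)
print(tab)

# The code values are the names of the table, the frequencies its entries
names(tab)       # code values as character strings
as.vector(tab)   # frequencies as a plain integer vector
```

The two NA entries in `x` do not appear in the result, so the table covers 6 of the 8 elements.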
How many valid cases do we have, and how many NAs?
To check this, we use the sum() and length() functions:
# valid cases: sum of all frequencies in the table
sum(table(pss$stfdem))
## [1] 4905
# number of NAs
sum(is.na(pss$stfdem))
## [1] 95
# total length: valid cases + NAs
length(pss$stfdem)
## [1] 5000
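The three numbers have to fit together: valid cases plus NAs equal the total length (here 4905 + 95 = 5000). The following sketch checks this identity on a hypothetical toy vector `x`, which is an assumption for illustration only. It also shows that `sum(!is.na(x))` counts the valid cases directly, without going through table():

```r
# Hypothetical toy vector; not taken from the pss data set
x <- c(3, 5, NA, 7, 5, NA, 10)

n_valid <- sum(!is.na(x))   # valid cases, counted directly
n_na    <- sum(is.na(x))    # missing cases
n_total <- length(x)        # all cases

# Counting via the table gives the same number of valid cases
stopifnot(n_valid == sum(table(x)))

# Valid cases and NAs together make up the full length
stopifnot(n_valid + n_na == n_total)

c(valid = n_valid, missing = n_na, total = n_total)
```

If one of the stopifnot() checks failed, R would stop with an error, which makes this a convenient sanity check before further analysis.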
Alternatively, you can extend the table() call with the argument useNA = "ifany":
table(
pss$stfdem,
useNA = "ifany"
)
##
##    0    1    2    3    4    5    6    7    8    9   10 <NA>
##  226  268  436  618  754  850  631  522  338  179   83   95
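The useNA argument accepts three values: "no" (the default), "ifany", and "always". As a small sketch of the difference, the following uses two hypothetical toy vectors (assumptions for illustration, not part of the pss data), one with and one without missing values:

```r
# Hypothetical toy vectors; not taken from the pss data set
with_na    <- c(1, 2, 2, NA)
without_na <- c(1, 2, 2)

table(with_na, useNA = "no")          # default: NA is not counted
table(with_na, useNA = "ifany")       # adds an <NA> entry because NAs exist
table(without_na, useNA = "ifany")    # no <NA> entry: nothing is missing
table(without_na, useNA = "always")   # <NA> entry is shown even with count 0
```

"always" can be handy when several tables should share the same layout, whether or not a given variable has missing values.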
summarytools
To obtain a more structured output, you can use the summarytools package. It produces a view similar to SPSS. The package must be installed once and then loaded in each session:
install.packages("summarytools")
library("summarytools")
Then use the freq() function:
freq(pss$stfdem)
## Frequencies
## pss$stfdem
## Type: Numeric
##
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           0    226      4.61           4.61      4.52           4.52
##           1    268      5.46          10.07      5.36           9.88
##           2    436      8.89          18.96      8.72          18.60
##           3    618     12.60          31.56     12.36          30.96
##           4    754     15.37          46.93     15.08          46.04
##           5    850     17.33          64.26     17.00          63.04
##           6    631     12.86          77.13     12.62          75.66
##           7    522     10.64          87.77     10.44          86.10
##           8    338      6.89          94.66      6.76          92.86
##           9    179      3.65          98.31      3.58          96.44
##          10     83      1.69         100.00      1.66          98.10
##        <NA>     95                               1.90         100.00
##       Total   5000    100.00         100.00    100.00         100.00
The first column contains the code values. The second column shows the frequencies, the third the percentages of valid cases, and the fourth the cumulative percentages of valid cases. Columns \(5\) and \(6\) show the percentages and cumulative percentages of all cases (including NAs).
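To see where these percentages come from, the columns can be recomputed in base R from the frequencies reported by table(). This is only a sketch: the variable names are chosen for illustration, and summarytools may compute or round the values differently internally.

```r
# Frequencies copied from the table() output for pss$stfdem
counts <- c(226, 268, 436, 618, 754, 850, 631, 522, 338, 179, 83)
names(counts) <- 0:10
n_na <- 95  # number of NAs reported for pss$stfdem

# Percentages of valid cases and their running total
pct_valid     <- 100 * counts / sum(counts)
pct_valid_cum <- cumsum(pct_valid)

# Percentages of all cases (valid + NA) and their running total
pct_total     <- 100 * counts / (sum(counts) + n_na)
pct_total_cum <- cumsum(pct_total)

# Combine into one matrix, rounded to two decimals
round(cbind(Freq = counts, pct_valid, pct_valid_cum,
            pct_total, pct_total_cum), 2)
```

The only difference between the two pairs of columns is the denominator: 4905 valid cases in one, all 5000 cases in the other. This is why the cumulative percentage of all cases ends at 98.10 for the value 10, and reaches 100 only once the NA row is added.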
In the next step, you will learn about cross-tabulations!