Now, we will introduce some functions to display descriptive statistics.
What measures of central tendency and dispersion are there?
Mode, Median, Mean
Range, Interquartile Range, Quartiles
Variance, Standard Deviation
The functions for minimum, maximum, and range are:
# Minimum
min(pss$wkhtot)
## [1] 6
# Maximum
max(pss$wkhtot)
## [1] 65
# Range
range(pss$wkhtot)
## [1] 6 65
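Note that range() returns the minimum and the maximum, not the distance between them. As a minimal sketch, assuming a made-up vector (hours is a hypothetical name and not part of pss), diff() can turn the two values into a single number:

```r
# Hypothetical toy data, not taken from the pss dataset
hours <- c(6, 20, 38, 40, 65)

# range() returns the smallest and the largest value
hours_range <- range(hours)

# diff() subtracts the first element from the second: max - min
hours_span <- diff(hours_range)
hours_span
## [1] 59
```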
Now, try it for the variable stfdem. What happens?
min(pss$stfdem)
## [1] NA
No error is raised; instead, R returns NA. This is because there are missing values (NAs) in this variable. Some individuals have not provided a value, which R records as NA. Because the result would depend on values that are unknown, min() cannot return a number.
\(\rightarrow\) In order to calculate a value, missing values must be excluded.
min(
pss$stfdem,
na.rm = TRUE
)
## [1] 0
## na.rm: NA = missing values, rm = remove
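The same behavior can be reproduced on a small scale. The following sketch uses a made-up vector x (a hypothetical example, not a variable from pss) to show how a single NA affects min(), and how is.na() can be used to count missing values:

```r
# Hypothetical toy data with one missing value
x <- c(3, NA, 7, 1)

# Without na.rm, the missing value propagates to the result
min_with_na <- min(x)

# With na.rm = TRUE, NA is dropped before the minimum is computed
min_without_na <- min(x, na.rm = TRUE)

# is.na() flags missing values; sum() counts the TRUEs
n_missing <- sum(is.na(x))
```

Here min_with_na is NA, min_without_na is 1, and n_missing is 1.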
For the median and mean, there are R base functions that can be used directly:
## Median
median(
pss$stfdem,
na.rm = TRUE
)
## [1] 5
## Arithmetic mean
mean(
pss$stfdem,
na.rm = TRUE
)
## [1] 4.657492
For the mode, there is no built-in function (R's mode() returns the storage type of an object, not the statistical mode), but you can use the table() function to display the frequency table of a variable and then determine the mode(s) from it:
## Mode (no built-in function)
table(pss$stfdem)
##
## 0 1 2 3 4 5 6 7 8 9 10
## 226 268 436 618 754 850 631 522 338 179 83
max(table(pss$stfdem)) # returns the highest count, not the mode itself; hides ties in bimodal (etc.) distributions
## [1] 850
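One possible way to read the mode(s) directly from the frequency table is to keep every value whose count equals the highest count. This is only a sketch on made-up data (x, freq, and modes are hypothetical names); it returns all modes when there are ties:

```r
# Hypothetical toy data with two modes (2 and 5) and one missing value
x <- c(2, 2, 5, 5, 3, NA)

# table() ignores NA by default and counts each distinct value
freq <- table(x)

# Keep the names of all values whose count equals the highest count
modes <- names(freq)[freq == max(freq)]

# names() returns character strings, so convert them back to numbers
modes <- as.numeric(modes)
modes
## [1] 2 5
```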
Sometimes you may want to report the cut points that divide a variable into equal-sized groups, such as income in deciles or quintiles. For this purpose, the quantile() function is used, combined with the seq() function.
The seq() function generates a sequence of numbers, which we define using the three arguments from, to, and by. The from argument sets the starting value, the to argument sets the ending value, and the by argument sets the step size. In the example, we go from 0 to 1 in steps of 0.1:
seq(
from = 0,
to = 1,
by = 0.1
)
seq(
by = 0.1,
to = 1,
from = 0
)
# If the default argument order (from, to, by) is kept, the argument names can be omitted.
seq(
0,
1,
0.1
)
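To check that the three ways of calling seq() really produce the same sequence, the results can be stored and compared. The object names a, b, and d are hypothetical and only serve this illustration:

```r
# Named arguments in the default order
a <- seq(from = 0, to = 1, by = 0.1)

# Named arguments in a different order: names make the order irrelevant
b <- seq(by = 0.1, to = 1, from = 0)

# Unnamed arguments: matched by position (from, to, by)
d <- seq(0, 1, 0.1)

# All three calls yield the same 11 values
same_sequence <- identical(a, b) && identical(a, d)
n_values <- length(a)
```

The sequence has 11 elements because both end points, 0 and 1, are included.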
Now the deciles (in steps of 0.1) can be output:
quantile(
pss$stfdem,
probs = seq(
0,
1,
0.1
),
na.rm = TRUE
)
What needs to be changed to output quintiles?
To get quintiles, we go in steps of 0.2. This yields six cut points from 0 to 1, which divide the data into five groups.
quantile(
pss$stfdem,
probs = seq(
0,
1,
0.2
),
na.rm = TRUE
)
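A small sketch on made-up data may help to see what quantile() returns here. The vector x is hypothetical and not taken from pss; seq(0, 1, 0.2) supplies six probabilities, which mark the boundaries of five groups:

```r
# Hypothetical toy data: the integers 0 to 10 plus one missing value
x <- c(0:10, NA)

# Six probabilities (0, 0.2, ..., 1) give six cut points
quintiles <- quantile(x, probs = seq(0, 1, 0.2), na.rm = TRUE)

# The result is a named vector; the names are the percentages
quintiles
names(quintiles)
```

The first and last cut points are the minimum and maximum of the data, so only the four values in between actually separate the groups.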
To output several descriptive values at once, you can also use the summary() function:
summary(pss$stfdem)
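Unlike min() or mean(), summary() does not need na.rm = TRUE for a numeric variable: it drops missing values and reports how many there were. A minimal sketch with a made-up vector (x and s are hypothetical names):

```r
# Hypothetical toy data with one missing value
x <- c(1, 4, 4, 7, NA)

# summary() returns minimum, quartiles, median, mean, maximum,
# and, if any values are missing, their count under "NA's"
s <- summary(x)
s

# Individual entries can be accessed by name
n_missing <- as.numeric(s["NA's"])
```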
Next, let’s move on to measures of dispersion!