When conducting a two-sample t-test, you need to differentiate between paired and unpaired samples.
In the unpaired case, two different groups within a sample (e.g., defined by gender) are compared. The groups are independent of each other: the response of one case is neither influenced by nor connected to the response of any other case.
We want to test how the contractual working hours (wkhtot) differ between men and women (gndr) in the sample.
How are the variables coded? Check the codebook for details:
To perform the test, two assumptions need to be checked:
Equality of variances (Levene’s Test)
Normal distribution of the metric variable (the dependent variable)
The second assumption only needs to be tested for small samples (\(n \leq 30\)). For \(n > 30\), the test provides asymptotically correct results even if the variable is not normally distributed.
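For small samples, the normality assumption can be checked per group, for example with the Shapiro-Wilk test. The following is only a minimal sketch: the vectors hours and gender are simulated stand-ins (not the pss data), and the choice of shapiro.test() is an assumption, since the text does not name a specific normality test.

```r
# Simulated stand-in data (hypothetical values, NOT the pss dataset)
set.seed(1)
hours  <- c(rnorm(12, mean = 35, sd = 5), rnorm(12, mean = 33, sd = 5))
gender <- factor(rep(c("female", "male"), each = 12))

# Shapiro-Wilk test within each group
# H0: the values of the group come from a normal distribution
normality <- tapply(hours, gender, function(x) shapiro.test(x)$p.value)
print(normality)

# A p-value below 0.05 would speak against normality in that group
normal_ok <- normality >= 0.05
print(normal_ok)
```

With only 12 cases per group, the test has little power, so a histogram or Q-Q plot is a sensible complement.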
To check for homogeneity of variances, you calculate the Levene test. For this purpose, you use the leveneTest() function from the car package:
install.packages("car") # run once; not needed if car is already installed
library("car")
leveneTest(
pss$wkhtot,
pss$gndr,
center = "mean"
)
## Levene's Test for Homogeneity of Variance (center = "mean")
##         Df F value Pr(>F)
## group    1  0.5405 0.4623
##       4998
How is the test interpreted?
The null hypothesis of the test (\(H_0\)) states that both groups have equal variances in the metric variable. A p-value below \(0.05\) leads to rejecting the null hypothesis, indicating unequal variances. Here, \(p \approx 0.4623 > 0.05\), so \(H_0\) is retained and equal variances are assumed. You must specify this property when calculating the t-test.
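To make the logic of the test more concrete: Levene's test with center = "mean" corresponds to a one-way ANOVA on the absolute deviations of each value from its group mean. The sketch below reproduces this idea in base R with simulated data. The data frame dat and the names abs_dev, p_levene, and var_equal are hypothetical and only serve as illustration. It is not a replacement for leveneTest().

```r
# Simulated stand-in data (hypothetical values, NOT the pss dataset)
set.seed(42)
dat <- data.frame(
  hours  = c(rnorm(50, mean = 35, sd = 5), rnorm(50, mean = 34, sd = 5)),
  gender = factor(rep(c("female", "male"), each = 50))
)

# Absolute deviation of each value from the mean of its own group
# (ave() returns the group mean for every case)
abs_dev <- abs(dat$hours - ave(dat$hours, dat$gender))

# One-way ANOVA on the absolute deviations
levene   <- anova(lm(abs_dev ~ dat$gender))
p_levene <- levene[["Pr(>F)"]][1]
print(levene)

# Retain H0 (equal variances) unless p < 0.05;
# the result can be passed on to t.test(..., var.equal = var_equal)
var_equal <- p_levene >= 0.05
print(var_equal)
```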
Now you use the t.test() function again to calculate the test. Instead of passing the two variables as separate, comma-separated arguments, you specify them as a formula: the metric variable comes first, followed by the categorical variable (which must have exactly two groups), separated by a ~ (tilde). By default, a difference of \(0\) is assumed under the null hypothesis (mu = 0), and with the argument paired = FALSE you specify that the samples are unpaired. The result of the Levene test goes into the last argument: var.equal = TRUE, since homogeneity of variances is assumed.
t.test(
pss$wkhtot ~ pss$gndr,
mu = 0,
alternative = "two.sided",
paired = FALSE, # unpaired samples!
var.equal = TRUE # result of the Levene test!
)
##
## Two Sample t-test
##
## data: pss$wkhtot by pss$gndr
## t = 1.3509, df = 4998, p-value = 0.1768
## alternative hypothesis: true difference in means between group female and group male is not equal to 0
## 95 percent confidence interval:
## -0.1436357 0.7803096
## sample estimates:
## mean in group female mean in group male
## 34.46080 34.14246
You now see the following values:
\(t = 1.3509\) (t-value)
\(p \approx 0.1768\) (p-value)
\(CI\approx[-0.1436357, 0.7803096]\) (confidence interval)
Mean of group female \(\approx 34.46080\)
Mean of group male \(\approx 34.14246\)
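On average, men work slightly fewer contractual hours than women (a difference of \(0.31834\) hours), but the difference is not statistically significant (\(p \approx 0.1768 > 0.05\); the confidence interval includes \(0\)).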
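The values listed above do not have to be read off the printed output; t.test() returns an object whose components can be accessed directly. A minimal sketch with simulated data follows (hours and gender are hypothetical stand-ins for pss$wkhtot and pss$gndr, so the numbers will differ from the output above):

```r
# Simulated stand-in data (hypothetical values, NOT the pss dataset)
set.seed(7)
hours  <- c(rnorm(40, mean = 34.5, sd = 7), rnorm(40, mean = 34.1, sd = 7))
gender <- factor(rep(c("female", "male"), each = 40))

res <- t.test(hours ~ gender, mu = 0, alternative = "two.sided", var.equal = TRUE)

res$statistic  # t-value
res$parameter  # degrees of freedom (n1 + n2 - 2 when var.equal = TRUE)
res$p.value    # p-value
res$conf.int   # confidence interval of the mean difference
res$estimate   # the two group means

# Mean difference: first group minus second group (here: female - male)
mean_diff <- unname(res$estimate[1] - res$estimate[2])
print(mean_diff)
```

Applied to the output above, the same subtraction gives \(34.46080 - 34.14246 = 0.31834\).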
Now we want to perform this test with a variable that has more than two values (i.e., groups). A t-test compares exactly two groups, so two of them must be selected. We test the difference based on educational attainment (edu). The codes can be found in the codebook. The variable has a total of \(5\) values, from which we simply choose two groups to compare.
You must again perform the test for equality of variances first:
# Test of homogeneity of variances
leveneTest(
pss$wkhtot,
pss$edu,
center = "mean"
)
## Levene's Test for Homogeneity of Variance (center = "mean")
## Df F value Pr(>F)
## group 4 0.4981 0.7372
## 4643
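Note that the test above covers all five education groups (Df = 4), whereas the t-test will compare only two of them. A stricter check would restrict the variance test to the two selected groups. The following sketch shows one possible way, using simulated data and the base-R formulation of Levene's test (ANOVA on absolute deviations from the group means). The data frame dat is hypothetical, and the three middle category labels are assumed for illustration only; the actual labels are in the codebook.

```r
# Simulated stand-in data (hypothetical values, NOT the pss dataset);
# the labels "ES-ISCED II" to "ES-ISCED IV" are assumed for illustration
set.seed(3)
edu_levels <- c("ES-ISCED I", "ES-ISCED II", "ES-ISCED III",
                "ES-ISCED IV", "ES-ISCED V")
dat <- data.frame(
  hours = rnorm(250, mean = 34, sd = 6),
  edu   = factor(rep(edu_levels, each = 50), levels = edu_levels)
)

# Keep only the two groups that enter the t-test and drop the unused levels
two <- droplevels(subset(dat, edu %in% c("ES-ISCED I", "ES-ISCED V")))

# Levene's test (center = mean) on the two remaining groups
two$abs_dev <- abs(two$hours - ave(two$hours, two$edu))
levene_two  <- anova(lm(abs_dev ~ edu, data = two))
print(levene_two)
```

With the car package, the same restriction could be achieved by passing the subsetted variables of two to leveneTest().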
Now you can perform the t-test. First, choose two groups from the new variable: here, you compare the lowest and the highest education level. Because only two of the five groups are compared, the formula pss$wkhtot ~ pss$edu cannot be used as before (the grouping variable would have more than two groups). Instead, you specify the metric variable twice, restricting the data to one group each time using []:
t.test(
pss$wkhtot[pss$edu == "ES-ISCED I"],
pss$wkhtot[pss$edu == "ES-ISCED V"],
mu = 0,
alternative = "two.sided",
paired = FALSE,
var.equal = TRUE
)
##
## Two Sample t-test
##
## data: pss$wkhtot[pss$edu == "ES-ISCED I"] and pss$wkhtot[pss$edu == "ES-ISCED V"]
## t = 9.723, df = 1078, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.492021 6.763452
## sample estimates:
## mean of x mean of y
## 36.19636 30.56863
How do you interpret the result? What is the difference?
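On average, individuals with the lowest education level (mean of x, \(\approx 36.20\)) work more hours than individuals with the highest education level (mean of y, \(\approx 30.57\)). The difference is \(5.62773\) hours and statistically significant (\(p < 2.2 \times 10^{-16}\); the confidence interval does not include \(0\)).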
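As a side note, the formula notation becomes usable again once the data are reduced to the two groups of interest. The sketch below illustrates this with simulated data and checks that both notations give the same t-value. The data frame dat, its values, and the three middle category labels are hypothetical.

```r
# Simulated stand-in data (hypothetical values, NOT the pss dataset)
set.seed(5)
edu_levels <- c("ES-ISCED I", "ES-ISCED II", "ES-ISCED III",
                "ES-ISCED IV", "ES-ISCED V")
dat <- data.frame(
  hours = c(rnorm(40, 36, 6), rnorm(120, 34, 6), rnorm(40, 31, 6)),
  edu   = factor(rep(edu_levels, each = 40), levels = edu_levels)
)

# Variant 1: two vectors, as in the text
res_vectors <- t.test(
  dat$hours[dat$edu == "ES-ISCED I"],
  dat$hours[dat$edu == "ES-ISCED V"],
  var.equal = TRUE
)

# Variant 2: subset first, drop unused factor levels, then use a formula
two <- droplevels(dat[dat$edu %in% c("ES-ISCED I", "ES-ISCED V"), ])
res_formula <- t.test(hours ~ edu, data = two, var.equal = TRUE)

# Both variants test the same difference (lowest minus highest level)
same_t <- isTRUE(all.equal(unname(res_vectors$statistic),
                           unname(res_formula$statistic)))
print(same_t)

# Difference of the two group means
mean_diff <- unname(res_vectors$estimate[1] - res_vectors$estimate[2])
print(mean_diff)
```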
Now a paired two-sample t-test should be conducted. Paired means that the values of one group are related to the values of the other group. This is the case, for example, when a respondent answers a question at two different time points, or when each person from Group A can be matched with a person from Group B (mother <-> child, partner). There is a dataset pss2, collected two years after the original dataset (with the same respondents), and we now want to test whether the mean values differed significantly over time.
Checking the Assumptions
Variables are metric \(\checkmark\)
Difference follows a normal distribution (relevant for \(n \leq 30\)) (\(\checkmark\))
This is straightforward: you use the t.test() function again and only need to change the paired argument:
t.test(
pss$trstprl,
pss2$trstprl,
alternative = "two.sided",
paired = TRUE
)
##
## Paired t-test
##
## data: pss$trstprl and pss2$trstprl
## t = NaN, df = 4964, p-value = NA
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## NaN NaN
## sample estimates:
## mean difference
## 0
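Result interpretation: The mean difference is \(0\), so on average trust in parliament did not differ between the two surveys. The test statistic and the p-value are undefined (t = NaN, p-value = NA) because every paired difference is exactly \(0\): the differences have zero variance, so the t-value would be \(0/0\).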
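To see what the paired test does internally, it helps to know that it is equivalent to a one-sample t-test on the pairwise differences. The following sketch uses two small, made-up vectors (wave1 and wave2 are hypothetical, not taken from pss or pss2) and also shows the case of two identical measurements:

```r
# Made-up trust scores of the same 8 respondents at two time points
# (hypothetical values, NOT taken from pss / pss2)
wave1 <- c(5, 3, 7, 6, 4, 8, 2, 5)
wave2 <- c(4, 3, 8, 5, 4, 6, 3, 5)

# Paired t-test ...
paired_res <- t.test(wave1, wave2, alternative = "two.sided", paired = TRUE)

# ... is the same as a one-sample t-test on the differences against mu = 0
diff_res <- t.test(wave1 - wave2, mu = 0, alternative = "two.sided")

equivalent <- isTRUE(all.equal(unname(paired_res$statistic),
                               unname(diff_res$statistic)))
print(equivalent)

# If both measurements are identical, every difference is 0:
# mean difference 0, standard error 0, so the t-value is 0/0
same <- t.test(wave1, wave1, paired = TRUE)
print(same$statistic)
```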
Now move on to test situations with more than two groups!