With measures of association, you can test whether and to what extent two variables are related. In your statistics lecture, you learned about \(\chi^2\) as a first test for an association; another measure you already know for the strength of an association is Cramer’s V.
Cross-tabulations can display relationships between two variables. This relationship can also be statistically tested using the \(\chi^2\) independence test.
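As a reminder, a cross table like the `mytable` object used below can be built with `table()`. The data here are made up purely for illustration (they are not the `pss` data set):

```r
# Made-up example data, not the pss data set
df <- data.frame(
  gndr   = c("female", "male", "female", "male", "female"),
  answer = c("yes", "yes", "no", "no", "yes")
)

# Cross-tabulation: rows = gndr, columns = answer
mytab <- table(df$gndr, df$answer)
mytab
```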
The null hypothesis of the \(\chi^2\) independence test is that the two variables are statistically independent, i.e., there is no association between them.
We can perform the test using the chisq.test() function and pass the cross-tabulation object directly:
chi1 <- chisq.test(mytable)
chi1
##
## Pearson's Chi-squared test
##
## data: mytable
## X-squared = 14.123, df = 10, p-value = 0.1674
Alternatively, you can also specify the two variables:
chi2 <- chisq.test(
  pss$stfdem,
  pss$gndr
)
chi2
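Under the hood, the test compares the observed frequencies with the frequencies expected under independence. A minimal sketch with a made-up \(2 \times 2\) table (not the pss data) shows how the statistic is computed:

```r
# Made-up observed counts for illustration
obs <- matrix(c(25, 15, 15, 45), nrow = 2)

# Expected counts under independence: (row sum * column sum) / n
exp <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
x2 <- sum((obs - exp)^2 / exp)
x2
```

This value matches `chisq.test(obs, correct = FALSE)$statistic` (the continuity correction is switched off because `chisq.test()` applies it by default to 2x2 tables).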
The calculation of the \(\chi^2\) independence test requires two assumptions: the two variables must be available as a cross table of (at least) nominally scaled variables, and every cell must have an expected frequency of at least \(5\).

\(\Rightarrow\) Cross table! (\(\checkmark\))
The function chisq.test() stores various values, so the resulting object has the data type list. This data type is new to you; it is another data type that plays an important role in R. A list can hold multiple pieces of information, similar to a list on paper. You can use str() to display what has been stored in the list:
str(chi1)
## List of 9
## $ statistic: Named num 14.1
## ..- attr(*, "names")= chr "X-squared"
## $ parameter: Named int 10
## ..- attr(*, "names")= chr "df"
## $ p.value : num 0.167
## $ method : chr "Pearson's Chi-squared test"
## $ data.name: chr "mytable"
## $ observed : 'table' int [1:11, 1:2] 117 133 211 287 377 466 311 251 164 82 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "0" "1" "2" "3" ...
## .. ..$ : chr [1:2] "female" "male"
## $ expected : num [1:11, 1:2] 112 133 217 307 375 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "0" "1" "2" "3" ...
## .. ..$ : chr [1:2] "female" "male"
## $ residuals: 'table' num [1:11, 1:2] 0.4316 -0.0275 -0.3999 -1.1649 0.0992 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "0" "1" "2" "3" ...
## .. ..$ : chr [1:2] "female" "male"
## $ stdres : 'table' num [1:11, 1:2] 0.6233 -0.0398 -0.5909 -1.7577 0.1521 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "0" "1" "2" "3" ...
## .. ..$ : chr [1:2] "female" "male"
## - attr(*, "class")= chr "htest"
A list stores different elements and makes them individually accessible; for example, several vectors can be kept in one list. The names of the elements in a list can be displayed with the ls() function, and individual elements can be accessed with the $ sign.
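As a small illustration, you can build a list yourself and access its parts the same way. The element names and values here are made up:

```r
# A made-up list with differently typed elements
mylist <- list(
  name  = "chi-square",
  value = 14.123,
  sig   = FALSE
)

ls(mylist)      # displays the element names
mylist$value    # accesses a single element with $
```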
ls(chi1)
## [1] "data.name" "expected" "method" "observed" "p.value" "parameter"
## [7] "residuals" "statistic" "stdres"
To see the expected values, we access expected from the object chi1.
chi1$expected
##
## female male
## 0 112.42406 113.57594
## 1 133.31702 134.68298
## 2 216.88889 219.11111
## 3 307.42508 310.57492
## 4 375.07849 378.92151
## 5 422.83384 427.16616
## 6 313.89195 317.10805
## 7 259.66972 262.33028
## 8 168.13863 169.86137
## 9 89.04383 89.95617
## 10 41.28848 41.71152
You can see that the condition of having at least \(5\) cases in each cell of the expected values is also met.
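Instead of scanning the table by eye, you can let R check the condition. The sketch below uses a small made-up table, since the pss data are not loaded here; with the object from above you would simply run `all(chi1$expected >= 5)`:

```r
# Made-up observed counts for illustration
obs <- matrix(c(20, 30, 25, 25), nrow = 2)
chi <- chisq.test(obs)

# TRUE if every expected cell count is at least 5
all(chi$expected >= 5)
```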
So far, we have only conducted the \(\chi^2\) independence test. However, in addition to the general relationship, it is often of interest to know how strong this relationship is or in which direction it goes.
We calculate Cramer’s V for the strength of the relationship between two at least nominally scaled variables using the function CramerV() from the library DescTools. You specify the two variables, and with the third argument you set the confidence level (in this case, \(0.95\)):
install.packages("DescTools")
library("DescTools")
CramerV(
  pss$stfdem,
  pss$gndr,
  conf.level = 0.95 # confidence level
)
## Cramer V lwr.ci upr.ci
## 0.05365996 0.00000000 0.06613029
We interpret the result as follows: the first column shows the value of Cramer’s V, and the second and third columns show the confidence interval for this value. If the interval does not include the value \(0\), the result is significant. Here the lower bound of the interval is \(0\), so the result is not significant.
The following boundaries apply for interpreting the value:
| Lower Bound | Upper Bound | Interpretation |
|---|---|---|
| \(0\) | \(0.1\) | no association |
| \(0.1\) | \(0.3\) | weak |
| \(0.3\) | \(0.6\) | moderate |
| \(0.6\) | \(1\) | strong |
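Cramer’s V can also be computed by hand from the \(\chi^2\) statistic via \(V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}\), where \(n\) is the sample size and \(r\) and \(c\) are the numbers of rows and columns. A sketch with a made-up table:

```r
# Made-up observed counts for illustration
tab <- matrix(c(30, 20, 10, 40), nrow = 2)

chi <- chisq.test(tab, correct = FALSE)
n   <- sum(tab)                    # sample size
k   <- min(nrow(tab), ncol(tab))   # smaller of row/column count

V <- sqrt(unname(chi$statistic) / (n * (k - 1)))
V
```

For the real data, CramerV() from DescTools performs exactly this kind of calculation and additionally reports the confidence interval.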
If not only the strength of the association but also the direction of the relationship is to be interpreted, a correlation must be calculated.