Measures of Association

With measures of association, you can test whether two variables are related and to what extent. In your statistics lecture, you learned about the \(\chi^2\) test as a first way to test for an association. Another measure you already know, which describes the strength of an association, is Cramer’s V.

\(\chi^2\) Independence Test

Cross-tabulations can display relationships between two variables. This relationship can also be statistically tested using the \(\chi^2\) independence test.

The null hypothesis of the \(\chi^2\) independence test is:

  • \(H_0:\) Variables are statistically independent.

We can run the test with the chisq.test() function, passing the cross-tabulation object directly:

chi1 <- chisq.test(mytable)

chi1
## 
## 	Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 14.123, df = 10, p-value = 0.1674

Alternatively, you can also specify the two variables:

chi2 <- chisq.test(
  pss$stfdem, 
  pss$gndr
)

chi2
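Both calls run the same test: when chisq.test() receives two vectors or factors instead of a table, it builds their cross-tabulation itself. A minimal sketch with small hypothetical vectors (the names sat and sex and their values are made up for illustration and are not taken from pss); suppressWarnings() only silences the small-sample warning that such a tiny toy example triggers:

```r
# Hypothetical toy data standing in for pss$stfdem and pss$gndr
sat <- factor(c("low", "mid", "high", "mid", "low", "high",
                "mid", "mid", "high", "low", "mid", "high"))
sex <- factor(c("female", "male", "female", "female", "male", "male",
                "male", "female", "female", "female", "male", "male"))

# Variant 1: cross-tabulate first, then test the table
tab <- table(sat, sex)
res_table <- suppressWarnings(chisq.test(tab))

# Variant 2: pass the two vectors directly
res_vectors <- suppressWarnings(chisq.test(sat, sex))

# Both variants yield the same statistic, degrees of freedom and p-value
res_table$statistic
res_vectors$statistic
```

Only the data.name element differs between the two results, because it records how the data were passed.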
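How do we interpret the result of chi1? Its p-value of \(0.1674\) is greater than the conventional significance level of \(0.05\), so we cannot reject \(H_0\): the data provide no evidence of an association between stfdem and gndr.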
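The p-value printed for chi1 is the probability, under \(H_0\), of a \(\chi^2\) value at least as large as the observed one, i.e. the upper tail of the \(\chi^2\) distribution with \(df = 10\). A short sketch that reproduces it from the printed statistic with base R's pchisq() and applies the usual decision rule; the significance level alpha = 0.05 is an assumption (the conventional choice), not something the test fixes:

```r
# Values printed by chisq.test(mytable) above
x_squared <- 14.123
df <- 10

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)
p_value

# Decision rule at an assumed significance level of 5 %
alpha <- 0.05
reject_h0 <- p_value < alpha
reject_h0   # FALSE: H0 (independence) is not rejected
```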

Assumptions of the \(\chi^2\) Independence Test

The \(\chi^2\) independence test relies on two assumptions:

  1. At least \(10\) observations in each cell. \(\Rightarrow\) Check the cross-tabulation (\(\checkmark\)).

  2. At least \(5\) expected observations in each cell.
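Both assumptions can also be checked in code. The following sketch wraps them in a small helper; the function name check_chisq_assumptions() and the toy table are hypothetical, and the thresholds (\(10\) observed, \(5\) expected) are the ones listed above:

```r
# Hypothetical helper: checks the two assumptions listed above for a cross-table
check_chisq_assumptions <- function(tab, min_observed = 10, min_expected = 5) {
  # Expected counts under H0; the small-count warning is silenced
  # because we inspect the expected counts ourselves
  expected <- suppressWarnings(chisq.test(tab))$expected
  c(
    observed_ok = all(tab >= min_observed),
    expected_ok = all(expected >= min_expected)
  )
}

# Hypothetical 3 x 2 cross-table (matrix() fills column by column)
tab <- matrix(c(30, 45, 25,
                35, 40, 12),
              nrow = 3,
              dimnames = list(answer = c("low", "mid", "high"),
                              gender = c("female", "male")))
check_chisq_assumptions(tab)
```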

To check the second assumption, we need the expected frequencies. chisq.test() returns an object that stores several values at once; this object has the data type list. Lists are new to you, and they play a major role in R: like a list on paper, a list can hold several pieces of information. You can use str() to display what is stored in the list:

str(chi1)
## List of 9
##  $ statistic: Named num 14.1
##   ..- attr(*, "names")= chr "X-squared"
##  $ parameter: Named int 10
##   ..- attr(*, "names")= chr "df"
##  $ p.value  : num 0.167
##  $ method   : chr "Pearson's Chi-squared test"
##  $ data.name: chr "mytable"
##  $ observed : 'table' int [1:11, 1:2] 117 133 211 287 377 466 311 251 164 82 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:11] "0" "1" "2" "3" ...
##   .. ..$ : chr [1:2] "female" "male"
##  $ expected : num [1:11, 1:2] 112 133 217 307 375 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:11] "0" "1" "2" "3" ...
##   .. ..$ : chr [1:2] "female" "male"
##  $ residuals: 'table' num [1:11, 1:2] 0.4316 -0.0275 -0.3999 -1.1649 0.0992 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:11] "0" "1" "2" "3" ...
##   .. ..$ : chr [1:2] "female" "male"
##  $ stdres   : 'table' num [1:11, 1:2] 0.6233 -0.0398 -0.5909 -1.7577 0.1521 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:11] "0" "1" "2" "3" ...
##   .. ..$ : chr [1:2] "female" "male"
##  - attr(*, "class")= chr "htest"

A list can store elements of different types and lengths, for example several vectors, and each element remains individually accessible. You can display the names of the elements with the ls() function (names() returns the same names in their stored order) and access a single element with the $ sign.

ls(chi1)
## [1] "data.name" "expected"  "method"    "observed"  "p.value"   "parameter"
## [7] "residuals" "statistic" "stdres"
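To get a feel for lists, you can build one yourself. A short sketch with made-up contents; it shows that a single list can hold a number, a string and a vector side by side, and that each element can be accessed with $ or with [[ ]]:

```r
# A hypothetical list mixing a number, a string and a vector
my_list <- list(
  statistic = 14.1,
  method    = "Pearson's Chi-squared test",
  counts    = c(10, 20, 30)
)

names(my_list)      # element names in stored order
my_list$method      # access an element by name with $
my_list[["counts"]] # the same idea with [[ ]]
my_list$counts[2]   # index into a vector stored in the list
```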

To see the expected values, we access expected from the object chi1.

chi1$expected
##     
##         female      male
##   0  112.42406 113.57594
##   1  133.31702 134.68298
##   2  216.88889 219.11111
##   3  307.42508 310.57492
##   4  375.07849 378.92151
##   5  422.83384 427.16616
##   6  313.89195 317.10805
##   7  259.66972 262.33028
##   8  168.13863 169.86137
##   9   89.04383  89.95617
##   10  41.28848  41.71152

All expected frequencies are at least \(5\) (the smallest is about \(41.3\)), so the second assumption is met as well.
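The expected frequencies follow from the marginal totals: under \(H_0\), the expected count in row \(i\) and column \(j\) is \(E_{ij} = \frac{n_{i\cdot} \, n_{\cdot j}}{n}\), and the test statistic is \(\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\) with \(df = (r - 1)(c - 1)\). This also explains \(df = 10\) above: \((11 - 1)(2 - 1) = 10\). The sketch below recomputes both on a hypothetical 3 × 2 table and compares them with chisq.test(). For 2 × 2 tables, chisq.test() applies Yates' continuity correction by default, so a hand computation would only match there with correct = FALSE.

```r
# Hypothetical 3 x 2 cross-table (observed counts O)
observed <- matrix(c(30, 45, 25,
                     35, 40, 12),
                   nrow = 3,
                   dimnames = list(answer = c("low", "mid", "high"),
                                   gender = c("female", "male")))

n <- sum(observed)

# Expected counts: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / n

# Test statistic and degrees of freedom
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Compare with chisq.test() (attributes such as dimnames names are ignored)
res <- chisq.test(observed)
all.equal(expected, res$expected, check.attributes = FALSE)
all.equal(x_squared, unname(res$statistic))
```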

Cramer’s V

So far, we have only conducted the \(\chi^2\) independence test. However, in addition to the general relationship, it is often of interest to know how strong this relationship is or in which direction it goes.

We use Cramer’s V to measure the strength of the relationship between two variables that are at least nominally scaled. It is calculated with the function CramerV() from the package DescTools. You pass the two variables, and the third argument, conf.level, sets the confidence level of the confidence interval (here \(0.95\)):

install.packages("DescTools") # only needed once
library("DescTools")
CramerV(
  pss$stfdem,        # first variable
  pss$gndr,          # second variable
  conf.level = 0.95  # confidence level
)
##   Cramer V     lwr.ci     upr.ci 
## 0.05365996 0.00000000 0.06613029
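Cramer’s V rescales the \(\chi^2\) statistic to the range \(0\) to \(1\): \(V = \sqrt{\frac{\chi^2}{n \cdot (\min(r, c) - 1)}}\), where \(r\) and \(c\) are the numbers of rows and columns. A base-R sketch of this formula on hypothetical tables; cramers_v() is our own illustrative helper, not part of DescTools. We assume CramerV() with its default settings (no bias correction) uses the same formula, which is worth confirming in the package documentation:

```r
# Hypothetical helper implementing the textbook formula for Cramer's V
cramers_v <- function(tab) {
  # Uncorrected chi-squared statistic (no Yates correction, even for 2 x 2)
  x_squared <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)
  k <- min(dim(tab)) - 1
  sqrt(x_squared / (n * k))
}

# Hypothetical 3 x 2 table
tab <- matrix(c(30, 45, 25,
                35, 40, 12),
              nrow = 3)
cramers_v(tab)

# Edge cases: perfect association gives 1, independence gives 0
cramers_v(diag(c(10, 10)))
cramers_v(matrix(c(10, 20, 10, 20), nrow = 2))
```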

We interpret the output as follows: the first value is Cramer’s V itself, and the second and third values (lwr.ci and upr.ci) are the lower and upper bounds of its confidence interval. If the interval does not include \(0\), the association is significant. Here the lower bound is \(0\), so the result is not significant.

The following boundaries apply for interpreting the value:

Lower bound    Upper bound    Interpretation
\(0\)          \(0.1\)        no association
\(0.1\)        \(0.3\)        weak
\(0.3\)        \(0.6\)        moderate
\(0.6\)        \(1\)          strong
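The confidence-interval rule and the boundaries in the table can be combined into one helper. A sketch, assuming a numeric vector shaped like the CramerV() output above (estimate, lower bound, upper bound); the function name interpret_cramers_v() is hypothetical, and because the table does not say which side of each boundary is inclusive, the sketch assigns a boundary value to the higher category:

```r
# Hypothetical helper: classify strength and check significance of Cramer's V
interpret_cramers_v <- function(v_ci) {
  v   <- v_ci[[1]]  # point estimate
  lwr <- v_ci[[2]]  # lower confidence limit
  upr <- v_ci[[3]]  # upper confidence limit

  # Intervals [0, 0.1), [0.1, 0.3), [0.3, 0.6), [0.6, 1]
  strength <- cut(v,
                  breaks = c(0, 0.1, 0.3, 0.6, 1),
                  labels = c("no association", "weak", "moderate", "strong"),
                  right = FALSE, include.lowest = TRUE)

  list(
    strength    = as.character(strength),
    significant = !(lwr <= 0 && 0 <= upr)  # significant only if 0 lies outside the CI
  )
}

# Values printed by CramerV() above
result <- interpret_cramers_v(c("Cramer V" = 0.05365996,
                                lwr.ci = 0, upr.ci = 0.06613029))
result$strength     # "no association"
result$significant  # FALSE
```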

Cramer’s V ranges only from \(0\) to \(1\), so it says nothing about the direction of a relationship. If you want to interpret the direction as well as the strength, you need to calculate a correlation.