Correlation plots can also be created using ggplot. For this, you will again use the library GGally.
library("GGally")
Now, you will use the function ggcorr(). Non-numeric variables are automatically excluded, which the function reports in a warning.
ggcorr(pss)
## Warning in ggcorr(pss): data in column(s) 'district', 'gndr', 'edu', 'income'
## are not numeric and were ignored
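If you want to know in advance which columns will be dropped, you can check the column types yourself. The following is only a minimal sketch in base R: the data frame toy and its columns are made up for illustration and merely stand in for pss.

```r
# Hypothetical stand-in for pss: two numeric columns and one factor
toy <- data.frame(
  trust = c(3, 5, 2, 4),
  age   = c(25, 41, 33, 58),
  gndr  = factor(c("female", "male", "female", "male"))
)

# TRUE for every column that is numeric
is_num <- sapply(toy, is.numeric)

# Names of the columns that a correlation plot would have to ignore
names(toy)[!is_num]

# Keep only the numeric columns
toy_num <- toy[, is_num, drop = FALSE]
names(toy_num)
```

Passing only the numeric columns to a plotting function is one way to avoid the warning, assuming you do not need the reminder about the excluded variables.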
With the additional argument method, you can specify how to handle missing values (NAs) and which type of correlation to calculate (pearson, spearman, or kendall):
ggcorr(
  pss,
  method = c(
    "pairwise",
    "pearson"
  )
)
## Warning in ggcorr(pss, method = c("pairwise", "pearson")): data in column(s)
## 'district', 'gndr', 'edu', 'income' are not numeric and were ignored
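To get a feeling for what the two entries mean, it can help to compute the correlations directly. As far as I understand the GGally documentation, the first entry corresponds to the use argument and the second to the method argument of base R's cor(). Treat the sketch below as an illustration under that assumption. The data frame toy is made up.

```r
# Hypothetical data with one missing value in column b
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, NA, 6, 8, 10),
  c = c(5, 3, 4, 1, 2)
)

# Pairwise deletion: each pair of columns uses all rows where both are observed
r_pairwise <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# Listwise deletion: rows with any NA are dropped for all pairs
r_complete <- cor(toy, use = "complete.obs", method = "pearson")

# a and c have no NAs, so only the pairwise version uses all five rows for them
r_pairwise["a", "c"]
r_complete["a", "c"]

# Rank-based alternative to the Pearson coefficient
r_spearman <- cor(toy, use = "pairwise.complete.obs", method = "spearman")
```

Pairwise deletion keeps as many observations as possible for each coefficient, but the coefficients in one matrix may then be based on different numbers of cases.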
Additionally, you can display the correlation coefficients using label = TRUE:
ggcorr(
  pss,
  method = c(
    "pairwise",
    "pearson"
  ),
  label = TRUE
)
## Warning in ggcorr(pss, method = c("pairwise", "pearson"), label = TRUE): data
## in column(s) 'district', 'gndr', 'edu', 'income' are not numeric and were
## ignored
You can set the number of decimal places with the argument label_round:
ggcorr(
  pss,
  method = c(
    "pairwise",
    "pearson"
  ),
  label = TRUE,
  label_round = 2
)
## Warning in ggcorr(pss, method = c("pairwise", "pearson"), label = TRUE, : data
## in column(s) 'district', 'gndr', 'edu', 'income' are not numeric and were
## ignored
In the geom argument, you can choose between tile, circle, text, or blank.
ggcorr(
  pss,
  method = c(
    "pairwise",
    "pearson"
  ),
  label = TRUE,
  label_round = 2,
  geom = "circle"
)
## Warning in ggcorr(pss, method = c("pairwise", "pearson"), label = TRUE, : data
## in column(s) 'district', 'gndr', 'edu', 'income' are not numeric and were
## ignored
Lastly, you can specify three colors with the arguments low, mid, and high (for \(-1\), \(0\), and \(1\)) to create the color scale. I’m using beyonce again here! Important: You are not providing the entire palette here, but rather a single color from the respective palette, hence the additional number in [..] brackets!
ggcorr(
  pss,
  method = c(
    "pairwise",
    "pearson"
  ),
  label = TRUE,
  label_round = 2,
  geom = "circle",
  low = beyonce_palette(72)[1],
  mid = "white",
  high = beyonce_palette(72)[2]
)
## Warning in ggcorr(pss, method = c("pairwise", "pearson"), label = TRUE, : data
## in column(s) 'district', 'gndr', 'edu', 'income' are not numeric and were
## ignored
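The indexing with brackets may be easier to follow with a plain color vector. The sketch below uses only base R. The vector pal is a hypothetical stand-in for a palette such as beyonce_palette(72), on the assumption that a palette behaves like a character vector of colors when indexed, and colorRampPalette() is used merely to illustrate how a scale can be built from three anchor colors.

```r
# Hypothetical palette: a character vector of colors (stand-in for a real palette)
pal <- c("#1B9E77", "#D95F02", "#7570B3")

# A single element is one color, which is what low and high expect
low_col  <- pal[1]
high_col <- pal[2]

# Interpolate between the three anchor colors (low, mid, high)
ramp <- grDevices::colorRampPalette(c(low_col, "white", high_col))

# Five colors running from the low color through white to the high color
ramp(5)
```

Passing the whole vector instead of one element would hand several colors to an argument that expects exactly one.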
Alternatively, you can change the limits. This can be helpful if your correlations are not very high and therefore all appear only weakly colored. With limits = FALSE, the endpoints of the color scale are set automatically (according to the data!).
ggcorr(
  pss,
  method = c(
    "pairwise",
    "pearson"
  ),
  label = TRUE,
  label_round = 2,
  geom = "circle",
  low = beyonce_palette(72)[1],
  mid = "white",
  high = beyonce_palette(72)[2],
  limits = FALSE
)
## Warning in ggcorr(pss, method = c("pairwise", "pearson"), label = TRUE, : data
## in column(s) 'district', 'gndr', 'edu', 'income' are not numeric and were
## ignored
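To judge whether data-driven limits are worthwhile, you can look at the range of your coefficients first. This is a rough sketch in base R. The built-in mtcars data is used only as a stand-in, and the assumption is that a data-driven scale spans approximately the observed minimum and maximum.

```r
# Correlation matrix of a built-in data set, used here only as a stand-in
r <- cor(mtcars[, c("mpg", "wt", "qsec", "drat")])

# Off-diagonal coefficients only (the diagonal is always 1)
coefs <- r[lower.tri(r)]

# Smallest and largest observed coefficient
range(coefs)
```

If this range is much narrower than \(-1\) to \(1\), the default scale will use only a small part of the available colors.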
Important: In ggcorr(), non-significant values cannot be hidden, as the author of the function (rightly) opposes a focus on the significance level.
You can find more information on the functionality of ggcorr() here.
Now let’s move on to the representation of mean comparisons.