You are familiar with boxplots from your statistics lecture. A boxplot shows the quartiles of a variable, with whiskers that extend to the minimum and maximum values or, when outliers are present, to the most extreme values that are not flagged as outliers. On the following pages, you will learn how to create boxplots with two functions: geom_boxplot() and ggboxplot(). We use boxplots to display a metric variable, and the boxplots of a metric variable can also be split by a grouping variable.
Additionally, on the last page, we introduce rainclouds. These are similar to a boxplot but provide additional information on how the values are distributed.
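To make concrete what a boxplot displays, the summary values can be computed directly. The sketch below uses base R's boxplot.stats() on a small made-up vector of ages (the values are illustrative and not taken from pss). Note that boxplot.stats() computes the box edges as hinges, which can differ slightly from the quartiles that geom_boxplot() uses.

```r
# Hypothetical ages for illustration only (not from the pss data)
ages <- c(20, 25, 30, 35, 40, 45, 50, 55, 60, 120)

# boxplot.stats() returns the values a boxplot draws:
# $stats = lower whisker, lower hinge, median, upper hinge, upper whisker
# $out   = points more than 1.5 box lengths away from the box (outliers)
box_values <- boxplot.stats(ages)

box_values$stats
box_values$out
```

In this example the upper whisker ends at 60 rather than at the maximum, because 120 lies more than 1.5 box lengths above the box and is reported as an outlier.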
geom_boxplot()
We can easily create a boxplot of age. For this, we use the function geom_boxplot():
boxplot <- ggplot(
  pss,
  aes(agea)
) +
  geom_boxplot()
boxplot
## Warning: Removed 157 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
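The warning most likely means that agea is missing (NA) for 157 rows, which geom_boxplot() drops before drawing. A minimal sketch of how to check this and silence the warning with na.rm = TRUE follows. Here pss_demo is a small simulated stand-in for pss, so its numbers are illustrative only.

```r
library(ggplot2)

# Simulated stand-in for the pss data (hypothetical values)
pss_demo <- data.frame(agea = c(23, 35, 41, 58, 67, NA, NA))

# Count the rows that geom_boxplot() would remove
n_missing <- sum(is.na(pss_demo$agea))
n_missing

# na.rm = TRUE removes missing values silently instead of with a warning
boxplot_demo <- ggplot(pss_demo, aes(agea)) +
  geom_boxplot(na.rm = TRUE)
boxplot_demo
```

Counting the missing values first is the safer habit: the warning is harmless only if the number of dropped rows matches what you expect.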
Now the x-axis represents the metric variable age. The y-axis may be confusing because it has no substantive interpretation; ggplot only needs it to draw the box. You can therefore hide its tick marks and rotate the boxplot:
boxplot <- ggplot(
  pss,
  aes(agea)
) +
  geom_boxplot() +
  coord_flip() +
  scale_y_continuous(breaks = NULL)
boxplot
## Warning: Removed 157 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
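As an alternative to coord_flip(), the variable can be mapped to y directly, which also yields a vertical boxplot. The axis without substantive meaning is then the x-axis. This is a sketch on simulated data (pss_demo is a hypothetical stand-in for pss), not the approach used on this page.

```r
library(ggplot2)

# Simulated stand-in for the pss data (hypothetical values)
pss_demo <- data.frame(agea = c(19, 23, 35, 41, 58, 67, 72))

# Mapping age to y draws the box vertically without flipping coordinates;
# the x-axis carries no information, so its breaks are hidden
boxplot_vertical <- ggplot(pss_demo, aes(y = agea)) +
  geom_boxplot() +
  scale_x_continuous(breaks = NULL)
boxplot_vertical
```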
Often, you want to display the distribution of a metric variable by a categorical variable. In the following, we display the boxplot of age by district. To do this, add the grouping variable inside aes() in the ggplot() call. Remember: the grouping variable is the first argument in aes() and is therefore mapped to the x-axis, even though it appears on the y-axis once the axes are swapped with coord_flip()!
boxplotDistrict <- ggplot(
  pss,
  aes(
    district,
    agea,
    fill = district
  )
) +
  geom_boxplot()
boxplotDistrict
## Warning: Removed 157 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
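To check a grouped boxplot numerically, the median per group can be computed with base R's tapply(). The sketch below uses simulated data. The data frame pss_demo and its values are hypothetical. Only the column names district and agea follow the example above.

```r
# Simulated stand-in for pss (hypothetical values and district labels)
pss_demo <- data.frame(
  district = c("Distrikt 1", "Distrikt 1", "Distrikt 1",
               "Distrikt 5", "Distrikt 5", "Distrikt 5"),
  agea = c(25, 40, 61, 33, NA, 51)
)

# Median age per district; na.rm = TRUE skips missing ages
median_by_district <- tapply(
  pss_demo$agea,
  pss_demo$district,
  median,
  na.rm = TRUE
)
median_by_district
```

Each value corresponds to the thick line inside the box of the respective district.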
As before, we can easily adjust the plot:
boxplotDistrict +
  scale_fill_manual(
    name = "District",
    values = cbp1
  ) +
  scale_x_discrete(
    limits = c(
      "Distrikt 10",
      "Distrikt 7",
      "Distrikt 12",
      "Distrikt 5",
      "Distrikt 1"
    )
  ) +
  scale_y_continuous(
    breaks = seq(
      0,
      100,
      5
    )
  ) +
  labs(
    x = "District",
    y = "Age in years",
    title = "Boxplots of Age by District"
  ) +
  coord_flip()
## Warning: Removed 157 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
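Note that scale_x_discrete(limits = ...) both orders the categories and restricts the axis to the listed ones, so districts that are not listed are dropped from the plot. If only the order should change, one option is to set the factor levels before plotting. The sketch below assumes simulated data (pss_demo) and a hypothetical two-colour palette in place of cbp1, which is defined elsewhere in the course.

```r
library(ggplot2)

# Simulated stand-in for pss (hypothetical values and district labels)
pss_demo <- data.frame(
  district = c("Distrikt 1", "Distrikt 1", "Distrikt 1",
               "Distrikt 5", "Distrikt 5", "Distrikt 5"),
  agea = c(25, 40, 61, 33, 42, 51)
)

# Fix the display order through the factor levels
pss_demo$district <- factor(
  pss_demo$district,
  levels = c("Distrikt 5", "Distrikt 1")
)

# Hypothetical palette standing in for cbp1
demo_palette <- c("#E69F00", "#56B4E9")

boxplot_ordered <- ggplot(pss_demo, aes(district, agea, fill = district)) +
  geom_boxplot() +
  scale_fill_manual(name = "District", values = demo_palette) +
  coord_flip()
boxplot_ordered
```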
On the next page, you will learn about an alternative!