We use scatterplots to show the relationship between two (pseudo-)metric variables. To create one, we use the `geom_point()` function.
Often we only have pseudo-metric variables (ordinal rating scales that we treat as if they were metric), but we can still visualize them with scatterplots. Here we use `trstplt` and `trstprt`. If you don't remember what these variables stand for, check the codebook!
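```r
scatter <- ggplot(
  pss,
  aes(
    trstplt,
    trstprt
  )
) +
  geom_point()
scatter
## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).
```

The warning tells us that 28 rows were dropped: these respondents have a missing value (`NA`) on at least one of the two variables, so they cannot be placed on the plot.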
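You can verify such a count yourself by counting the incomplete rows. A minimal sketch, shown on a small made-up data frame (`toy`, hypothetical) instead of `pss`; with the survey data you would put `pss` in its place:

```r
# Hypothetical toy data standing in for pss: two trust items with some NAs
toy <- data.frame(
  trstplt = c(3, 5, NA, 7, 2, 8),
  trstprt = c(4, NA, 6, 7, NA, 9)
)

# geom_point() drops a row if either of the two plotted variables is missing,
# so we count the rows that are not complete on both columns
n_missing <- sum(!complete.cases(toy[, c("trstplt", "trstprt")]))
n_missing
```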
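Since pseudo-metric variables usually take only integer values, many respondents share the same pair of values, and their points are drawn exactly on top of each other. To make the individual observations visible, we jitter the points, i.e. we shift each one by a small random amount so they no longer overlap. We use the `geom_jitter()` function for this: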
```r
# Rebuild the plot with geom_jitter() instead of geom_point():
# adding geom_jitter() on top of geom_point() would draw every
# observation twice (once in place, once jittered)
scatter <- ggplot(
  pss,
  aes(
    trstplt,
    trstprt
  )
) +
  geom_jitter(
    width = 0.3,
    height = 0.3
  )
scatter
## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).
```
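The `width` and `height` arguments of `geom_jitter()` specify how far the points may be shifted horizontally and vertically. The shift goes in both directions, so with `width = 0.3` a point at 2 can land anywhere between 1.7 and 2.3. Just try a few different values; values below 0.5 keep the points of neighboring values apart.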
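Because the jitter is random, the plot looks slightly different every time you run the code. If you want the same picture on every run (for example in a report), one option is `position_jitter()` with a fixed `seed`. A sketch on made-up data; the data frame `toy` and the seed value 123 are arbitrary assumptions:

```r
library(ggplot2)

# Hypothetical toy data: integer scores that overlap heavily
toy <- data.frame(
  x = rep(c(2, 5, 8), each = 20),
  y = rep(c(3, 3, 7), each = 20)
)

# A fixed seed makes the "random" shifts identical on every run
jitter_pos <- position_jitter(width = 0.3, height = 0.3, seed = 123)
p <- ggplot(toy, aes(x, y)) +
  geom_point(position = jitter_pos)

# Inspect the jittered coordinates that ggplot2 actually draws
drawn <- layer_data(p)
# Distance of each point from its nearest integer: never more than 0.3
max(abs(drawn$x - round(drawn$x)))
```

`geom_jitter(width = 0.3, height = 0.3)` is a shortcut for `geom_point(position = position_jitter(width = 0.3, height = 0.3))`; the explicit version simply lets you pass the seed as well.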
Now we add axis labels and a title:
```r
# scatter already contains the jittered points, so we only add the labels
scatter +
  labs(
    x = "Trust in Politicians",
    y = "Trust in Legal System",
    title = "Trust Scatterplot"
  )
```
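Next, we change the appearance of the title. The `theme()` function modifies how the plot is displayed: here we make the title bold (`face = "bold"`), center it (`hjust = 0.5`), and set its font size to 16. You will learn more about these arguments in Chapter 3!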
```r
scatter <- scatter +
  labs(
    x = "Trust in Politicians",
    y = "Trust in Legal System",
    title = "Trust Scatterplot"
  ) +
  theme(
    plot.title = element_text(
      face = "bold",
      hjust = 0.5,
      size = 16
    )
  )
scatter
```
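If several plots should share the same title style, you can store the `theme()` call in an object and add it like any other layer. A sketch using R's built-in `mtcars` data; the object name `centered_bold_title` is just an example:

```r
library(ggplot2)

# Store the title styling once as a reusable theme object
centered_bold_title <- theme(
  plot.title = element_text(face = "bold", hjust = 0.5, size = 16)
)

# Add it to any plot with +, just like labs() or a geom
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Weight and Fuel Consumption") +
  centered_bold_title
p
```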
We also specify the data source. We do this using the `caption` argument of the `labs()` function:
```r
scatter <- scatter +
  labs(caption = "Data source: Panem Social Survey.")
scatter
```
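Both variables take only integer values, but ggplot2 chooses the axis breaks itself, so the tick marks do not necessarily sit on each possible value. Let's set a break at every integer from 0 to 10 on both axes: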
```r
scatter <- scatter +
  scale_y_continuous(
    breaks = seq(0, 10, 1)
  ) +
  scale_x_continuous(
    breaks = seq(0, 10, 1)
  )
scatter
```
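Instead of typing `seq(0, 10, 1)` twice, you could store the breaks once, or let the `scales` package (installed together with ggplot2) generate a break every 1 unit across whatever range the axis has. A sketch on hypothetical toy data:

```r
library(ggplot2)

# Hypothetical toy data on a 0-10 scale
toy <- data.frame(x = c(0, 3, 5, 10), y = c(1, 4, 9, 6))

# Option 1: one vector of breaks, reused for both axes
int_breaks <- 0:10  # same values as seq(0, 10, 1)

# Option 2: a function that places a break every 1 unit across the axis range
every_unit <- scales::breaks_width(1)

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  scale_x_continuous(breaks = int_breaks) +
  scale_y_continuous(breaks = every_unit)
p
```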
We can also add a regression line for the relationship between the two variables with the `geom_smooth()` function. The argument `method = "lm"` fits a linear model, `se = TRUE` draws the confidence interval around the line (95% by default), and `color` and `fill` set the colors of the line and of the confidence band. The line is estimated from the original values, not from the jittered positions of the points.
```r
scatter +
  geom_smooth(
    method = "lm",
    se = TRUE,
    color = "darkred",
    fill = "orange"
  )
## Warning: Removed 28 rows containing non-finite outside the scale range
## (`stat_smooth()`).
```
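`geom_smooth(method = "lm")` fits the same ordinary least squares regression as `lm(y ~ x)`, so you can get the numbers behind the line (intercept and slope) with `lm()`; for the tutorial data this would be `lm(trstprt ~ trstplt, data = pss)`. A sketch on made-up data (`toy` is hypothetical) that compares the drawn line with the `lm()` predictions:

```r
library(ggplot2)

# Hypothetical toy data (not from the Panem Social Survey)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2, 3, 3, 5, 4, 6, 7, 7)
)

# The model that geom_smooth(method = "lm") fits by default: y ~ x
fit <- lm(y ~ x, data = toy)
coef(fit)  # intercept and slope of the regression line

# Build the plot and extract the line that ggplot2 draws (layer 2)
p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)
smooth_data <- layer_data(p, 2)

# The drawn line coincides with the lm() predictions at the same x values
predicted <- predict(fit, newdata = data.frame(x = smooth_data$x))
max(abs(smooth_data$y - predicted))
```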
Let's continue: next, you will add a grouping variable!