In addition to all the fun stuff, sometimes you may want to highlight specific areas of a plot or add labels to cases (with small n), for example.
For this purpose, there are the functions geom_text()
and annotate()
, which can be used with ggplot
. Here, we will use the scatter plot from the beginning again, but this time limiting the number to 15 for a clear representation.
geom_jitter()
cannot be used because the data labels appear at the data point and not at the jittered data point!
scatter2 <- ggplot(
pss[1:15,],
aes(
stfeco,
stfdem
)
) +
geom_point(col = "blue") +
scale_x_continuous(
breaks = seq(
0,
10,
1
)
) +
scale_y_continuous(
breaks = seq(
0,
10,
1
)
)
scatter2
Using the geom_text()
function, you can add labels to the data points. For example, you can use the row number or the ID variable. We will use the latter, as the row number can change with sorting and may not be unique. Therefore, we will now add a label
(idno
) using the function in aes
.
scatter2 +
geom_text(
aes(label = idno)
)
Within
geom_text()
, you can now make further adjustments. We already know a few of them, two more important ones are nudge_y
and nudge_x
, which shift the starting point of the text on the respective axis.
scatter2 +
geom_text(
aes(label = idno),
size = 2,
nudge_y = -.15
)
If you still want to display all data points and only highlight specific ones, this is easily possible: We only want to display the first ten cases and therefore limit the data in geom_text()
. This is also possible using subset()
with multiple conditions.
ggplot(
pss,
aes(
stfeco,
stfdem
)
) +
geom_point(
alpha = .2,
col = "blue"
) +
scale_x_continuous(
breaks = seq(
0,
10,
1
)
) +
scale_y_continuous(
breaks = seq(
0,
10,
1
)
) +
geom_text(
aes(label = idno),
data = pss[1:10,]
)
## Warning: Removed 96 rows containing missing values or values outside the scale range
## (`geom_point()`).
annotate()
offers much greater possibilities. With this, not only labels, but also specific areas within a plot can be highlighted. Let’s take the jittered plot again and mark a specific area in the plot:
scatter2 +
annotate(
"rect",
xmin = 6.5, # this corresponds to the axis scale!
xmax = 8.5,
ymin = 7.5,
ymax = 10.5,
colour = "darkgreen",
fill = "lightgreen"
)
The disadvantage becomes apparent immediately! Since ggplot
is addressed via layers, the annotate()
layer must precede the geom_jitter()
layer. Or we can add alpha
to change the visibility:
scatter2 +
annotate(
"rect",
xmin = 6.5,
xmax = 8.5,
ymin = 7.5,
ymax = 10.5,
colour = "darkgreen",
fill = "lightgreen",
alpha = .1
)
Now we want to add a label to the plot to make it clear to the reader which area we have marked here:
scatter2 +
annotate(
"rect",
xmin = 6.5,
xmax = 8.5,
ymin = 7.5,
ymax = 10.5,
colour = "darkgreen",
fill = "lightgreen",
alpha = .1
) +
annotate(
"text",
x = 4,
y = 9,
label = "highlighted area", # with \n you get a new line
colour = "darkgreen"
)
As another option, annotate()
offers the possibility to create lines, so that we can point our text to the field:
scatter2 +
annotate(
"rect",
xmin = 6.5,
xmax = 8.5,
ymin = 7.5,
ymax = 10.5,
colour = "darkgreen",
fill = "lightgreen",
alpha = .1
) +
annotate(
"text",
x = 4,
y = 9,
label = "highlighted area", # with \n you get a new line
color = "darkgreen"
) +
annotate(
"segment",
x = 4.6,
xend = 6.5,
y = 9,
yend = 9,
color = "darkgreen",
arrow = arrow()
)
In the final step, you will learn about the representation of missing values!