R ggplot: line labels to the right of the plot without overlapping
To preface, I'm relatively new to R and am trying to replicate the approach from a previous user's question, found here: Stack Overflow question.
Instead of daily data restricted to a recent range of dates, I want to include every time point, in quarters. I keep running into problems because I don't understand each step of their code well enough to adapt it.
Edit: I've tried to follow the example, but I still can't reproduce it. I'm trying to place the labels to the right of the lines and change the x-axis to display Q1 2015, Q2 2015, etc. My attempt is below:
library(readxl)
library(ggrepel)
library(tidyverse)
library(ggplot2)
library(zoo) # provides as.yearqtr(), which is used below
owid <- read_xlsx("/Desktop/testR.xlsx") %>%
filter(date >= "2014-01-01" & date <= "2020-07-01") %>%
select(location, date, outcome) %>%
arrange(location, date) %>%
group_by(location) %>%
complete(date = seq.Date(as.Date("2014-01-01"),
as.Date("2020-07-01"),
by="quarter")) %>%
fill(outcome) %>%
ungroup() %>%
mutate(location = factor(location),
location = fct_reorder2(location, outcome,
outcome)) %>%
mutate(datenew= as.Date(date, format= "%d.%m.%Y")) %>%
mutate(label = if_else(datenew == max(datenew),
as.character(location),
NA_character_)) %>%
mutate(yq = as.yearqtr(datenew))
G01 <-
owid %>%
ggplot(aes(x=datenew, y=outcome, group=location,
color=location)) +
geom_point() +
geom_line() +
theme_minimal() +
labs(y="",
x="") +
theme(panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(linetype = "dashed"),
panel.grid.minor.y = element_blank(),
panel.grid.minor.x = element_blank(),
plot.title.position = "plot",
plot.title = element_text(face="bold"),
legend.position = "none") +
scale_y_continuous(breaks=c(seq(0, 70, 10))) +
scale_x_date(breaks = as.Date(c("2015-01-01",
"2015-04-01",
"2015-07-01",
"2015-10-01",
"2016-01-01",
"2016-04-01",
"2016-07-01",
"2016-10-01",
"2017-01-01",
"2017-04-01",
"2017-07-01",
"2017-10-01",
"2018-01-01",
"2018-04-01",
"2018-07-01",
"2018-10-01",
"2019-01-01",
"2019-04-01",
"2019-07-01",
"2019-10-01",
"2020-01-01",
"2020-04-01",
"2020-07-01")),
labels = scales::date_format("%Y-%m"),
limits = as.Date(c("2015-01-01",
"2020-07-01")))
G01 +
geom_text_repel(aes(label = gsub("^.*$", " ", label)), # This will force the correct position of the link's right end.
segment.curvature = -0.1,
segment.square = TRUE,
segment.color = 'grey',
box.padding = 0.1,
point.padding = 0.6,
nudge_x = 0.15,
nudge_y = 1,
force = 0.5,
hjust = 0,
direction="y",
na.rm = TRUE,
xlim = as.Date(c("2015-01-01", "2020-07-01")),
ylim = c(0,70)) +
geom_text_repel(data = . %>% filter(!is.na(label)),
aes(label = paste0(" ", label)),
segment.alpha = 0, ## This will 'hide' the link
segment.curvature = -0.1,
segment.square = TRUE,
# segment.color = 'grey',
box.padding = 0.1,
point.padding = 0.6,
nudge_x = 0.15,
nudge_y = 1,
force = 0.5,
hjust = 0,
direction="y",
na.rm = TRUE,
xlim = as.Date(c("2015-01-01", "2020-07-01")),
ylim = c(0,70))
My resulting plot is linked here.
Solution 1:[1]
I'm new to SO, but I'll give it a go. If you add the year-quarter information as a column and plot it as your x variable (without calculating per-quarter means), you will end up with many points per quarter and a plot that is hard to read.
Try running this to see what I mean:
library(tidyverse)
library(ggrepel)
library(zoo)
keep <- c("Israel", "United Arab Emirates", "United Kingdom",
"United States", "Chile", "European Union", "China",
"Russia", "Brazil", "World", "Mexico", "Indonesia",
"Bangladesh")
owid <- read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv") %>%
filter(location %in% keep) %>%
filter(date >= "2021-01-01" & date <= "2022-02-12") %>% ## edit
select(location, date, total_vaccinations_per_hundred) %>%
arrange(location, date) %>%
group_by(location) %>%
complete(date = seq.Date(as.Date("2021-01-01"),
as.Date("2022-02-12"), ## edit
by="day")) %>%
fill(total_vaccinations_per_hundred) %>%
ungroup() %>%
mutate(location = factor(location),
location = fct_reorder2(location, total_vaccinations_per_hundred,
total_vaccinations_per_hundred)) %>%
mutate(label = if_else(date == max(date),
as.character(location),
NA_character_)) %>%
mutate(yq = as.yearqtr(date)) ## add year quarter column
owid %>%
ggplot(aes(x=yq, y=total_vaccinations_per_hundred, group=location,
color=location)) +
geom_point() +
geom_line() +
theme_minimal() +
labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people",
subtitle = "This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses).",
caption = "Source: Official data collected by Our World in Data — Last updated 13 February, 11:40 (London time)",
y="",
x="") +
theme(panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(linetype = "dashed"),
panel.grid.minor.y = element_blank(),
panel.grid.minor.x = element_blank(),
plot.title.position = "plot",
plot.title = element_text(face="bold"),
legend.position = "none") +
geom_label_repel(aes(label = label),
nudge_x = 1,
hjust = "left", direction="y",
na.rm = TRUE) +
scale_x_yearqtr(limits = c(min(owid$yq), max(owid$yq)),
format = "%YQ%q") ## scale axis by year quarter
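To see why the plot above gets crowded, here is a small sketch (the `days` vector is made up for illustration) of how `as.yearqtr()` maps several daily dates onto the same quarter. Every daily observation within a quarter then shares one x position:

```r
library(zoo)

# Hypothetical daily dates: three fall in Q1 2021, one in Q2 2021
days <- as.Date(c("2021-01-01", "2021-02-15", "2021-03-31", "2021-04-01"))

# Convert each date to its year-quarter
quarter_of_day <- as.yearqtr(days)

# Count how many dates land on each quarter
print(table(format(quarter_of_day)))
```

With real daily data, each quarter would hold roughly 90 points per location stacked at the same x value.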
Instead, I think what you may want to do is leave the data frame as is and manually set the breaks in the plot. This is similar to what they did in the example, but you set the date breaks at quarter starts (e.g., the first of January, April, etc.). Doing things manually is painful but, in my experience, dealing with dates in R is never completely painless.
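As a rough sketch of that idea, the quarter-start breaks do not have to be typed out one by one: `seq()` on dates accepts `by = "quarter"`, and base R's `quarters()` can build labels such as "Q1 2015". The date range is taken from the question; the `toy` data frame and the `quarter_label()` helper are made up for illustration and are not part of the original code:

```r
library(ggplot2)

# Quarter starts across the range used in the question
quarter_breaks <- seq(as.Date("2015-01-01"), as.Date("2020-07-01"),
                      by = "quarter")

# Hypothetical helper: turn a Date into a label such as "Q2 2015"
quarter_label <- function(d) paste(quarters(d), format(d, "%Y"))

# Toy data standing in for the real series
toy <- data.frame(date = quarter_breaks,
                  outcome = seq_along(quarter_breaks))

p <- ggplot(toy, aes(x = date, y = outcome)) +
  geom_line() +
  scale_x_date(breaks = quarter_breaks, labels = quarter_label) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

print(p)
```

The axis text is rotated only because 23 quarter labels are unlikely to fit side by side; drop that `theme()` call if you show fewer breaks.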
Using your example again (just add this to the bottom of the previous code to run):
owid %>%
ggplot(aes(x=date, y=total_vaccinations_per_hundred, group=location,
color=location)) +
geom_point() +
geom_line() +
theme_minimal() +
labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people",
subtitle = "This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses).",
caption = "Source: Official data collected by Our World in Data — Last updated 13 February, 11:40 (London time)",
y="",
x="") +
theme(panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(linetype = "dashed"),
panel.grid.minor.y = element_blank(),
panel.grid.minor.x = element_blank(),
plot.title.position = "plot",
plot.title = element_text(face="bold"),
legend.position = "none") +
geom_label_repel(aes(label = label),
nudge_x = 1,
hjust = "left", direction="y",
na.rm = TRUE) +
scale_x_date(breaks = as.Date(c("2021-01-01", ## set manually
"2021-04-01",
"2021-07-01",
"2021-10-01",
"2022-01-01",
"2022-04-01")),
labels = scales::date_format("%b %d"),
limits = as.Date(c("2021-01-01",
"2022-04-01")))
If you do actually want one data point per quarter, then you could add the year-quarter column to your data frame (done in the first block of code here) and summarise the data before plotting, similar to this.
One last time:
owid %>%
group_by(location, yq) %>% ## group by location and year quarter
summarise(total_vaccinations_per_hundred = mean(total_vaccinations_per_hundred, na.rm = TRUE), .groups = "drop") %>% ## mean per quarter, ignoring missing values
ggplot(aes(x=yq, y=total_vaccinations_per_hundred, group=location,
color=location)) +
geom_point() +
geom_line() +
theme_minimal() +
labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people",
subtitle = "This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses).",
caption = "Source: Official data collected by Our World in Data — Last updated 13 February, 11:40 (London time)",
y="",
x="") +
theme(panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(linetype = "dashed"),
panel.grid.minor.y = element_blank(),
panel.grid.minor.x = element_blank(),
plot.title.position = "plot",
plot.title = element_text(face="bold"),
legend.position = "none")
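The summarised plot above has no line labels. One possible way to put them to the right of the lines, sketched here on made-up data, is to label only the last point of each line and leave extra room on the right of the x-axis so the text is not clipped. The `toy` and `label_data` objects and the specific nudge and expansion values are assumptions for illustration, and this version keeps a Date x-axis rather than `yq`:

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

# Hypothetical toy data: two locations with one value per quarter
toy <- tibble(
  location = rep(c("A", "B"), each = 4),
  date = rep(seq(as.Date("2021-01-01"), by = "quarter", length.out = 4),
             times = 2),
  outcome = c(1, 3, 6, 10, 2, 4, 5, 7)
)

# Keep only the last observation of each line for labelling
label_data <- toy %>%
  group_by(location) %>%
  slice_max(date, n = 1) %>%
  ungroup()

p <- ggplot(toy, aes(x = date, y = outcome, colour = location)) +
  geom_line() +
  geom_point() +
  geom_text_repel(data = label_data,
                  aes(label = location),
                  direction = "y",          # move labels vertically only
                  hjust = 0,                # left-align the text
                  nudge_x = 30,             # about 30 days to the right
                  segment.color = "grey") +
  # Widen the right-hand side of the axis so the labels have room
  scale_x_date(expand = expansion(mult = c(0.02, 0.25))) +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```

Note that fixed `limits` ending exactly at the last date, as in the question's `scale_x_date()` call, leave no space to the right of the final point, which may be one reason the labels do not end up where expected.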
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
