Add means to histograms by group in ggplot2

I am following this source to do histograms by group in ggplot2.

The sample data looks like this:

set.seed(3)
x1 <- rnorm(500)
x2 <- rnorm(500, mean = 3)
x <- c(x1, x2)
group <- c(rep("G1", 500), rep("G2", 500))

df <- data.frame(x, group = group)

And the code:

# install.packages("ggplot2")
library(ggplot2)

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity")

I know that adding a line like:

  +geom_vline(aes(xintercept=mean(group),color=group,fill=group), col = "red")

should let me get what I am looking for, but I get a histogram with a single mean line rather than one mean per group.

Do you have any suggestions?



Solution 1:[1]

As an alternative to adding the mean as a column of the data frame, you can store the group means separately, i.e. as two values instead of 1,000 highly redundant ones:

library(dplyr)

## a "tidy" approach (one of several valid ways to compute group-wise means):
group_means <- df %>%
  group_by(group) %>%
  summarise(group_means = mean(x, na.rm = TRUE)) %>%
  pull(group_means)

## ... ggplot code ... +
    geom_vline(xintercept = group_means)
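If each mean line should take its group's colour (which the `color = group` mapping in the question seems to be aiming for), one option is to keep the two means in a small data frame and hand it to `geom_vline()` through its `data` argument. This is only a sketch under assumptions: the names `means_df` and `mean_x`, the dashed line type, and `bins = 30` are illustrative choices, not part of the original answer.

```r
library(ggplot2)
library(dplyr)

# Sample data from the question
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# One row per group (means_df and mean_x are hypothetical names)
means_df <- df %>%
  group_by(group) %>%
  summarise(mean_x = mean(x, na.rm = TRUE))

# The vline layer uses its own two-row data, so exactly one line is drawn
# per group, coloured by the same group -> colour scale as the histogram
p <- ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 30) +
  geom_vline(data = means_df,
             aes(xintercept = mean_x, colour = group),
             linetype = "dashed")

print(p)
```

Note that passing a fixed `col = "red"` to the layer as well would override the per-group colour mapping.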

Solution 2:[2]

I would add the group mean as a column of the data frame:

library(ggplot2)
library(dplyr)

df %>% 
  group_by(group) %>% 
  mutate(mean_x = mean(x)) 

The output is:

# A tibble: 1,000 × 3
# Groups:   group [2]
         x group mean_x
     <dbl> <chr>  <dbl>
 1 -0.962  G1    0.0525
 2 -0.293  G1    0.0525
 3  0.259  G1    0.0525
 4 -1.15   G1    0.0525
 5  0.196  G1    0.0525
 6  0.0301 G1    0.0525
 7  0.0854 G1    0.0525
 8  1.12   G1    0.0525
 9 -1.22   G1    0.0525
10  1.27   G1    0.0525
# … with 990 more rows

So the full plot becomes:

library(ggplot2)
library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(mean_x = mean(x)) %>% 
  ggplot(aes(x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(aes(xintercept = mean_x), col = "red")
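The grouped `mutate()` step can also be written without dplyr. As a rough base R sketch (the column name `mean_x` is taken from the answer above; nothing else is assumed), `ave()` returns a vector the same length as `x` in which every element is the mean of that row's group:

```r
# Sample data from the question
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# ave() applies FUN within each level of df$group and recycles the result
# back to one value per row, like group_by() + mutate()
df$mean_x <- ave(df$x, df$group, FUN = mean)

head(df, 3)
unique(df$mean_x)  # only two distinct values, one per group
```

Because `mean_x` is repeated on every row, mapping it with `aes(xintercept = mean_x)` draws one line per row, all stacked on the two group means. The plot looks the same, but it is worth knowing if the lines are made semi-transparent.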

Solution 3:[3]

A straightforward method without precomputation would be:

ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(xintercept = tapply(df$x, df$group, mean), col = "red")
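To see what `tapply()` hands to `geom_vline()` here, the call can be pulled out and inspected on its own. The sketch below assumes the question's sample data; the variable name `group_means`, the `as.numeric()` conversion, and `bins = 30` are illustrative additions rather than part of the answer.

```r
library(ggplot2)

# Sample data from the question
set.seed(3)
df <- data.frame(
  x = c(rnorm(500), rnorm(500, mean = 3)),
  group = rep(c("G1", "G2"), each = 500)
)

# tapply() splits df$x by df$group and applies mean() to each piece,
# returning one value per group, named after the group levels
group_means <- tapply(df$x, df$group, mean)
names(group_means)  # the group labels
group_means         # the two means

# as.numeric() drops the names and dimensions so a plain numeric vector
# is passed as xintercept; one red line is drawn per element
p <- ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 30) +
  geom_vline(xintercept = as.numeric(group_means), col = "red")

print(p)
```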

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    I_O
Solution 2    Stephan
Solution 3    Allan Cameron