Log scale on y axis but data have negative values

I am trying to create a boxplot with a log-scaled y axis, because my data contain some very small values and some much larger ones, which do not display well on a linear y axis. However, I also have negative values, which cannot be shown on a log scale. Is there a way around this, so that the boxplot stays easy to interpret but uses a more appropriate y-axis scale?

    p <- ggplot(data = Elstow.monthly.fluxes, aes(x = Month1, y = CH4.Flux)) +
      stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) +
      geom_boxplot() +
      xlab(expression("Month")) +
      ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
      scale_y_continuous(breaks = seq(-5000, 40000, 5000), limits = c(-5000, 40000)) +
      theme(axis.text.x = element_text(colour = "black")) +
      theme(axis.text.y = element_text(colour = "black")) +
      theme(panel.background = element_rect("white", "black")) +
      theme(panel.border = element_rect(colour = "black", fill = NA, size = 0.5)) +
      theme(axis.text = element_text(size = 12)) +
      theme(axis.title = element_text(size = 14)) +
      theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) +
      theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
      geom_hline(yintercept = 0, linetype = "dashed", colour = "black")

Current boxplot without log scale



Solution 1:[1]

While you could indeed use the secondary axis to get the labels you want as Zhiqiang suggests, you could also use a transformation that fits your needs.

Consider the following skewed boxplots:

library(ggplot2)

# Right-skewed data, shifted down by 2 so that some values are negative
df <- data.frame(
  x = rep(letters[1:2], each = 500),
  y = rlnorm(1000) - 2
)

ggplot(df, aes(x, y)) +
  geom_boxplot()


Instead, you could use the pseudo-log transformation to visualise your data:

ggplot(df, aes(x, y)) +
  geom_boxplot() +
  scale_y_continuous(trans = scales::pseudo_log_trans())

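For data spanning several orders of magnitude, like the fluxes in the question, it may help to set the base and the breaks explicitly. The following is only a sketch on made-up data: the `flux` data frame and its column names are assumptions, not the asker's data. The `sigma` argument controls the width of the roughly linear region around zero.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the fluxes: mostly small, a few large, some negative
flux <- data.frame(
  month = rep(c("Aug", "Sep", "Oct"), each = 50),
  ch4   = rlnorm(150, meanlog = 4, sdlog = 2) - 50
)

p <- ggplot(flux, aes(month, ch4)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_y_continuous(
    # base = 10 gives decade-like spacing away from zero
    trans  = scales::pseudo_log_trans(sigma = 1, base = 10),
    breaks = c(-100, -10, 0, 10, 100, 1000, 10000)
  )
p
```

Breaks that fall outside the range of the data are simply not drawn, so the list can be generous.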

Alternatively, you could make any transformation you want. I personally like the inverse hyperbolic sine transformation, which is very much like the pseudo-log:

asinh_trans <- scales::trans_new(
  "inverse_hyperbolic_sine",
  transform = function(x) {asinh(x)},
  inverse = function(x) {sinh(x)}
  )

ggplot(df, aes(x, y)) +
  geom_boxplot() +
  scale_y_continuous(trans = asinh_trans)

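To see how close the two transformations are, here is a small numeric comparison. With its default arguments (`sigma = 1`, `base = exp(1)`), the pseudo-log transform in scales works out to `asinh(x / 2)`, so it approaches `log(x)` for large `x`, while plain `asinh(x)` approaches `log(2 * x)`. The test values in `x` are arbitrary.

```r
x <- c(-1000, -10, -1, 0, 1, 10, 1000)

pseudo <- scales::pseudo_log_trans()  # defaults: sigma = 1, base = exp(1)

comparison <- data.frame(
  x          = x,
  pseudo_log = pseudo$transform(x),    # equals asinh(x / 2)
  asinh      = asinh(x),               # close to sign(x) * log(2 * |x|) for large |x|
  signed_log = sign(x) * log(abs(x))   # plain log, NaN at zero
)
print(comparison)
```

Both transformations pass smoothly through zero, which is what the plain logarithm cannot do.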

Solution 2:[2]

I have a somewhat hacky solution: use the secondary axis to relabel the y axis. I do not have your data, so I made up some numbers for demonstration.

First convert the y values with logy = log(y + 5000); the offset of 5000 makes every value positive before the log is taken. The boxplot is drawn on logy, and the secondary axis transforms the values back to the original scale for the labels. I am pretty sure others have more elegant ways to do this.

I did not look for the proper way to remove the primary y-axis tick labels and just used breaks = c(0), which for these data falls outside the plotted range, so no labels are drawn.

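library(tidyverse)

# Made-up data: 33 values spread over three months (11 per month)
df <- data.frame(y = runif(33, min = -5000, max = 40000),
                 x = rep(c("Aug", "Sep", "Oct"), 11))
# Shift by 5000 so every value is positive before taking the log
df$logy <- log(df$y + 5000)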

p <- ggplot(data = df, aes(x = x, y = logy)) + 
  stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) + 
  geom_boxplot() +
  xlab(expression("Month")) + 
  ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
  scale_y_continuous(sec.axis = sec_axis(~(exp(.) -5000), 
                                         breaks = c(-4000, 0, 5000, 10000, 20000, 40000)), 
                     breaks = c(0))+ 
  theme(axis.text.x = element_text(colour = "black")) + 
  theme(axis.text.y = element_text(colour = "black")) +
  theme(panel.background = element_rect("white", "black")) +
  theme(panel.border = element_rect(colour = "black", fill=NA, size=0.5)) +
  theme(axis.text = element_text(size = 12))+ 
  theme(axis.title = element_text(size = 14))+ 
  theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) + 
  theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
  geom_hline(yintercept = log(5000), linetype ="dashed", colour = "black") 
p

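The same shift-and-log idea can probably be expressed without the secondary axis by wrapping it in a custom transformation with `scales::trans_new()`, as in Solution 1. This is a sketch under assumptions rather than a tested replacement: `shift_log_trans` and `offset` are names made up for illustration, and the offset must exceed the magnitude of the most negative value.

```r
library(ggplot2)

set.seed(1)
# Made-up data, as above
df <- data.frame(
  y = runif(33, min = -5000, max = 40000),
  x = rep(c("Aug", "Sep", "Oct"), 11)
)

offset <- 5000  # assumption: larger than the magnitude of the most negative value

# Hypothetical helper: log(y + offset) wrapped as a scales transformation
shift_log_trans <- scales::trans_new(
  "shift_log",
  transform = function(x) log(x + offset),
  inverse   = function(x) exp(x) - offset
)

p <- ggplot(df, aes(x, y)) +
  geom_boxplot() +
  # The reference line is given on the original scale; the scale transforms it
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_y_continuous(
    trans  = shift_log_trans,
    breaks = c(-4000, 0, 5000, 10000, 20000, 40000)
  )
p
```

Here the data, the breaks and the reference line all stay on the original scale, so there is no need to place the dashed line at log(5000) by hand.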

Solution 3:[3]

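coord_trans() is applied after the statistics are calculated, whereas a scale transformation is applied before. The boxplot statistics are therefore computed on the untransformed data, and only their positions on the axis are transformed. It can be combined with pseudo_log_trans() to cope with negative values.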

library(ggplot2)
set.seed(1234)
dat <- data.frame(
  cond = factor(rep(c("A", "B"), each = 200)),
  rating = c(rnorm(200), rnorm(200, mean = 500))
)

pseudoLog <- scales::pseudo_log_trans(base = 10)
p <- ggplot(dat, aes(x = cond, y = rating)) +
  geom_boxplot() +
  coord_trans(y = pseudoLog)
p

Example of boxplot with negative values on a pseudo-log scale
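The practical consequence of that ordering can be sketched outside ggplot2. Below, base R's `boxplot.stats()` stands in for the boxplot statistic (it computes the hinges slightly differently from ggplot2, so treat the counts as indicative only), and the skewed data `y` are made up for illustration.

```r
set.seed(1234)
# Hypothetical, strongly right-skewed data with some negative values
y <- rlnorm(1000, meanlog = 4, sdlog = 2) - 10

pseudo <- scales::pseudo_log_trans(base = 10)

# Like coord_trans(): statistics on the raw values, positions transformed later
out_coord <- boxplot.stats(y)$out

# Like scale_y_continuous(trans = ...): values transformed first, then statistics
out_scale <- boxplot.stats(pseudo$transform(y))$out

length(out_coord)  # points flagged as outliers on the raw scale
length(out_scale)  # points flagged after transforming
```

With skewed data the raw-scale statistics flag far more points as outliers, so the choice between coord_trans() and a scale transformation changes what the whiskers and outlier points mean, not just where they are drawn.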

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 teunbrand
Solution 2
Solution 3 Esme_