How to connect across multiple consecutive missing data values using geom_line?

I have a similar problem to the question "Connecting across missing values with geom_line", but the answers provided there only seem to connect the line when a single value is missing. With two or more consecutive missing values, the solutions offered do not appear to work.

I need to connect multiple observations made over time for individual trees. Sometimes measurements were missed such that there are missing values in my df, and sometimes an individual tree was missed more than one year in a row, such that there are multiple consecutive NAs.

When only a single NA occurs between two measurements, using geom_line with this specification works a treat to connect across the missing value:

geom_line(data = df[!is.na(df$y),])

When there is more than one consecutive NA (i.e. 2 measurements missed) geom_line will not draw across the missing data. Applying !is.na to the whole df does not solve the problem, nor does using geom_path.

Here is code to generate a df that replicates the issue:

x <- c(1,2,3,4,5,6,7,8,9)
tr1 <- c(20,25,18,16,22,12,NA,15,45)
tr2 <- c(12,NA,NA,NA,30,48,30,NA,NA)
df <- data.frame(x, tr1,tr2)

The following code can be used to graph a) tree1 with the NA left as a gap, b) tree1 with the NA bridged, and c) tree2 with the same geom_line correction in the code but missing the expected line across NAs:

library(ggplot2)    # ggplot(), geom_point(), geom_line()
library(gridExtra)  # grid.arrange()

tree1 <- ggplot(df, aes(x, tr1)) + geom_point() +
  geom_line()
tree1.fix <- ggplot(df, aes(x, tr1)) + geom_point() +
  geom_line(data = df[!is.na(df$tr1),])
nofix <- ggplot(df, aes(x, tr2)) + geom_point() +
  geom_line(data = df[!is.na(df$tr2),])
grid.arrange(tree1, tree1.fix, nofix, ncol = 3)

Any ideas?



Solution 1:[1]

geom_line() does not connect across any missing data (NA). And geom_point() does not plot missing data either. That is the correct default behaviour for missing data. NA cannot be placed on numerical axes.

What you are doing with df[!is.na(df$tr2),] is removing the missing data before sending it to geom_line(), tricking it into thinking that your data is complete. To better understand this, print out df[!is.na(df$tr2), c("x", "tr2")]. That's the data that geom_line() receives. All of this data is displayed and connected. There are no NAs in that data, because you removed them.
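As a small illustration of that point, the sketch below rebuilds the tr2 column from the question and prints the rows that survive the !is.na() filter. The name kept is only chosen for this example.

```r
# Rebuild the tr2 data from the question
x   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
tr2 <- c(12, NA, NA, NA, 30, 48, 30, NA, NA)
df  <- data.frame(x, tr2)

# Keep only the rows where tr2 was actually measured;
# this is the data frame that geom_line() ends up receiving
kept <- df[!is.na(df$tr2), c("x", "tr2")]
print(kept)

# Only the rows with x = 1, 5, 6 and 7 remain, so the largest x
# that geom_line() can draw to is 7
max(kept$x)
```

Since nothing with x > 7 is left after filtering, there is no point for the line to reach beyond x = 7.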

In your "nofix" example, you get a line from x=1 to x=5, over three consecutive NAs. So I assume that you mean that geom_line() does not continue after x=7? But look at the data. There is no data after x=7. Every x>7 has y=NA. And if you remove NAs, then there is no data at all after x=7.

If your example had one more point, say x=10 y=10, then the line would continue from x=7 to x=10.
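A minimal sketch of that variation, assuming ggplot2 is installed: the extra observation at x=10, y=10 is hypothetical and added only for illustration, as are the names df2 and df2_complete.

```r
library(ggplot2)

# Same tr2 data as in the question, plus one hypothetical
# extra measurement at x = 10, y = 10
x   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
tr2 <- c(12, NA, NA, NA, 30, 48, 30, NA, NA, 10)
df2 <- data.frame(x, tr2)

# Drop the NA rows before handing the data to geom_line()
df2_complete <- df2[!is.na(df2$tr2), ]

# Points come from the full data (na.rm = TRUE silences the
# missing-value warning); the line comes from the filtered data
p <- ggplot(df2, aes(x, tr2)) +
  geom_point(na.rm = TRUE) +
  geom_line(data = df2_complete)
print(p)
```

With this data the filtered frame holds x = 1, 5, 6, 7 and 10, so the line should run from x=7 across the two NAs at x=8 and x=9 to the new point at x=10.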

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1