'Concatenate 2+ columns into a new column, skip blanks in R

My current dataset looks something like this:

Col1     Col2     Col3 
1        b        c           
2                 c      
3        B       

I want to add a new column that returns everything in the row separated by commas, skipping blank cells. Example:

Col1     Col2     Col3     newCol
1        b        c        1, b, c   
2                 c        2, c
3        B                 3, B                          

I've tried df$newCol <- apply(df[,1:3], 1, paste, collapse=", "), but it returns the new column with extra commas:

newCol
1, b, c
2, , c
3, B, 
r


Solution 1:[1]

Here we use unite function after replacing blank cells with NA. At the end we replace back to blank cells:

  • Main feature is to use na.rm argument of unite
library(dplyr)
library(tidyr)

df %>% 
  mutate(across(everything(), na_if,"")) %>% 
  unite(newCol, Col1:Col3, na.rm = TRUE, sep = ", ", remove = FALSE) %>% 
  mutate(across(everything(), ~replace_na(., ""))) %>% 
  relocate(newCol, .after = Col3)
  Col1 Col2 Col3  newCol
1    1    b    c 1, b, c
2    2         c    2, c
3    3    B         3, B

Solution 2:[2]

Using do.call with paste

df$newCol <- gsub("\\s+,\\s+", " ", trimws(do.call(paste, 
     c(df, sep=", ")), whitespace =",\\s*"))

Solution 3:[3]

A possible solution, when your empty cells are "":

df <- data.frame(
  stringsAsFactors = FALSE,
  Col1 = c(1L, 2L, 3L),
  Col2 = c("b", "", "B"),
  Col3 = c("c", "c", "")
)

df$newCol <- apply(df, 1, function(x) paste(x[x != ""], collapse=", "))
df

#>   Col1 Col2 Col3  newCol
#> 1    1    b    c 1, b, c
#> 2    2         c    2, c
#> 3    3    B         3, B

A possible solution, which uses na.omit when your empty cells are NA:

df <- data.frame(
  stringsAsFactors = FALSE,
  Col1 = c(1L, 2L, 3L),
  Col2 = c("b", NA, "B"),
  Col3 = c("c", "c", NA)
)

df$Col4 <- apply(df, 1, function(x) paste(na.omit(x), collapse = ", "))
df

#>   Col1 Col2 Col3    Col4
#> 1    1    b    c 1, b, c
#> 2    2 <NA>    c    2, c
#> 3    3    B <NA>    3, B

Solution 4:[4]

Here is a data.table, do.call, and paste option:

library(data.table)

setDT(df)[, newCol := gsub("\\s+,\\s+", " ", 
                           gsub(", $", "", do.call(paste, c(.SD, sep = ", "))))]

Output

   Col1 Col2 Col3  newCol
1:    1    b    c 1, b, c
2:    2         c    2, c
3:    3    B         3, B

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TarJae
Solution 2 akrun
Solution 3
Solution 4 AndrewGB