'How to combine columns that have the same name and remove NA's?

Relatively new to R, but I have an issue combining columns that have the same name. I have a very large dataframe (~70 cols and 30k rows). Some of the columns have the same name. I wish to merge these columns and remove the NA's.

An example of what I would like is below (although on a much larger scale).

df <- data.frame(x = c(2,1,3,5,NA,12,"blah"),
                 x = c(NA,NA,NA,NA,9,NA,NA),
                 y = c(NA,5,12,"hop",NA,2,NA),
                 y = c(2,NA,NA,NA,8,NA,4),
                 z = c(9,5,NA,3,2,6,NA))
desired.result <- data.frame(x = c(2,1,3,5,9,12,"blah"), 
                             y = c(2,5,12,"hop",8,2,4),
                             z = c(9,5,NA,3,2,6,NA))

I have tried a number of things including suggestions such as: R: merging columns and the values if they have the same column name Combine column to remove NA's

However, these solutions either require a numeric dataset (I need to keep the character information) or they require you to manually input the columns that are the same (which is too time consuming for the size of my dataset).

I have managed to solve the issue manually by creating new columns that are combinations:

df$x <- apply(df[,1:2], 1, function(x) x[!is.na(x)][1])

However I don't know how to get R to auto-identify where the columns have the same names and then apply something like the above such that I don't need to specify the index each time.

Thanks



Solution 1:[1]

here is a base R approach

#split into a named list, nased on colnames befote the .-character
L <- split.default(df, f = gsub("(.*)\\..*", "\\1", names(df)))
#get the first non-na value for each row in each chunk
L2 <- lapply(L, function(x) apply(x, 1, function(y) na.omit(y)[1]))
# result in a data.frame
as.data.frame(L2)

#      x   y  z
# 1    2   2  9
# 2    1   5  5
# 3    3  12 NA
# 4    5 hop  3
# 5    9   8  2
# 6   12   2  6
# 7 blah   4 NA

# since you are using mixed formats, the columsn are not of the same class!!
str(as.data.frame(L2))
# 'data.frame': 7 obs. of  3 variables:
# $ x: chr  "2" "1" "3" "5" ...
# $ y: chr  " 2" "5" "12" "hop" ...
# $ z: num  9 5 NA 3 2 6 NA

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wimpel