'R data table how to unite rows with error check
I have next data table dataframe
library(dplyr)
library(data.table)
my_data = data.frame(
id = c(1, 1, 2, 2, 3),
sample_number = c('d1', 'rr1', 'd2', 'rr2', 'd3'),
res_1 = c('AA', NA, NA, 'GG', 'AG'),
res_2 = c(NA, 'TT', 'CC', NA, 'TC'),
res_3 = c('II', 'II', 'DD', 'ID', 'ID')
)
my_data <- my_data %>% as.data.table() ## convert to data table
> my_data
id sample_number res_1 res_2 res_3
1 1 d1 AA <NA> II
2 1 rr1 <NA> TT II
3 2 d2 <NA> CC DD
4 2 rr2 GG <NA> ID
5 3 d3 AG TC ID
Uniq column is id. For some id exists 2 rows with different values in sample_number column. How can I unite rows by id column?
For id 2 in column res_3 mistake exists. In that case result of unite will by '---'. Result is next
id sample_number res_1 res_2 res_3
1 d1, rr1 AA TT II
2 d2, rr2 GG CC '---'
3 d3 AG TC ID
Solution 1:[1]
a data.table approach
my_data[, sample_number := paste0(sample_number, collapse = ", "), by = .(id)]
DT <- melt(my_data, id.vars = c("id", "sample_number"), na.rm = TRUE)
dcast(DT, id + sample_number ~ variable, value.var = "value",
fun.aggregate = function(x) ifelse(length(unique(x)) > 1, "---", x))
# id sample_number res_1 res_2 res_3
# 1: 1 d1, rr1 AA TT II
# 2: 2 d2, rr2 GG CC ---
# 3: 3 d3 AG TC ID
Solution 2:[2]
Here is an option
# Define custom function to collapse entries from columns `res_*`
collapse <- function(x) {
if (length(unique(x[!is.na(x)])) == 1) unique(x[!is.na(x)]) else "----"
}
library(tidyverse)
my_data %>%
group_by(id) %>%
summarise(
sample_number = toString(sample_number),
across(starts_with("res"), collapse),
.groups = "drop")
## A tibble: 3 x 5
# id sample_number res_1 res_2 res_3
# <dbl> <chr> <chr> <chr> <chr>
#1 1 d1, rr1 AA TT II
#2 2 d2, rr2 GG CC ----
#3 3 d3 AG TC ID
Note that I assume that the NAs in your data.frame are real NAs, as in
my_data = data.frame(
id = c(1, 1, 2, 2, 3),
sample_number = c('d1', 'rr1', 'd2', 'rr2', 'd3'),
res_1 = c('AA', NA, NA, 'GG', 'AG'),
res_2 = c(NA, 'TT', 'CC', NA, 'TC'),
res_3 = c('II', 'II', 'DD', 'ID', 'ID')
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wimpel |
| Solution 2 | Maurits Evers |
