'R: Add new column by specific patterns in another column of the dataframe
my dataframe A looks like this:
**Group** **Pattern**
One Black & White
Two Black OR Pink
Three Red
Four Pink
Five White & Green
Six Green & Orange
Seven Orange
Eight Pink & Red
Nine Black OR White
Ten Green
. .
. .
. .
I have then dataframe B which looks like this:
**Color** **Value**
Orange 12
Pink 2
Red 4
Green 22
Black 84
White 100
I would like to add a new column, called Value, in dataframe A, based on its Pattern column. I want it to be in a way that if there is any (&), the values are summed up (for example if it is Black & White, I want it to become 184) and if there is any (OR), I want to have the higher number (in the same example, it will be 100).
I can join them with dplyr inner_join, but rows with &/OR are excluded.Is there any other way?
Cheers!
Solution 1:[1]
This is a fairly pedestrian method, but effective:
A$Value <- A$Pattern
for(i in seq(nrow(B))) A$Value <- gsub(B$Color[i], B$Value[i], A$Value)
A$Value <- sub("&", "+", A$Value)
A$Value <- sub("^(\\d+) OR (\\d+)$", "max(\\1, \\2)", A$Value)
A$Value <- vapply(A$Value, function(x) eval(parse(text = x)), numeric(1))
A
#> Group Pattern Value
#> 1 One Black & White 184
#> 2 Two Black OR Pink 84
#> 3 Three Red 4
#> 4 Four Pink 2
#> 5 Five White & Green 122
#> 6 Six Green & Orange 34
#> 7 Seven Orange 12
#> 8 Eight Pink & Red 6
#> 9 Nine Black OR White 100
#> 10 Ten Green 22
Created on 2022-02-18 by the reprex package (v2.0.1)
DATA
A <- structure(list(Group = c("One", "Two", "Three", "Four", "Five",
"Six", "Seven", "Eight", "Nine", "Ten"), Pattern = c("Black & White",
"Black OR Pink", "Red", "Pink", "White & Green", "Green & Orange",
"Orange", "Pink & Red", "Black OR White", "Green")), class = "data.frame",
row.names = c(NA, -10L))
B <- structure(list(Color = c("Orange", "Pink", "Red", "Green", "Black",
"White"), Value = c(12L, 2L, 4L, 22L, 84L, 100L)), class = "data.frame",
row.names = c(NA, -6L))
Solution 2:[2]
I would try something like this, I like R base better Assuming df2 as the second dataframe
df['Value'] = apply(df['Pattern'], 1, function(Pattern){
s = strsplit(Pattern, ' & ')[[1]]
if (length(s) == 2) {
return(with(df2, Value[Color == s[1]] + Value[Color == s[2]]))
}
s = strsplit(Pattern, ' OR ')[[1]]
if (length(s) == 2) {
return(with(df2, max(Value[Color == s[1]], Value[Color == s[2]])))
}
return(df2[df2$Color == Pattern,]$Value)
})
df
#> Group Pattern Value
#> 1 One Black & White 184
#> 2 Two Black OR Pink 84
#> 3 Three Red 4
#> 4 Four Pink 2
#> 5 Five White & Green 122
#> 6 Six Green & Orange 34
#> 7 Seven Orange 12
#> 8 Eight Pink & Red 6
#> 9 Nine Black OR White 100
#> 10 Ten Green 22
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Allan Cameron |
| Solution 2 |
