'R: Add new column by specific patterns in another column of the dataframe

my dataframe A looks like this:

**Group**    **Pattern**
One         Black & White 
Two         Black OR Pink
Three           Red
Four            Pink
Five        White & Green
Six         Green & Orange
Seven           Orange
Eight        Pink & Red
Nine        Black OR White
Ten             Green
.                 .
.                 .
.                 .

I have then dataframe B which looks like this:

**Color**    **Value**
Orange         12
Pink            2
Red             4
Green          22
Black          84
White         100

I would like to add a new column, called Value, in dataframe A, based on its Pattern column. I want it to be in a way that if there is any (&), the values are summed up (for example if it is Black & White, I want it to become 184) and if there is any (OR), I want to have the higher number (in the same example, it will be 100).

I can join them with dplyr inner_join, but rows with &/OR are excluded.Is there any other way?

Cheers!

r


Solution 1:[1]

This is a fairly pedestrian method, but effective:

A$Value <- A$Pattern
for(i in seq(nrow(B))) A$Value <- gsub(B$Color[i], B$Value[i], A$Value)
A$Value <- sub("&", "+", A$Value)
A$Value <- sub("^(\\d+) OR (\\d+)$", "max(\\1, \\2)", A$Value)
A$Value <- vapply(A$Value, function(x) eval(parse(text = x)), numeric(1))
A
#>    Group        Pattern Value
#> 1    One  Black & White   184
#> 2    Two  Black OR Pink    84
#> 3  Three            Red     4
#> 4   Four           Pink     2
#> 5   Five  White & Green   122
#> 6    Six Green & Orange    34
#> 7  Seven         Orange    12
#> 8  Eight     Pink & Red     6
#> 9   Nine Black OR White   100
#> 10   Ten          Green    22

Created on 2022-02-18 by the reprex package (v2.0.1)


DATA

A <- structure(list(Group = c("One", "Two", "Three", "Four", "Five", 
"Six", "Seven", "Eight", "Nine", "Ten"), Pattern = c("Black & White", 
"Black OR Pink", "Red", "Pink", "White & Green", "Green & Orange", 
"Orange", "Pink & Red", "Black OR White", "Green")), class = "data.frame", 
row.names = c(NA, -10L))

B <- structure(list(Color = c("Orange", "Pink", "Red", "Green", "Black", 
"White"), Value = c(12L, 2L, 4L, 22L, 84L, 100L)), class = "data.frame", 
row.names = c(NA, -6L))

Solution 2:[2]

I would try something like this, I like R base better Assuming df2 as the second dataframe

df['Value'] = apply(df['Pattern'], 1, function(Pattern){
  s = strsplit(Pattern, ' & ')[[1]]
  if (length(s) == 2) {
     return(with(df2, Value[Color == s[1]] + Value[Color == s[2]]))
  }
  s = strsplit(Pattern, ' OR ')[[1]]
  if (length(s) == 2) {
     return(with(df2, max(Value[Color == s[1]], Value[Color == s[2]])))
  }
  return(df2[df2$Color == Pattern,]$Value)
})

df
#>    Group        Pattern Value
#> 1    One  Black & White   184
#> 2    Two  Black OR Pink    84
#> 3  Three            Red     4
#> 4   Four           Pink     2
#> 5   Five  White & Green   122
#> 6    Six Green & Orange    34
#> 7  Seven         Orange    12
#> 8  Eight     Pink & Red     6
#> 9   Nine Black OR White   100
#> 10   Ten          Green    22

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Allan Cameron
Solution 2