'Find and return column with max value from a list of vectors
I'm trying to calculate favorite brand per productcategory per customer.
1. Start and what I want to achieve
I start with a dataframe (mydata) with purchase orders, build up like this:
| customer | ordernumber | productcategory | brand | amount |
|---|---|---|---|---|
| ABC | 123456 | H11_plumbing | Nitrofill | 6 |
| ABC | 123457 | H11_plumbing | Antileak | 2 |
| DEF | 123458 | H11_plumbing | Nitrofill | 1 |
| DEF | 123459 | H11_plumbing | Antileak | 5 |
| ABC | 123460 | H12_electric | Shock | 10 |
| ABC | 123461 | H12_electric | Lightning | 5 |
| DEF | 123462 | H12_electric | Shock | 4 |
| DEF | 123463 | H12_electric | Lightning | 8 |
What I want to achieve is list per customer the favorite brand per productcategory.
| customer | H11_plumbing_favorite_brand | H12_electric_favorite_brand |
|---|---|---|
| ABC | Nitrofill | Shock |
| DEF | Antileak | Lightning |
For customer ABC Nitrofill (amount = 6) and Shock (amount = 10) are favorites
2. What I am doing now
What I now did was create a list of vectors for each productcategory and change the shape from long to wide using lapply to use data.table::dcast
df_list <- split(mydata, as.factor(mydata$productcategory)) # create list of vectors
library(data.table)
df_list_2 <- lapply(df_list,function(x) x <- data.table::dcast(setDT(x), customer ~ brand, sum, value.var = c("amount"))) # change shape from long to wide
3. Where I get stuck is finding and returning the column with the favorite brand
This is where I get stuck. I have been able to do this for a data.frame (vector) rather than a list of vectors by using this code:
mydata_t <- mydata[mydata$productcategory=="H12_electric",]
mydata_overview<- data.table::dcast(setDT(mydata_t), customer ~ brand, sum, value.var = c("amount"))
rm(mydata_t)
mydata_overview$favorite_brand <- apply(mydata_overview[,-c(1)],1,function(x) which(x==max(x)))
However, if I try to use this code on the list of vectors (df_list) then it doesn't work.
df_list_3 <- lapply(df_list,function(x) x$favorite_brand<- apply(x[,-c(1)],1,function(y) which(y==max(y))))
rm(df_list_t)
Any suggestions?
Solution 1:[1]
Here is one approach. Create a favorite column that includes the brand where you have the maximum amount (for each customer and productcategory combination). Then, use dcast to put into wide form, using the new favorite category as your value.
library(data.table)
setDT(df)
df[, .(favorite = brand[which.max(amount)]), by = .(customer, productcategory)][
, dcast(.SD,
customer ~ productcategory,
value.var = "favorite")
]
Output
customer H11_plumbing H12_electric
1: ABC Nitrofill Shock
2: DEF Antileak Lightning
Data
df <- structure(list(customer = c("ABC", "ABC", "DEF", "DEF", "ABC",
"ABC", "DEF", "DEF"), ordernumber = 123456:123463, productcategory = c("H11_plumbing",
"H11_plumbing", "H11_plumbing", "H11_plumbing", "H12_electric",
"H12_electric", "H12_electric", "H12_electric"), brand = c("Nitrofill",
"Antileak", "Nitrofill", "Antileak", "Shock", "Lightning", "Shock",
"Lightning"), amount = c(6L, 2L, 1L, 5L, 10L, 5L, 4L, 8L)), class = "data.frame", row.names = c(NA,
-8L))
Solution 2:[2]
Here is a solution using data.table:
> library(data.table)
> setDT(dat)
> dcast(dat[ , .(favourite=brand[which.max(amount)]), by=.(customer, productcategory) ], customer ~ productcategory)
customer H11_plumbing H12_electric
1: ABC Nitrofill Shock
2: DEF Antileak Lightning
To break it down:
We can use data.table to find the brand corresponding to the maximum amount for each combination of customer and category as follows:
> dat[ , .(favourite=brand[which.max(amount)]), by=.(customer, productcategory) ]
customer productcategory favourite
1: ABC H11_plumbing Nitrofill
2: DEF H11_plumbing Antileak
3: ABC H12_electric Shock
4: DEF H12_electric Lightning
Then use dcast to reshape this table with customer as the column and productcategory along the rows.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | George Savva |
