'Data Frame in R - Make operations on columns depending the value of factors in the first columns
I am not sure how to type my question but here's what I am trying to do (highly simplified)
I have a data frame with 4 columns that looks something like this:
The first 2 columns are factors (segment / company). The 2 last columns are variables.
I want to divide each value on the last 2 columns by the market value of that specific segment. As you can see if you looked at the picture the problem I run into is that for some sectors I have 3 companies and the market, for other sectors I have 2 companies and the market and so on so the sizes are never the same...
I have solved this by creating tons of "auxiliar" data frames where each data frame contains only the specific sector but I am sure there is a simpler way to do it either using dplyr or using conditionals
something like
if(df[Segment = "Seg1"]) {
df['Var1'] <- df['Var1'] / df[4, 3] &
df['Var2'] <- df['Var2'] / df[4, 4]
} else if (df[Segment = "Seg2"]) {
df['Var1'] <- df['Var1'] / df[7, 3] &
df['Var2'] <- df['Var2'] / df[7, 4]
} else if ....
but as you can imagine this is also not optimal code-wise and I am using the position of the market that I checked manually instead of using code to ask R to find it
Maybe something with mutate or left_join?
Hope my question is clear Does anyone have any idea?
Solution 1:[1]
library(tidyverse)
segment <- c(rep_len("Seg1", 4), rep_len("Seg2", 4))
company <- c(rep_len(c("a", "b", "c", "market"), 8))
var1 <- c(100, 100, 200, 400, 150, 200, 200, 800)
var2 <- c(200, 222, 333, 4444, 555, 666, 777, 888)
df <- data_frame(segment, company, var1, var2)
#> Warning: `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
df |> group_by(segment) |>
mutate(new1 = var1/var1[company == "market"], new2 = var2/var2[company == "market"], )
#> # A tibble: 8 × 6
#> # Groups: segment [2]
#> segment company var1 var2 new1 new2
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Seg1 a 100 200 0.25 0.0450
#> 2 Seg1 b 100 222 0.25 0.0500
#> 3 Seg1 c 200 333 0.5 0.0749
#> 4 Seg1 market 400 4444 1 1
#> 5 Seg2 a 150 555 0.188 0.625
#> 6 Seg2 b 200 666 0.25 0.75
#> 7 Seg2 c 200 777 0.25 0.875
#> 8 Seg2 market 800 888 1 1
Created on 2022-01-25 by the reprex package (v2.0.1)
Solution 2:[2]
So I solved it like this
- created new "aux" df with only market values
market.df <- df %>% filter(Company == 'Market")
- then matched with
left_join()(note that mymarket.dfis much smaller than my original df so I can't just divide my original df bymarket.df)
new.df <- left_join(df, unique(market.aux),
by == 'Segment',
suffix = c("", ".market"))
- then just separated
new.dfinto 2 data frames and divided them
aux.1 <- select(new.df, 'Variable 1', 'Variable 2')
aux.2 <- select(new.df, 'Variable 1.market', 'Variable 2.market')
results <- aux.1/aux.2
- then I just took back the first 2 columns of the original data frame to add Segment and Company again...
The complicated part of this was that not all segments had the same length so left_join and unique were very important for my solution to work
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Grzegorz Sapijaszko |
| Solution 2 | benson23 |
