'Data Frame in R - Make operations on columns depending the value of factors in the first columns

I am not sure how to type my question but here's what I am trying to do (highly simplified) I have a data frame with 4 columns that looks something like this:
table

The first 2 columns are factors (segment / company). The 2 last columns are variables.

I want to divide each value on the last 2 columns by the market value of that specific segment. As you can see if you looked at the picture the problem I run into is that for some sectors I have 3 companies and the market, for other sectors I have 2 companies and the market and so on so the sizes are never the same...

I have solved this by creating tons of "auxiliar" data frames where each data frame contains only the specific sector but I am sure there is a simpler way to do it either using dplyr or using conditionals

something like

if(df[Segment = "Seg1"]) {
  df['Var1'] <- df['Var1'] / df[4, 3] &
    df['Var2'] <- df['Var2'] / df[4, 4]
} else if (df[Segment = "Seg2"]) {
  df['Var1'] <- df['Var1'] / df[7, 3] &
    df['Var2'] <- df['Var2'] / df[7, 4]
} else if ....

but as you can imagine this is also not optimal code-wise and I am using the position of the market that I checked manually instead of using code to ask R to find it

Maybe something with mutate or left_join?

Hope my question is clear Does anyone have any idea?



Solution 1:[1]

library(tidyverse)
segment <- c(rep_len("Seg1", 4), rep_len("Seg2", 4))
company <- c(rep_len(c("a", "b", "c", "market"), 8))
var1 <- c(100, 100, 200, 400, 150, 200, 200, 800)
var2 <- c(200, 222, 333, 4444, 555, 666, 777, 888)
df <- data_frame(segment, company, var1, var2)
#> Warning: `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

df |> group_by(segment) |>
   mutate(new1 = var1/var1[company == "market"], new2 = var2/var2[company == "market"], )
#> # A tibble: 8 × 6
#> # Groups:   segment [2]
#>   segment company  var1  var2  new1   new2
#>   <chr>   <chr>   <dbl> <dbl> <dbl>  <dbl>
#> 1 Seg1    a         100   200 0.25  0.0450
#> 2 Seg1    b         100   222 0.25  0.0500
#> 3 Seg1    c         200   333 0.5   0.0749
#> 4 Seg1    market    400  4444 1     1     
#> 5 Seg2    a         150   555 0.188 0.625 
#> 6 Seg2    b         200   666 0.25  0.75  
#> 7 Seg2    c         200   777 0.25  0.875 
#> 8 Seg2    market    800   888 1     1

Created on 2022-01-25 by the reprex package (v2.0.1)

Solution 2:[2]

So I solved it like this

  1. created new "aux" df with only market values
market.df <- df %>% filter(Company == 'Market")
  1. then matched with left_join() (note that my market.df is much smaller than my original df so I can't just divide my original df by market.df)
new.df <- left_join(df, unique(market.aux), 
                    by == 'Segment', 
                    suffix = c("", ".market"))
  1. then just separated new.df into 2 data frames and divided them
aux.1 <- select(new.df, 'Variable 1', 'Variable 2')
aux.2 <- select(new.df, 'Variable 1.market', 'Variable 2.market')
results <- aux.1/aux.2
  1. then I just took back the first 2 columns of the original data frame to add Segment and Company again...

The complicated part of this was that not all segments had the same length so left_join and unique were very important for my solution to work

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Grzegorz Sapijaszko
Solution 2 benson23