Is there a function to find the combination of values in a data set that has, on average, the largest values?

These are the instructions that are given:

In 2012 and prior to that time, a pizza chain in Australia, Eagle Boys (taken over by Pizza Hut in 2016), ran an advertising campaign in which several claims about the size of their pizzas as well as those of their main competitor, Domino’s, were made. The file pizza.csv contains the data on which they based their campaign. For each of the 250 pizzas taken into consideration, you are provided with the chain the pizza comes from, the type of crust, toppings and the diameter of the pizza (in cm).

The question I have to answer is the following:

What combination of crust type, toppings and chain the pizza comes from has on average the largest pizzas? What combination yields the smallest pizzas?

These are the plots I managed to create, but each one still only compares two columns.

This is the corresponding code:

par(mfrow = c(1, 2))
boxplot(dominos$Diameter ~ dominos$CrustDescription)
boxplot(dominos$Diameter ~ dominos$Topping)


Solution 1:[1]

You probably have data similar to this

dat
#   chain crust topping diameter
# 1     Y     B       M 27.10686
# 2     X     C       L 29.70423
# 3     Y     A       L 27.57106
# 4     Y     A       L 27.88939
# 5     X     A       M 29.61035
# 6     X     C       K 29.77217

First, boxplot has a formula interface that accepts several grouping variables at once, so all three factors can go into a single plot.

boxplot(diameter ~ crust + topping + chain, dat)


Second, the same formula works in aggregate, which splits the data into the groups defined by the right-hand side and applies any function (the FUN argument) to each group. Here it computes the mean diameter for every combination.

a <- aggregate(diameter ~ crust + topping + chain, dat, FUN=mean)

Finally, keep the rows of a whose mean diameter equals the maximum and the minimum.

a[a$diameter == max(a$diameter), ]
#   crust topping chain diameter
# 3     C       K     X 28.21241

a[a$diameter == min(a$diameter), ]
#    crust topping chain diameter
# 18     C       M     Y  26.6717
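If you also want to see the whole ranking rather than just the two extremes, something like the following might help. It is only a sketch: the data frame is a hypothetical stand-in built with made-up labels, and the names ranked, largest and smallest are mine, not from the assignment.

```r
# Hypothetical stand-in data; replace with the real pizza data.
set.seed(1)
dat <- data.frame(
  chain    = sample(c("X", "Y"), 250, replace = TRUE),
  crust    = sample(c("A", "B", "C"), 250, replace = TRUE),
  topping  = sample(c("K", "L", "M"), 250, replace = TRUE),
  diameter = runif(250, 25, 30)
)

# Mean diameter per crust/topping/chain combination.
a <- aggregate(diameter ~ crust + topping + chain, dat, FUN = mean)

# Full ranking, largest mean first.
ranked <- a[order(a$diameter, decreasing = TRUE), ]
head(ranked, 3)

# which.max()/which.min() return the first matching row only,
# so tied combinations (if any) are not all reported.
largest  <- a[which.max(a$diameter), ]
smallest <- a[which.min(a$diameter), ]
largest
smallest
```

The == comparison keeps every tied combination, while which.max and which.min always give exactly one row. Which behaviour you want depends on how you plan to report ties.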

Data:

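n <- 250
# all 18 combinations of chain (X, Y), crust (A-C) and topping (K-M)
dat <- expand.grid(chain=LETTERS[24:25], crust=LETTERS[1:3], topping=LETTERS[11:13])
# repeat the grid so there are plenty of rows to sample from
dat <- dat[rep(seq_len(nrow(dat)), n/2), ]
set.seed(42)
# simulated diameters in cm
dat$diameter <- runif(nrow(dat), 25, 30)
# draw n rows at random
dat <- dat[sample(seq_len(nrow(dat)), n), ]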

Solution 2:[2]

try this

library(tidyverse)

df<-read.csv("pizza.csv")
df %>% group_by(CrustDescription, Topping, Chain) %>%
summarize(avg = mean(Diameter))
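The pipeline above gives the mean for every combination, but it does not yet pick out the largest and smallest. One possible way to finish it is sketched below, assuming a dplyr version that has slice_max() and slice_min() (1.0.0 or later). The data frame is a hypothetical stand-in for pizza.csv with made-up labels; only the column names come from the code above, and means, largest and smallest are names I chose for illustration.

```r
library(dplyr)

# Hypothetical stand-in for read.csv("pizza.csv"); labels are invented.
set.seed(1)
df <- data.frame(
  Chain            = sample(c("ChainA", "ChainB"), 250, replace = TRUE),
  CrustDescription = sample(c("Crust1", "Crust2", "Crust3"), 250, replace = TRUE),
  Topping          = sample(c("Top1", "Top2"), 250, replace = TRUE),
  Diameter         = runif(250, 25, 30)
)

# Mean diameter and number of pizzas per combination.
means <- df %>%
  group_by(CrustDescription, Topping, Chain) %>%
  summarize(avg = mean(Diameter), n = n(), .groups = "drop")

# Rows with the largest and smallest mean (ties are kept by default).
largest  <- slice_max(means, avg, n = 1)
smallest <- slice_min(means, avg, n = 1)
largest
smallest
```

Keeping the group size n next to the mean makes it easier to spot combinations whose average rests on only a handful of pizzas.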

hope it helps...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jay.sf
Solution 2 lucabiel