Obtaining the percentage of repeated non-zero values
My data looks like this:
df<-structure(list(team_3_F = c("browingal ", "browingal ", "browingal ",
"browingal ", "browingal ", "browingal ", "browingal ", "browingal ",
"browingal ", "browingal ", "browingal ", "browingal ", "newyorkish",
"newyorkish", "newyorkish", "newyorkish", "site", "site", "site",
"site", "site", "site", "team ", "team ", "team ", "team ", "team ",
"team ", "team ", "team ", "team ", "team ", "team ", "team ",
"team ", "team ", "team ", "team ", "team ", "team ", "team ",
"team ", "team ", "team "), AAA_US = c(0L, 1L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 88L, 5L, 11L, 1L, 0L, 0L, 0L, 45L, 0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 19L), BBB_US = c(0L, 2L, 3L, 2L, 1L,
0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 3L, 0L, 0L, 8L, 0L, 0L, 0L, 0L,
0L, 0L, 4L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 45L, 0L, 0L, 0L, 18L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 19L), CCC_US = c(0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 88L, 5L, 2L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 19L)), class = "data.frame", row.names = c(NA,
-44L))
I want to obtain, for each category, the percentage of rows in which each combination of columns is non-zero together. For instance, the counts would be:
AAA_BBB_US  AAA_CCC_US  n   team_3_F
2           1           12  browingal
2           2           4   newyorkish
0           0           6   site
4           2           22  team
which corresponds to the following percentages:
AAA_BBB_US AAA_CCC_US
2/12*100 1/12*100
2/4*100 2/4*100
0/6*100 0/6*100
4/22*100 2/22*100
So the output would look like this:
AAA_BBB_US AAA_CCC_US
16% 8.3%
50% 50%
0% 0%
18% 9%
Solution 1:[1]
You can create the AAA_BBB_US, AAA_CCC_US and AAA_BBB_CCC_US columns as below. Each one is TRUE when the product of the relevant columns is non-zero, i.e. when all of them are non-zero in that row. Then, by team, sum the values and divide by the number of rows (n()) in each group:
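library(dplyr)

df %>%
  # a product is non-zero only when every factor is non-zero
  mutate(AAA_BBB_US = AAA_US * BBB_US != 0,
         AAA_CCC_US = AAA_US * CCC_US != 0,
         AAA_BBB_CCC_US = AAA_US * BBB_US * CCC_US != 0) %>%
  group_by(team_3_F) %>%
  # share of TRUE rows within each team
  summarize(across(AAA_BBB_US:AAA_BBB_CCC_US, ~ sum(.x) / n()))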
Output:
# A tibble: 4 x 4
  team_3_F     AAA_BBB_US AAA_CCC_US AAA_BBB_CCC_US
  <chr>             <dbl>      <dbl>          <dbl>
1 "browingal "      0.167     0.0833         0.0833
2 "newyorkish"      0.25      1              0.25
3 "site"            0         0              0
4 "team "           0.182     0.0909         0.0909
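The values above are proportions, while the question asks for percentages. A minimal sketch of one way to get percentages directly is shown here on a small made-up data frame (`toy` is hypothetical, not the asker's data). It tests each column with `!= 0` and combines the tests with `&` instead of multiplying, which is an alternative to the product trick rather than the answer's original method. It gives the same TRUE/FALSE flags and avoids integer overflow when counts are very large.

```r
library(dplyr)

# Hypothetical toy data: two teams, three count columns
toy <- data.frame(
  team_3_F = c("a", "a", "a", "a", "b", "b"),
  AAA_US   = c(1L, 0L, 2L, 0L, 5L, 0L),
  BBB_US   = c(2L, 3L, 0L, 0L, 1L, 4L),
  CCC_US   = c(0L, 0L, 1L, 0L, 7L, 0L)
)

pct <- toy %>%
  group_by(team_3_F) %>%
  summarize(
    # mean() of a logical vector is the share of TRUE rows;
    # multiplying by 100 turns that share into a percentage
    AAA_BBB_US = round(100 * mean(AAA_US != 0 & BBB_US != 0), 1),
    AAA_CCC_US = round(100 * mean(AAA_US != 0 & CCC_US != 0), 1),
    n = n()  # group size, i.e. the denominator
  )

print(pct)
```

Replacing `toy` with `df` should apply the same calculation to the data from the question. If a "%" suffix is wanted for display, `paste0(AAA_BBB_US, "%")` would produce character columns.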
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
