'R Group the Values for the respective variable and count the occurrence and give a percentage
I have a csv file which consists of 10 000 rows. I will show the top 12 rows which contained the values including the headers. I want to group the values for the respective variable and then count the number of time the value appear over the total count of values for the respective variable and give it percentage.
Variable Values
Name Jerome
Age 20
Sex Male
Birth of Place USA
Name Alice
Age 22
Sex Female
Birth of Place Germany
Name Brandon
Age 21
Sex Male
Birth of Place Africa
I want to achieve the output like this.
Variable Values
Name Jerome 33%
Alice 33%
Brandon 33%
Age 20 33%
22 33%
21 33%
Sex Male 66%
Female 33%
Birth of Place USA 33%
Africa 33%
Germany 33%
I have made a R code.
mydata <- read_csv('example.csv') read the csv file
View(mydata) view the whole content
View((unique(mydata)) Show the Unique variable and values
group_by(variable,values) group the variable and values and then perform count
table(newdata['values']) show the number of count the unique values under the column, Values.
Solution 1:[1]
library(tidyverse)
# replace this section with read_csv("YOUR_FILE.CSV") %>%
read_csv("Variable, Values
Name ,Jerome
Age ,20
Sex ,Male
Birth of Place ,USA
Name ,Alice
Age ,22
Sex ,Female
Birth of Place ,Germany
Name ,Brandon
Age ,21
Sex ,Male
Birth of Place ,Africa") %>%
mutate(Variable = fct_inorder(Variable)) %>%
group_by(Variable) %>%
count(Values) %>%
mutate(share = n / sum(n))
Result
Variable Values n share
<fct> <chr> <int> <dbl>
1 Name Alice 1 0.333
2 Name Brandon 1 0.333
3 Name Jerome 1 0.333
4 Age 20 1 0.333
5 Age 21 1 0.333
6 Age 22 1 0.333
7 Sex Female 1 0.333
8 Sex Male 2 0.667
9 Birth of Place Africa 1 0.333
10 Birth of Place Germany 1 0.333
11 Birth of Place USA 1 0.333
...or for more fidelity to the desired output:
...
mutate(Variable = if_else(row_number() == 1, as.character(Variable), ""),
share = scales::percent(n / sum(n)))
Variable Values n share
<chr> <chr> <int> <chr>
1 "Name" Alice 1 33%
2 "" Brandon 1 33%
3 "" Jerome 1 33%
4 "Age" 20 1 33%
5 "" 21 1 33%
6 "" 22 1 33%
7 "Sex" Female 1 33%
8 "" Male 2 67%
9 "Birth of Place" Africa 1 33%
10 "" Germany 1 33%
11 "" USA 1 33%
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jon Spring |
