'R Group the Values for the respective variable and count the occurrence and give a percentage

I have a csv file which consists of 10 000 rows. I will show the top 12 rows which contained the values including the headers. I want to group the values for the respective variable and then count the number of time the value appear over the total count of values for the respective variable and give it percentage.

Variable        Values
Name            Jerome
Age             20
Sex             Male
Birth of Place  USA
Name            Alice
Age             22
Sex             Female
Birth of Place  Germany
Name            Brandon
Age             21
Sex             Male
Birth of Place  Africa

I want to achieve the output like this.

Variable       Values
Name           Jerome 33%
               Alice  33%
               Brandon 33%
Age            20      33%
               22      33%
               21      33%
Sex            Male    66%
               Female  33%
Birth of Place USA     33%
               Africa  33%
               Germany 33%

I have made a R code.

mydata <- read_csv('example.csv') read the csv file
View(mydata) view the whole content
View((unique(mydata)) Show the Unique variable and values
group_by(variable,values) group the variable and values and then perform count
table(newdata['values']) show the number of count the unique values under the column, Values.


Solution 1:[1]

library(tidyverse)

# replace this section with  read_csv("YOUR_FILE.CSV") %>%
read_csv("Variable,        Values
Name            ,Jerome
Age             ,20
Sex             ,Male
Birth of Place  ,USA
Name            ,Alice
Age             ,22
Sex             ,Female
Birth of Place  ,Germany
Name            ,Brandon
Age             ,21
Sex             ,Male
Birth of Place  ,Africa") %>%

  mutate(Variable = fct_inorder(Variable)) %>%
  group_by(Variable) %>%
  count(Values) %>%
  mutate(share = n / sum(n))

Result

   Variable       Values      n share
   <fct>          <chr>   <int> <dbl>
 1 Name           Alice       1 0.333
 2 Name           Brandon     1 0.333
 3 Name           Jerome      1 0.333
 4 Age            20          1 0.333
 5 Age            21          1 0.333
 6 Age            22          1 0.333
 7 Sex            Female      1 0.333
 8 Sex            Male        2 0.667
 9 Birth of Place Africa      1 0.333
10 Birth of Place Germany     1 0.333
11 Birth of Place USA         1 0.333

...or for more fidelity to the desired output:

...
mutate(Variable = if_else(row_number() == 1, as.character(Variable), ""),
         share = scales::percent(n / sum(n)))

   Variable         Values      n share
   <chr>            <chr>   <int> <chr>
 1 "Name"           Alice       1 33%  
 2 ""               Brandon     1 33%  
 3 ""               Jerome      1 33%  
 4 "Age"            20          1 33%  
 5 ""               21          1 33%  
 6 ""               22          1 33%  
 7 "Sex"            Female      1 33%  
 8 ""               Male        2 67%  
 9 "Birth of Place" Africa      1 33%  
10 ""               Germany     1 33%  
11 ""               USA         1 33% 

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jon Spring