How do I calculate the mean of each categorical descriptor?

I have a dataset describing the career statistics of various university degrees. Each degree is assigned to a broader area of study in a separate column: for example, 'Actuarial Science' falls under the 'Business' category, 'Nursing' under 'Health', and so on. I want to condense the 172 rows of degrees into the 16 major categories (so that the dataset has just 16 rows) and use each category's mean scores in my analysis.

I'm aware this probably takes a few functions in addition to 'group_by()' from the tidyverse, but I'm unsure where to go from there. The head of the dataset is below; 12 further columns are omitted here.

    Rank Major              Total Men   Women Major_category ShareWomen Sample_size Employed
    1.   Petroleum Eng      2339  2057  282   Engineering    0.121      36          1976
    2.   Mining             756   679   77    Engineering    0.102      7           640
    3.   Metallurgic Eng.   856   725   131   Engineering    0.153      3           648
    4.   Naval Architecture 1258  1123  135   Engineering    0.107      16          758
    5.   Chemical Eng.      32260 21239 11031 Engineering    0.342      289         25694
    6.   Nuclear Eng.       2573  2200  373   Engineering    0.145      17          1857
    7.   Studio Arts        16977 4754  12223 Arts           0.7199     182         13908
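To make the question reproducible, here is a minimal sketch that rebuilds only the seven rows shown above as a data frame (the name `grads` is hypothetical, and the 12 omitted columns are left out) and collapses it with group_by() followed by summarise(). The choice of ShareWomen and Employed as the columns to average is just an example; on the full 172-row data the same pipeline should return one row per Major_category.

```r
library(dplyr)

# Hypothetical reproduction of the seven rows shown in the question
grads <- data.frame(
  Rank = 1:7,
  Major = c("Petroleum Eng", "Mining", "Metallurgic Eng.", "Naval Architecture",
            "Chemical Eng.", "Nuclear Eng.", "Studio Arts"),
  Total = c(2339, 756, 856, 1258, 32260, 2573, 16977),
  Men = c(2057, 679, 725, 1123, 21239, 2200, 4754),
  Women = c(282, 77, 131, 135, 11031, 373, 12223),
  Major_category = c(rep("Engineering", 6), "Arts"),
  ShareWomen = c(0.121, 0.102, 0.153, 0.107, 0.342, 0.145, 0.7199),
  Sample_size = c(36, 7, 3, 16, 289, 17, 182),
  Employed = c(1976, 640, 648, 758, 25694, 1857, 13908)
)

# Collapse to one row per category: number of majors plus the mean of two columns
category_means <- grads %>%
  group_by(Major_category) %>%
  summarise(
    n_majors        = n(),
    mean_ShareWomen = mean(ShareWomen, na.rm = TRUE),
    mean_Employed   = mean(Employed, na.rm = TRUE)
  )

print(category_means)
```

Here the result has two rows (Arts and Engineering) because only those two categories appear in the sample.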

r dplyr tidyverse

Solution 1:[1]

Try this; I've added a few more variables that might interest you. Adapt it to your data rather than copying it verbatim:

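    library(dplyr)

    yourDf %>%
      group_by(Major_category) %>%
      summarise(
        Mean_Score   = mean(Variable_to_average, na.rm = TRUE), # replace with the column you want to average
        Counts_Major = n_distinct(Major),                       # number of distinct majors in each category
        Men          = sum(Men, na.rm = TRUE)                   # total men per category (same idea for Women)
      )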

Hopefully that gives you the gist of how to summarise the other columns; summarise() is very flexible.
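If you want the mean of every numeric column rather than a hand-picked few, dplyr's across() (dplyr 1.0.0 or later) can apply mean() to all of them inside a single summarise() call. The sketch below assumes a small hypothetical data frame `grads_small` built from three of the rows in the question; Rank is dropped first because averaging a rank is not meaningful.

```r
library(dplyr)

# Hypothetical subset of the question's data (three rows, four numeric columns)
grads_small <- data.frame(
  Rank = c(1, 2, 7),
  Major = c("Petroleum Eng", "Mining", "Studio Arts"),
  Total = c(2339, 756, 16977),
  Major_category = c("Engineering", "Engineering", "Arts"),
  ShareWomen = c(0.121, 0.102, 0.7199),
  Employed = c(1976, 640, 13908)
)

res <- grads_small %>%
  select(-Rank) %>%                          # drop columns that should not be averaged
  group_by(Major_category) %>%
  summarise(across(where(is.numeric),        # every remaining numeric column
                   ~ mean(.x, na.rm = TRUE)))

print(res)
```

Each numeric column keeps its name and now holds the category mean, while the character column Major is skipped by where(is.numeric). On the full dataset this should yield one row per category, i.e. the 16 rows the question asks for.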

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	anuanand
