How do I calculate the mean of each categorical descriptor?
I have a dataset that describes the career statistics of various university degrees. Each degree is assigned to a broader area of study in a separate column; for example, the degree 'Actuarial Science' falls under the 'Business' category and 'Nursing' under 'Health'. I wish to condense the 172 rows of degrees into the 16 major categories (so that my dataset has just 16 rows) and use their mean scores for my analysis.
I'm aware this probably takes a few functions in addition to 'group_by()' from the tidyverse, but I'm unsure where to go from there. The head of the dataset is below; a further 12 columns are omitted here.
| Rank | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed |
|---|---|---|---|---|---|---|---|---|
| 1 | Petroleum Eng | 2339 | 2057 | 282 | Engineering | 0.121 | 36 | 1976 |
| 2 | Mining | 756 | 679 | 77 | Engineering | 0.102 | 7 | 640 |
| 3 | Metallurgic Eng. | 856 | 725 | 131 | Engineering | 0.153 | 3 | 648 |
| 4 | Naval Architecture | 1258 | 1123 | 135 | Engineering | 0.107 | 16 | 758 |
| 5 | Chemical Eng. | 32260 | 21239 | 11031 | Engineering | 0.342 | 289 | 25694 |
| 6 | Nuclear Eng. | 2573 | 2200 | 373 | Engineering | 0.145 | 17 | 1857 |
| 7 | Studio Arts | 16977 | 4754 | 12223 | Arts | 0.7199 | 182 | 13908 |
Solution 1:[1]
Try the following. I have added a few extra summary variables that might be of interest to you; adapt it to your data rather than copying it verbatim:
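library(dplyr)

yourDf %>%
  group_by(Major_category) %>%
  summarise(
    Mean_Score   = mean(Variable_to_average, na.rm = TRUE), # replace with the column you want to average
    Counts_Major = n_distinct(Major),                        # number of distinct majors in each category
    Men          = sum(Men, na.rm = TRUE)                    # total men; do the same for Women if needed
  )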
Hopefully this gives you the gist for analysing the other columns. 'summarise()' is very powerful.
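If the goal is the mean of every numeric column per category, rather than one named column at a time, 'across()' can be combined with 'summarise()'. The following is only a minimal sketch: the 'majors' data frame is a toy subset typed in from the rows shown in the question, and the names 'category_means' and 'n_majors' are made up for illustration.

```r
library(dplyr)

# Toy data: a few rows and columns copied from the question's table.
majors <- data.frame(
  Major          = c("Petroleum Eng", "Mining", "Metallurgic Eng.", "Studio Arts"),
  Major_category = c("Engineering", "Engineering", "Engineering", "Arts"),
  Total          = c(2339, 756, 856, 16977),
  ShareWomen     = c(0.121, 0.102, 0.153, 0.7199)
)

# One row per Major_category, holding the mean of every numeric column.
# The count is computed after across() so that it is not averaged itself.
category_means <- majors %>%
  group_by(Major_category) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    n_majors = n(),
    .groups = "drop"
  )

print(category_means)
```

On the full dataset this should give one row per category (16 in the question). Note that 'where(is.numeric)' would also average 'Rank', so it may be worth dropping that column first, for example with 'select(-Rank)'.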
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | anuanand |
