'Get the average of the values of one column for the values in another

I was not so sure how to ask this question. i am trying to answer what is the average tone when an initiative is mentioned and additionally when a topic, and a goal( or achievement) are mentioned. My dataframe (df) has many mentions of 70 initiatives (rows). meaning my df has 500+ rows of data, but only 70 Initiatives.

My data looks like this

> tabmean
    Initiative Topic Goals Achievements Tone
1           52    44     2            2    2
2          294    42     2            2    2
3          103    31     2            2    2
4           52    41     2            2    2
5           87    26     2            1    1
6           52    87     2            2    2
7          136    81     2            2    2
8           19     7     2            2    1
9           19     4     2            2    2
10           0    63     2            2    2
11           0    25     2            2    2
12          19    51     2            2    2
13          52    51     2            2    2
14         108    94     2            2    1
15          52    89     2            2    2
16         110    37     2            2    2
17         247    25     2            2    2
18          66    95     2            2    2
19          24    49     2            2    2
20          24   110     2            2    2 

I want to find what is the mean or average Tone when an Initiative is mentioned. as well as what is the Tone when an Initiative, a Topic and a Goal are mentioned at the same time. The code options for Tone are : positive(coded: 1), neutral(2), negative (coded:3), and both positive and negative(4). Goals and Achievements are coded yes(1) and no(2).

I have used this code:

GoalMeanTone <- tabmean %>%
  group_by(Initiative,Topic,Goals,Tone) %>%
  summarize(averagetone = mean(Tone))

With Solution output :

GoalMeanTone 
# A tibble: 454 x 5
# Groups:   Initiative, Topic, Goals [424]
   Initiative Topic Goals Tone  averagetone
   <chr>      <chr> <chr> <chr>       <dbl>
 1 0          104   2     0              NA
 2 0          105   2     0              NA
 3 0          22    2     0              NA
 4 0          25    2     0              NA
 5 0          29    2     0              NA
 6 0          30    2     1              NA
 7 0          31    1     1              NA
 8 0          42    1     0              NA
 9 0          44    2     0              NA
10 0          44    NA    0              NA
# ... with 444 more rows

note that for Initiative Value 0 means "other initiative".

and I've also tried this code

library(plyr)
GoalMeanTone2 <- ddply( tabmean, .(Initiative), function(x) mean(tabmean$Tone) )

with solution output

> GoalMeanTone2
   Initiative V1
1           0 NA
2           1 NA
3         101 NA
4         102 NA
5         103 NA
6         104 NA
7         105 NA
8         107 NA
9         108 NA
10        110 NA

Note that in both instances, I do not get an average for Tone but instead get NA's

I have removed the NAs in the df from the column "Tone" also have tried to remove all the other mission values in the df ( its only about 30 values that i deleted). and I have also re-coded the values for Tone :

tabmean<-Meantable %>% mutate(Tone=recode(Tone, 
                                            `1`="1",
                                            `2`="0",
                                            `3`="-1",
                                            `4`="2"))

I still cannot manage to get the average tone for an initiative. Maybe the solution is more obvious than i think, but have gotten stuck and have no idea how to proceed or solve this.

i'd be super grateful for a better code to get this. Thanks!



Solution 1:[1]

I'm not completely sure what you mean by 'the average tone when an initiative is mentioned', but let's say that you'd want to get the average tone for when initiative=1, you could try the following:

tabmean %>% filter(initiative==1) %>% summarise(avg_tone=mean(tone, na.rm=TRUE)

Note that (1) you have to add na.rm==TRUE to the summarise call if you have missing values in the column that you are summarizing, otherwise it will only produce NA's, and (2) check that the columns are of type numeric (you could check that with str(tabmean) and for example change tone to numeric with tabmean <- tabmean %>% mutate(tone=as.numeric(tone)).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tcvdb1992