'Get values from aggregate method into NA values from other column condition in R

I was trying to fill NA values from column 'Income' with the mean of the column based on the gender column, so if 'M' has a NA value, insert one from the mean of all 'M' people (except NA values):

    GENDER|INCOME
         M|     1
         M|     2
         M|     3
         M|    NA
         F|     4
         F|     4
         F|     5
         F|    NA


    GENDER|INCOME
         M|     1
         M|     2
         M|     3
         M|     2
         F|     4
         F|     4
         F|     5
         F|  4.33

I tried the aggregate method but it gives you a list that, if I try to input in my dataset, it gives me an error in order to write a CSV. It works perfectly to get the values, but I don't know how to input these values but manually. This is my code so far:

df$INCOME[is.na(df$INCOME)] <- aggregate(.~df$GENDER, df, FUN = mean, na.rm = TRUE)

Thank you for any help!

r aggregate na

Solution 1:^[1]

If we want the mean to fill the missing values (NA) for each 'GENDER', then use na.aggregate from zoo which does this automatically (FUN by default is mean)

library(zoo)
library(dplyr)
df <- df %>% 
  group_by(GENDER) %>% 
  mutate(INCOME = na.aggregate(INCOME)) %>%
  ungroup

Or another option is to replace the NA elements with the mean

df %>%
   group_by(GENDER) %>%
   mutate(INCOME = replace(INCOME, is.na(INCOME), 
        mean(INCOME, na.rm = TRUE))) %>%
 ungroup

aggregate returns a data.frame of summarised value columns and the group column. Thus, assigning on the NA elements with the whole data.frame may not work i.e. we need to extract the column

df_mean <- aggregate(INCOME ~ GENDER, df, FUN = mean,
   na.rm = TRUE, na.action = NULL)
df_mean
   GENDER   INCOME
1      F 4.333333
2      M 2.000000
df_mean$INCOME
[1] 4.333333 2.000000
df$INCOME[is.na(df$INCOME)] <- df_mean$INCOME

Extracting the INCOME column and assign could be buggy approach because

the 'GENDER' grouping may be in a different order than in the original data
the number of NA for each 'GENDER' could be potentially different

Therefore, an option is also to merge the summarised data and then change the values based on the NA elements

transform(merge(df, df_mean, by = "GENDER"), 
   INCOME = ifelse(is.na(INCOME.x), INCOME.y, INCOME.x))[c('GENDER', 'INCOME')]
  GENDER   INCOME
1      F 4.000000
2      F 4.000000
3      F 5.000000
4      F 4.333333
5      M 1.000000
6      M 2.000000
7      M 3.000000
8      M 2.000000

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'Get values from aggregate method into NA values from other column condition in R

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]