'Get values from aggregate method into NA values from other column condition in R
I was trying to fill NA values from column 'Income' with the mean of the column based on the gender column, so if 'M' has a NA value, insert one from the mean of all 'M' people (except NA values):
GENDER|INCOME
M| 1
M| 2
M| 3
M| NA
F| 4
F| 4
F| 5
F| NA
GENDER|INCOME
M| 1
M| 2
M| 3
M| 2
F| 4
F| 4
F| 5
F| 4.33
I tried the aggregate method but it gives you a list that, if I try to input in my dataset, it gives me an error in order to write a CSV. It works perfectly to get the values, but I don't know how to input these values but manually. This is my code so far:
df$INCOME[is.na(df$INCOME)] <- aggregate(.~df$GENDER, df, FUN = mean, na.rm = TRUE)
Thank you for any help!
Solution 1:[1]
If we want the mean to fill the missing values (NA) for each 'GENDER', then use na.aggregate from zoo which does this automatically (FUN by default is mean)
library(zoo)
library(dplyr)
df <- df %>%
group_by(GENDER) %>%
mutate(INCOME = na.aggregate(INCOME)) %>%
ungroup
Or another option is to replace the NA elements with the mean
df %>%
group_by(GENDER) %>%
mutate(INCOME = replace(INCOME, is.na(INCOME),
mean(INCOME, na.rm = TRUE))) %>%
ungroup
aggregate returns a data.frame of summarised value columns and the group column. Thus, assigning on the NA elements with the whole data.frame may not work i.e. we need to extract the column
df_mean <- aggregate(INCOME ~ GENDER, df, FUN = mean,
na.rm = TRUE, na.action = NULL)
df_mean
GENDER INCOME
1 F 4.333333
2 M 2.000000
df_mean$INCOME
[1] 4.333333 2.000000
df$INCOME[is.na(df$INCOME)] <- df_mean$INCOME
Extracting the INCOME column and assign could be buggy approach because
- the 'GENDER' grouping may be in a different order than in the original data
- the number of
NAfor each 'GENDER' could be potentially different
Therefore, an option is also to merge the summarised data and then change the values based on the NA elements
transform(merge(df, df_mean, by = "GENDER"),
INCOME = ifelse(is.na(INCOME.x), INCOME.y, INCOME.x))[c('GENDER', 'INCOME')]
GENDER INCOME
1 F 4.000000
2 F 4.000000
3 F 5.000000
4 F 4.333333
5 M 1.000000
6 M 2.000000
7 M 3.000000
8 M 2.000000
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
