Problem with `replace_na()` from the tidyr package

I wrote a function that has five arguments to calculate random numbers from a normal distribution. It has two steps:

replace NA with 0 in tibble column
replace 0 with a random number

My problems are:

line three doesn't replace NA value with 0
line five doesn't replace 0 with a random number

I get this error:

! Must subset columns with a valid subscript vector.
x Subscript `col` has the wrong type `function`.
i It must be logical, numeric, or character.

Here is my code:

whithout=function(col,min,max,mean,sd){
  for(i in 1:4267){
      continuous_dataset=continuous_dataset %>% replace_na(continuous_dataset[,col]=0)
      if(is.na(continuous_dataset[,col])){
         continuous_dataset[i,col]=round(rtruncnorm(1,min,max,mean,sd))    
    }
  }
}

r tidyverse tidyr

Solution 1:^[1]

There's no need to write a function that loops across both columns and observations.

I assume you have no zeroes in your dataset to begin with. In that case, I can skip replacing NA with 0 and go straight to generating the replacement value.
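
For reference, if the NA-to-0 step were wanted after all, `replace_na()` takes the data and a replacement rather than an assignment expression. A minimal sketch, using a small made-up tibble (`df0` is a hypothetical name, not the asker's data):

```r
library(tidyr)
library(tibble)

# Hypothetical example data with one NA per column
df0 <- tibble(x = c(1, NA, 3), y = c(NA, 5, 6))

# Data-frame form: `replace` is a named list, one entry per column to fill
zero_x <- replace_na(df0, list(x = 0))

# Vector form: `replace` is a single value
zero_y <- replace_na(df0$y, 0)
```

Here `zero_x` has its NA in `x` replaced by 0 while `y` is left untouched, and `zero_y` is the `y` column with its NA replaced by 0.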

My solution is based on the tidyverse.

First, generate some test data.

library(tidyverse)

set.seed(123)
df <- tibble(x=runif(5), y=runif(5), z=runif(5))
df$x[3] <- NA
df$y[4] <- NA
df$z[5] <- NA
df
# A tibble: 5 × 3
       x       y      z
   <dbl>   <dbl>  <dbl>
1  0.288  0.0456  0.957
2  0.788  0.528   0.453
3 NA      0.892   0.678
4  0.883 NA       0.573
5  0.940  0.457  NA

Now solve the problem.

df %>% 
  mutate(
    across(
      everything(), 
      function(.x, mean, sd) ifelse(is.na(.x), rnorm(length(.x), mean, sd), .x), 
      mean=500, 
      sd=100
    )
  )
# A tibble: 5 × 3
        x        y       z
    <dbl>    <dbl>   <dbl>
1   0.288   0.0456   0.957
2   0.788   0.528    0.453
3 669.      0.892    0.678
4   0.883 629.       0.573
5   0.940   0.457  467.

By avoiding looping through columns and rows, the code is more compact, more robust and (though I've not tested) faster.

If you don't want to process every column, simply replace everything() with a vector of the columns that you do want to process. For example:

df %>% 
  mutate(
    across(
      c(x, y), 
      function(.x, mean, sd) ifelse(is.na(.x), rnorm(length(.x), mean, sd), .x), 
      mean=500, 
      sd=100
    )
  )
# A tibble: 5 × 3
        x        y      z
    <dbl>    <dbl>  <dbl>
1   0.288   0.0456  0.957
2   0.788   0.528   0.453
3 669.      0.892   0.678
4   0.883 629.      0.573
5   0.940   0.457  NA
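
The code in the question draws from a truncated normal distribution and rounds the result, rather than using a plain `rnorm()`. The same `across()` approach could be adapted to that roughly as follows. This is only a sketch: it assumes the truncnorm package is installed, `fill_na_truncnorm` and its argument names are hypothetical, and the bounds and test data are made up for illustration.

```r
library(dplyr)
library(truncnorm)  # assumed to be installed; provides rtruncnorm()

# Hypothetical helper: fill NAs in the named columns with rounded draws
# from a normal distribution truncated to [min, max].
fill_na_truncnorm <- function(data, cols, min, max, mean, sd) {
  data %>%
    mutate(
      across(
        all_of(cols),  # columns given as a character vector
        function(.x) {
          # One candidate draw per row; only those at NA positions are used
          draws <- round(rtruncnorm(length(.x), a = min, b = max, mean = mean, sd = sd))
          ifelse(is.na(.x), draws, .x)
        }
      )
    )
}

# Made-up test data with one NA per column
set.seed(123)
df <- tibble(x = c(0.3, NA, 0.9), y = c(NA, 0.5, 0.4), z = c(0.9, 0.4, NA))

filled <- fill_na_truncnorm(df, c("x", "y"), min = 300, max = 700, mean = 500, sd = 100)
filled
```

Passing the column names as a character vector through `all_of()` keeps the call close to the original `col` argument while still avoiding an explicit loop over rows. Columns not listed, such as `z` here, keep their NA values.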

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Limey
