String operations with stringr not working depending on vectorized/unvectorized call

I'm struggling to understand why my code below works only when using rowwise() in combination with ifelse(). Or more precisely, I think I get why it works in that scenario, but not why it doesn't simply work with if_else().

What I'm doing is checking whether a row contains the word "infile" or "outfile" and whether it has a relative path (".."). If it contains "infile"/"outfile" and not a relative path, then it has an absolute path ("C:"). In that case, I want to replace the user name with something else (here: "test").

Any ideas?

Data:

library(dplyr)
library(stringr)

df <- structure(list(value = c("infile 'C:\\Users\\USER\\folder\\Data.sav'", 
"infile '..\\folder\\Data.sav'", "outfile '..\\folder\\Data.sav'", 
"test", "")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L))

user_name <- "test"

Code that works:

df |> 
  rowwise() |> 
  mutate(value = ifelse(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
                        str_replace(value,
                                    str_sub(value,
                                            str_locate_all(value, "\\\\")[[1]][2] + 1,
                                            str_locate_all(value, "\\\\")[[1]][3] - 1),
                                    user_name),
                        value)) |> 
  ungroup()

with output:

# A tibble: 5 × 1
  value                                       
  <chr>                                       
1 "infile 'C:\\Users\\test\\folder\\Data.sav'"
2 "infile '..\\folder\\Data.sav'"             
3 "outfile '..\\folder\\Data.sav'"            
4 "test"                                      
5 ""   

Code that doesn't work (if_else without rowwise):

df |> 
  mutate(value = if_else(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
                        str_replace(value,
                                    str_sub(value,
                                            str_locate_all(value, "\\\\")[[1]][2] + 1,
                                            str_locate_all(value, "\\\\")[[1]][3] - 1),
                                    user_name),
                        value))

I think this returns the expected result, but it gives warnings:

Warning messages:
1: Problem while computing `value = if_else(...)`.
ℹ empty search patterns are not supported 
2: Problem while computing `value = if_else(...)`.
ℹ empty search patterns are not supported 

Code that doesn't work (if_else with rowwise):

df |> 
  rowwise() |>
  mutate(value = if_else(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
                        str_replace(value,
                                    str_sub(value,
                                            str_locate_all(value, "\\\\")[[1]][2] + 1,
                                            str_locate_all(value, "\\\\")[[1]][3] - 1),
                                    user_name),
                        value)) |> 
  ungroup()

Error in `mutate()`:
! Problem while computing `value = if_else(...)`.
ℹ The error occurred in row 2.
Caused by error:
! Empty `pattern` not supported


Solution 1:[1]

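Basically, the issue is that if_else() always evaluates both its true and false arguments, while ifelse() only evaluates yes if at least one element of the test is TRUE, and no if at least one is FALSE. With rowwise(), the test has length one on each row, so ifelse() evaluates only the branch it needs and never runs str_replace() on the rows where the extracted substring is empty.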
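A minimal sketch of that difference, outside of mutate(). The helper noisy() is hypothetical (it is not part of dplyr or stringr); it only records when its argument actually gets evaluated:

```r
library(dplyr)

# `noisy()` is a hypothetical helper for illustration only: it appends its
# label to `calls` at the moment the argument is really evaluated.
calls <- character()
noisy <- function(label, x) {
  calls <<- c(calls, label)
  x
}

# Base ifelse() with a length-one TRUE test never touches `no`.
base_result <- ifelse(TRUE, noisy("yes", "a"), noisy("no", "b"))
base_calls <- calls

# dplyr::if_else() evaluates both branches, whatever the condition says.
calls <- character()
dplyr_result <- if_else(TRUE, noisy("yes", "a"), noisy("no", "b"))
dplyr_calls <- calls

print(base_calls)         # "yes"
print(sort(dplyr_calls))  # "no" "yes"
```

Both calls return "a"; the only difference is whether the unused branch was evaluated along the way, which is exactly where the str_replace() call with an empty pattern gets a chance to fail.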

Also, if you don't use rowwise(), mutate() evaluates the expression once on the whole value column. str_locate_all() then returns a list with one match matrix per string, and [[1]] always picks the matrix for the first row, so every row gets the same start and end positions for the substring.
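A small sketch of what [[1]] does on a vector, using the first two strings from the question. The helper nth_start() is hypothetical and only shows one possible way to get per-string positions without rowwise():

```r
library(stringr)

x <- c("infile 'C:\\Users\\USER\\folder\\Data.sav'",
       "infile '..\\folder\\Data.sav'")

# A list with one start/end matrix per input string.
locs <- str_locate_all(x, "\\\\")

# `[[1]]` always means "matches of the first string", so these two numbers
# are applied to every element of x.
first_start <- locs[[1]][2] + 1
first_end   <- locs[[1]][3] - 1
same_cut <- str_sub(x, first_start, first_end)
print(same_cut)  # "USER" "\\Dat"

# Hypothetical helper: start of the n-th match in one matrix, or NA if the
# string has fewer than n matches.
nth_start <- function(m, n) {
  if (nrow(m) >= n) as.integer(m[n, 1]) else NA_integer_
}
second <- vapply(locs, nth_start, integer(1), n = 2)
third  <- vapply(locs, nth_start, integer(1), n = 3)

# Positions are now computed per string; the second string has no third
# backslash, so its result is NA instead of a wrong substring.
user_part <- str_sub(x, second + 1, third - 1)
print(user_part)
```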

To debug, I'd suggest breaking the calculation out a bit:

df %>% rowwise() %>%
       mutate(n=length(value), slen=str_length(value),
              l1=str_locate_all(value,"\\\\")[[1]][2]+1,
              l2=str_locate_all(value,"\\\\")[[1]][3]-1, 
              ssub=str_sub(value, l1, l2), 
              detect=str_detect(value, "infile|outfile")& !str_detect(value,"\\'\\.\\.\\\\"), 
              vout=if_else(detect, ssub, user_name))
# A tibble: 5 × 8
# Rowwise: 
  value                                            n  slen    l1    l2 ssub   detect vout 
  <chr>                                        <int> <int> <dbl> <dbl> <chr>  <lgl>  <chr>
1 "infile 'C:\\Users\\USER\\folder\\Data.sav'"     1    38    18    21 "USER" TRUE   USER 
2 "infile '..\\folder\\Data.sav'"                  1    27    19    10 ""     FALSE  test 
3 "outfile '..\\folder\\Data.sav'"                 1    28    20    11 ""     FALSE  test 
4 "test"                                           1     4    NA    NA  NA    FALSE  test 
5 ""                                               1     0    NA    NA  NA    FALSE  test 

Without rowwise(), mutate() gets all the strings in the value column at once (n is 5 instead of 1), and it uses the same cut locations on every single row:

df %>% 
       mutate(n=length(value), slen=str_length(value),
              l1=str_locate_all(value,"\\\\")[[1]][2]+1,
              l2=str_locate_all(value,"\\\\")[[1]][3]-1, 
              ssub=str_sub(value, l1, l2), 
              detect=str_detect(value, "infile|outfile")& !str_detect(value,"\\'\\.\\.\\\\"), 
              vout=if_else(detect, ssub, user_name))
# A tibble: 5 × 8
  value                                            n  slen    l1    l2 ssub    detect vout 
  <chr>                                        <int> <int> <dbl> <dbl> <chr>   <lgl>  <chr>
1 "infile 'C:\\Users\\USER\\folder\\Data.sav'"     5    38    18    21 "USER"  TRUE   USER 
2 "infile '..\\folder\\Data.sav'"                  5    27    18    21 "\\Dat" FALSE  test 
3 "outfile '..\\folder\\Data.sav'"                 5    28    18    21 "r\\Da" FALSE  test 
4 "test"                                           5     4    18    21 ""      FALSE  test 
5 ""                                               5     0    18    21 ""      FALSE  test 

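Once the substring locations are wrong, str_sub() returns "" for some rows, and an empty pattern is what str_replace() complains about. Without rowwise(), I think you were just lucky: the positions happen to be right for row 1, the only row where the condition is TRUE, and the empty patterns only produced warnings instead of the error you get with rowwise().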
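For completeness, here is one possible vectorized rewrite that keeps if_else() and drops rowwise(). It is only a sketch: it assumes the user name is whatever sits between the second and third backslash (the same positions the original code cuts at), and that user_name contains no backslashes or dollar signs and does not start with a digit, since it is pasted into a replacement string after a backreference.

```r
library(dplyr)
library(stringr)

df <- tibble(value = c("infile 'C:\\Users\\USER\\folder\\Data.sav'",
                       "infile '..\\folder\\Data.sav'",
                       "outfile '..\\folder\\Data.sav'",
                       "test", ""))
user_name <- "test"

# Group 1 captures everything up to and including the second backslash;
# the text after it, up to the next backslash, is what gets replaced.
pattern <- "^([^\\\\]*\\\\[^\\\\]*\\\\)[^\\\\]+"

fixed <- df |>
  mutate(value = if_else(
    str_detect(value, "infile|outfile") & !str_detect(value, "'\\.\\.\\\\"),
    str_replace(value, pattern, str_c("\\1", user_name)),
    value
  ))

print(fixed$value)
```

Because the pattern is a fixed, non-empty regular expression, both branches of if_else() can be evaluated on every row without hitting the empty-pattern problem.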

Solution 2:[2]

Here is one way (where my substitution of USER is very simple; not sure if it should be more generic):

df %>% 
    tidyr::separate(value, into = c('Type', 'Path'), sep = ' ') %>% 
    dplyr::mutate(
        Value = dplyr::if_else(
            (Type %in% c('infile', 'outfile')) & !startsWith(Path, "'.."),
            stringr::str_replace(Path, 'USER', user_name),
            Path
        )
    )

I split the value column to make the check easier.

If you need to replace the user name with the variable, you can do it like this (here using a backreference in the regular expression):

df %>% 
    tidyr::separate(value, into = c('Type', 'Path'), sep = ' ') %>% 
    dplyr::mutate(
        Value = dplyr::if_else(
            (Type %in% c('infile', 'outfile')) & !startsWith(Path, "'.."),
            # Path still starts with the opening quote, so the pattern has to include it
            sub("^('C:\\\\Users\\\\)([[:alnum:]]+)\\\\", paste0('\\1', user_name, '\\\\'), Path),
            Path
        )
    )
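To see the backreference in isolation, here is a sketch on a single path string in base R. It assumes Path keeps the opening single quote from the original value column after the split, so the anchored pattern has to include that quote:

```r
user_name <- "test"
path <- "'C:\\Users\\USER\\folder\\Data.sav'"

# \\1 in the replacement re-inserts whatever the first group captured
# (the opening quote plus the C:/Users prefix); the second group, the old
# user name, is dropped and user_name is written in its place.
pattern <- "^('C:\\\\Users\\\\)([[:alnum:]]+)\\\\"
new_path <- sub(pattern, paste0("\\1", user_name, "\\\\"), path)
print(new_path)

# Without the quote in the pattern, ^ cannot match at the start of the
# string and sub() returns its input unchanged.
unchanged <- sub("^(C:\\\\Users\\\\)([[:alnum:]]+)\\\\",
                 paste0("\\1", user_name, "\\\\"), path)
print(identical(unchanged, path))  # TRUE
```

Note that [[:alnum:]]+ only matches user names made of letters and digits; names containing dots, spaces, or hyphens would need a wider character class.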

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2