String operations with stringr not working depending on vectorized/unvectorized call
I'm struggling to understand why my code below works only when using rowwise() in combination with ifelse(). Or more precisely, I think I get why it works in that scenario, but not why it doesn't simply work with if_else().
What I'm doing is checking whether a row contains the word "infile" or "outfile" and whether it has a relative path (".."). If it contains "infile"/"outfile" and no relative path, then it has an absolute path ("C:"). In that case, I want to replace the user name with something else (here: "test").
Any ideas?
Data:
df <- structure(list(value = c("infile 'C:\\Users\\USER\\folder\\Data.sav'",
"infile '..\\folder\\Data.sav'", "outfile '..\\folder\\Data.sav'",
"test", "")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L))
user_name <- "test"
Code that works:
df |>
  rowwise() |>
  mutate(value = ifelse(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
                        str_replace(value,
                                    str_sub(value,
                                            str_locate_all(value, "\\\\")[[1]][2] + 1,
                                            str_locate_all(value, "\\\\")[[1]][3] - 1),
                                    user_name),
                        value)) |>
  ungroup()
with output:
# A tibble: 5 × 1
value
<chr>
1 "infile 'C:\\Users\\test\\folder\\Data.sav'"
2 "infile '..\\folder\\Data.sav'"
3 "outfile '..\\folder\\Data.sav'"
4 "test"
5 ""
Code that doesn't work:
df |>
  mutate(value = if_else(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
                         str_replace(value,
                                     str_sub(value,
                                             str_locate_all(value, "\\\\")[[1]][2] + 1,
                                             str_locate_all(value, "\\\\")[[1]][3] - 1),
                                     user_name),
                         value))
I think this produces the right result, but it gives warning messages:
Warning messages:
1: Problem while computing `value = if_else(...)`.
ℹ empty search patterns are not supported
2: Problem while computing `value = if_else(...)`.
ℹ empty search patterns are not supported
Code that doesn't work:
df |>
  rowwise() |>
  mutate(value = if_else(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
                         str_replace(value,
                                     str_sub(value,
                                             str_locate_all(value, "\\\\")[[1]][2] + 1,
                                             str_locate_all(value, "\\\\")[[1]][3] - 1),
                                     user_name),
                         value)) |>
  ungroup()
Error in `mutate()`:
! Problem while computing `value = if_else(...)`.
ℹ The error occurred in row 2.
Caused by error:
! Empty `pattern` not supported
Solution 1:[1]
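Basically, the issue is that if_else() is an ordinary function that always evaluates both its true and false arguments in full, while ifelse() evaluates yes only if at least one element of the test is TRUE and no only if at least one element is FALSE. Under rowwise() the test has length one, so for the rows that should stay unchanged ifelse() never runs the str_replace() call that would receive an empty pattern, whereas if_else() runs it and fails.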
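The difference in evaluation can be checked directly. The following is a minimal sketch, not part of the original answer; noisy() and calls are hypothetical names introduced here only to record which arguments get evaluated.

```r
library(dplyr)

# Hypothetical helper: records its label whenever the argument is evaluated.
calls <- character()
noisy <- function(label, x) {
  calls <<- c(calls, label)
  x
}

# Base ifelse(): the test is a single FALSE, so `yes` is never evaluated.
calls <- character()
res_base <- ifelse(FALSE, noisy("yes", "a"), noisy("no", "b"))
calls_base <- calls

# dplyr::if_else(): both branches are evaluated before the result is picked.
calls <- character()
res_dplyr <- if_else(FALSE, noisy("yes", "a"), noisy("no", "b"))
calls_dplyr <- calls

calls_base   # only the "no" branch was touched
calls_dplyr  # both branches were touched
```

With a length-one test, as under rowwise(), this is why ifelse() can skip an expression that would fail for that row, while if_else() cannot.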
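Also, if you don't use rowwise(), mutate() evaluates the expression once on the whole set of strings in df$value. str_locate_all() then returns a list with one matrix of match positions per string, and [[1]] keeps only the matrix for the first string, so the same start and end indices are used for every row.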
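To make the role of [[1]] concrete, here is a small sketch on two made-up strings (paths, locs, first_only and second_bs are illustrative names, not from the question). It shows that str_locate_all() returns one matrix per input string and how a position could be extracted per element instead.

```r
library(stringr)

# Two made-up strings with backslashes at different positions.
paths <- c("a\\b\\c\\d", "xx\\yy\\zz\\ww")

# One matrix of start/end positions per input string.
locs <- str_locate_all(paths, "\\\\")
length(locs)

# [[1]] looks only at the first string; [2] is the start of its second match.
first_only <- locs[[1]][2]

# Per-element alternative: start of the second backslash in every string
# (NA if a string has fewer than two backslashes).
second_bs <- vapply(locs, function(m) m[, "start"][2], integer(1))
second_bs
```

In a non-rowwise mutate(), first_only would be recycled across all rows, which is the behaviour described above.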
To debug, I'd suggest breaking the calculation out a bit:
df %>% rowwise() %>%
  mutate(n = length(value), slen = str_length(value),
         l1 = str_locate_all(value, "\\\\")[[1]][2] + 1,
         l2 = str_locate_all(value, "\\\\")[[1]][3] - 1,
         ssub = str_sub(value, l1, l2),
         detect = str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
         vout = if_else(detect, ssub, user_name))
# A tibble: 5 × 8
# Rowwise:
value n slen l1 l2 ssub detect vout
<chr> <int> <int> <dbl> <dbl> <chr> <lgl> <chr>
1 "infile 'C:\\Users\\USER\\folder\\Data.sav'" 1 38 18 21 "USER" TRUE USER
2 "infile '..\\folder\\Data.sav'" 1 27 19 10 "" FALSE test
3 "outfile '..\\folder\\Data.sav'" 1 28 20 11 "" FALSE test
4 "test" 1 4 NA NA NA FALSE test
5 "" 1 0 NA NA NA FALSE test
Without rowwise(), mutate() gets all the strings in the value column at once, and the cut locations computed from the first string are reused on every single row:
df %>%
  mutate(n = length(value), slen = str_length(value),
         l1 = str_locate_all(value, "\\\\")[[1]][2] + 1,
         l2 = str_locate_all(value, "\\\\")[[1]][3] - 1,
         ssub = str_sub(value, l1, l2),
         detect = str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
         vout = if_else(detect, ssub, user_name))
# A tibble: 5 × 8
value n slen l1 l2 ssub detect vout
<chr> <int> <int> <dbl> <dbl> <chr> <lgl> <chr>
1 "infile 'C:\\Users\\USER\\folder\\Data.sav'" 5 38 18 21 "USER" TRUE USER
2 "infile '..\\folder\\Data.sav'" 5 27 18 21 "\\Dat" FALSE test
3 "outfile '..\\folder\\Data.sav'" 5 28 18 21 "r\\Da" FALSE test
4 "test" 5 4 18 21 "" FALSE test
5 "" 5 0 18 21 "" FALSE test
Once the locations used to subset the string are calculated incorrectly, the patterns passed to str_replace() are wrong as well. I think it is just luck that the non-rowwise if_else() version produced warnings instead of the error.
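Building on that diagnosis, one possible way to stay fully vectorized is to let a single regular expression find the third path component instead of computing positions with str_locate_all(). This is only a sketch and not part of the original answer: it assumes the user name is always the text between the second and third backslash, and that user_name contains no characters that are special in a replacement string (such as a backslash or "$").

```r
library(dplyr)
library(stringr)

df <- tibble(value = c("infile 'C:\\Users\\USER\\folder\\Data.sav'",
                       "infile '..\\folder\\Data.sav'",
                       "outfile '..\\folder\\Data.sav'",
                       "test", ""))
user_name <- "test"

fixed <- df |>
  mutate(
    # TRUE only for infile/outfile rows that do not use a relative path
    is_abs = str_detect(value, "infile|outfile") & !str_detect(value, "'\\.\\.\\\\"),
    value = if_else(
      is_abs,
      # Group 1 captures everything up to and including the second backslash;
      # the following run of non-backslash characters is the user name.
      str_replace(value, "^((?:[^\\\\]*\\\\){2})[^\\\\]*", paste0("\\1", user_name)),
      value
    )
  ) |>
  select(-is_abs)

fixed
```

Because the pattern is a fixed, non-empty string, if_else() can evaluate the str_replace() branch for every row without hitting the empty-pattern problem.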
Solution 2:[2]
Here is one way (where my substitution of USER is very simple; not sure if it should be more generic):
df %>%
  tidyr::separate(value, into = c('Type', 'Path'), sep = ' ') %>%
  dplyr::mutate(
    Value = dplyr::if_else(
      (Type %in% c('infile', 'outfile')) & !startsWith(Path, "'.."),
      stringr::str_replace(Path, 'USER', user_name),
      Path
    )
  )
I split the value column to make the check easier.
If you need to replace whatever user name appears in the path with the variable, you can do it like this (here using a backreference in the regular expression):
df %>%
  tidyr::separate(value, into = c('Type', 'Path'), sep = ' ') %>%
  dplyr::mutate(
    Value = dplyr::if_else(
      (Type %in% c('infile', 'outfile')) & !startsWith(Path, "'.."),
      # Path still starts with a single quote, so the anchored pattern must include it
      sub("^('C:\\\\Users\\\\)([[:alnum:]]+)\\\\", paste0('\\1', user_name, '\\\\'), Path),
      Path
    )
  )
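To see the backreference in isolation, the sketch below applies a sub() call of this kind to a single path string. It is an illustration rather than part of the original answer; path and out are made-up names, and the pattern assumes the path keeps its leading single quote (as it does after splitting on the space) and that the user name consists only of letters and digits.

```r
user_name <- "test"

# A single path as it looks after the split: the surrounding quotes are kept.
path <- "'C:\\Users\\USER\\folder\\Data.sav'"

# \\1 puts back the captured prefix 'C:\Users\ ; the user name is swapped out,
# and the trailing backslash consumed by the pattern is re-added.
out <- sub("^('C:\\\\Users\\\\)([[:alnum:]]+)\\\\",
           paste0("\\1", user_name, "\\\\"),
           path)

out
```

For user names containing dots, spaces or hyphens, [[:alnum:]]+ would likely need to be widened, for example to [^\\\\]+.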
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
