Create word variations in R
I have an assignment and I have absolutely no idea how to start.
I have to create variations of each word in a list, where the characters between the first and the last are replaced with '*' at different positions (one at a time and in combination).
It should look something like this:
input: c('smog', 'sting')
desired output: 's*og', 'sm*g', 's**g', 's*ing', 'st*ng', 'sti*g', 's***g'
Any idea how to achieve something like this?
Thank you very much
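For reference, the number of variations follows from the rule itself: each letter between the first and last is either kept or replaced, and the case where all are kept is just the original word, so a word with n characters has 2^(n-2) - 1 variations. A minimal sketch in R, where n_variants is a hypothetical helper name (not from the question):

```r
# Hypothetical helper: number of variations per word, assuming every
# non-empty subset of the inner letters (all but first and last) is starred
n_variants <- function(words) {
  inner <- pmax(nchar(words) - 2, 0)  # count of inner letters, never negative
  2^inner - 1                         # non-empty subsets of the inner letters
}

n_variants(c('smog', 'sting'))
```

This gives 3 for 'smog' and 7 for 'sting', so the desired output above lists only some of the variations of 'sting'.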
UPDATE: I've found this solution:
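s <- c('smog')
# replace the character at position y of x with "*"
f <- function(x,y) {substr(x,y,y) <- "*"; x}
# star every position in x, starting from the word s
g <- function(x) Reduce(f,x,s)
# for k = 1..(nchar(s)-2), apply g to every k-subset of the inner positions
unlist(lapply(1:(nchar(s)-2),function(x) combn(2:(nchar(s)-1),x,g)))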
output:
[1] "s*og" "sm*g" "s**g"
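To see how the pieces fit together, here is each step on its own, using the same f as above: substr<- overwrites a single character, Reduce folds f over a vector of positions starting from the word, and combn enumerates the sets of inner positions.

```r
# replace the character at position y of x with "*"
f <- function(x, y) { substr(x, y, y) <- "*"; x }

f("smog", 2)                 # one position starred: "s*og"
Reduce(f, c(2, 3), "smog")   # same as f(f("smog", 2), 3): "s**g"
combn(2:4, 2)                # pairs of inner positions of "sting", one per column: (2,3), (2,4), (3,4)
```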
The only problem is that this works for a single word only, not for a vector of several words.
Solution 1
See also this SO post for related techniques: Create all combinations of letter substitution in string
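One technique in the spirit of that post (not necessarily the one used there) is expand.grid: keep the first and last letters fixed, let every inner letter be either itself or "*", build all combinations, and drop the unmodified word. star_grid is a hypothetical name; the variations may come out in a different order than with combn.

```r
star_grid <- function(s) {
  chars <- strsplit(s, "")[[1]]
  n <- length(chars)
  if (n < 3) return(character(0))  # no letters between first and last
  # first and last letters are fixed; every inner letter is either kept or "*"
  choices <- lapply(seq_len(n), function(i) {
    if (i == 1 || i == n) chars[i] else c(chars[i], "*")
  })
  # one row per combination of choices
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  # paste each row back into a word, then drop the unmodified original
  setdiff(do.call(paste0, grid), s)
}

unlist(lapply(c('smog', 'sting'), star_grid))
```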
EDIT
Based on the OP's edit and comment, wrap the code in a function and apply it to each word with lapply:
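test2 <- c('smog', 'sting')
repfun2 <- function(s){
  # replace the character at position y of x with "*"
  f <- function(x,y) {substr(x,y,y) <- "*"; x}
  # star every position in x, starting from the word s
  g <- function(x) Reduce(f,x,s)
  # all non-empty subsets of the inner positions 2..(nchar(s)-1)
  out <- unlist(lapply(1:(nchar(s)-2),function(x) combn(2:(nchar(s)-1),x,g)))
  return(out)
}
lapply(test2, FUN = repfun2)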
Output:
> lapply(test2, FUN = repfun2)
[[1]]
[1] "s*og" "sm*g" "s**g"
[[2]]
[1] "s*ing" "st*ng" "sti*g" "s**ng" "s*i*g" "st**g" "s***g"
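repfun2 assumes every word has at least two letters between the first and the last. For a three-letter word such as 'cat', 2:(nchar(s)-1) is the single number 2, and combn treats a single positive integer n as seq_len(n), so the first letter is starred too ("*at"). For two-letter words, 1:(nchar(s)-2) becomes 1:0, which also gives unintended results. A guarded sketch, where star_variants is a hypothetical name, which also shows how to get one flat vector as in the question's desired output:

```r
star_variants <- function(s) {
  n <- nchar(s)
  if (n < 3) return(character(0))  # no letters between first and last
  inner <- 2:(n - 1)
  # replace the character at position y of x with "*"
  f <- function(x, y) { substr(x, y, y) <- "*"; x }
  # star every inner position listed in pos, starting from s
  g <- function(pos) Reduce(f, pos, s)
  # combn over indices into inner, so a single inner position is not
  # mistaken for combn's "integer n means seq_len(n)" shortcut
  unlist(lapply(seq_along(inner), function(k) {
    combn(seq_along(inner), k, function(i) g(inner[i]))
  }))
}

words <- c('smog', 'sting', 'cat', 'ab')
lapply(words, star_variants)          # one vector per word, like repfun2
unlist(lapply(words, star_variants))  # one flat vector for all words
```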
Previous answer for random replacement
I understood that you wanted to replace random characters in a vector of strings. If that is correct, here is an idea:
test2 <- c('smog', 'sting')
repfun <- function(.string) {
n_char <- nchar(.string)
# random selection of n characters that will be replaced in the string
repchar <- sample(1:n_char, size = sample(1:n_char, size = 1))
# replacing the characters in the string
for(i in seq_along(repchar)) substring(.string, repchar[i], repchar[i]) <- "*"
return(.string)
}
lapply(test2, FUN = repfun)
Some outputs:
> lapply(test2, FUN = repfun)
[[1]]
[1] "*mog"
[[2]]
[1] "s*ing"
> lapply(test2, FUN = repfun)
[[1]]
[1] "s*o*"
[[2]]
[1] "s*i*g"
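Note that this random version can also replace the first or last letter (for example "*mog" and "s*o*" above), which the question excludes. A sketch that restricts the sampling to the inner positions; repfun_inner is a hypothetical name:

```r
repfun_inner <- function(.string) {
  n_char <- nchar(.string)
  # nothing to replace if there are no letters between first and last
  if (n_char < 3) return(.string)
  inner <- 2:(n_char - 1)
  # randomly choose how many inner positions to replace (at least one)
  k <- sample(length(inner), size = 1)
  # then randomly choose which ones; sampling indices avoids sample()'s
  # special case for a numeric vector of length one
  repchar <- inner[sample(length(inner), size = k)]
  for (i in repchar) substr(.string, i, i) <- "*"
  .string
}

set.seed(1)
lapply(c('smog', 'sting'), FUN = repfun_inner)
```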
The basic idea is:
- Determine the number of characters in a string,
- Randomly choose how many characters to replace, and which ones, based on its length,
- Replace the sampled characters with "*",
- Use lapply to pass a vector of character strings.
I think you could improve it by removing the for loop if needed; see some ideas here and here.
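One way to drop the loop is to split the string into single characters, replace the sampled positions with vectorized indexing, and paste the characters back together. A sketch with the same sampling as repfun; repfun_noloop is a hypothetical name:

```r
repfun_noloop <- function(.string) {
  chars <- strsplit(.string, "")[[1]]  # split into single characters
  n_char <- length(chars)
  if (n_char == 0) return(.string)     # empty string: nothing to replace
  # same idea as repfun: how many positions, then which ones
  repchar <- sample(n_char, size = sample(n_char, size = 1))
  chars[repchar] <- "*"                # vectorized replacement, no loop
  paste(chars, collapse = "")
}

sapply(c('smog', 'sting'), repfun_noloop)
```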
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
