Create word variations in R

I have an assignment and I have absolutely no idea how to start.

I have to create variations of a list of words in which the characters between the first and the last one are replaced by '*', in every possible combination of positions.

It should look something like this:

input: c('smog', 'sting')

desired output: 's*og', 'sm*g', 's**g', 's*ing', 'st*ng', 'sti*g', 's***g'
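Each interior character is independently either kept or masked, so a word with n characters has 2^(n-2) - 1 masked variations, not counting the unmasked word itself. That gives 3 for 'smog' and 7 for 'sting', so the output above is only a partial listing. The short sketch below checks the count with base R's choose(). The name n_variations is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: number of masked variations of a word when the
# first and last characters are always kept.
n_variations <- function(word) {
  k <- max(nchar(word) - 2, 0)   # number of interior characters
  sum(choose(k, seq_len(k)))     # masks of size 1, 2, ..., k
}

sapply(c('smog', 'sting'), n_variations)   # returns c(smog = 3, sting = 7)
```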

Any idea how to achieve something like this?

Thank you very much

UPDATE: I've found this solution:

s <- c( 'smog')
f <- function(x,y) {substr(x,y,y) <- "*"; x}
g <- function(x) Reduce(f,x,s)
unlist(lapply(1:(nchar(s)-2),function(x) combn(2:(nchar(s)-1),x,g)))

output:
[1] "s*og" "sm*g" "s**g"

The only problem is that it works for a single word only, not for a vector of several words.
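To see how the snippet above works, it helps to call its pieces by hand. f masks one position, g uses Reduce() to fold f over a vector of positions starting from the original word, and combn(x, m, FUN) applies g to every m-element combination of positions. The sketch below repeats the OP's definitions so it runs on its own.

```r
s <- 'smog'
# f masks a single position: f("smog", 2) gives "s*og"
f <- function(x, y) { substr(x, y, y) <- "*"; x }
# g folds f over a vector of positions, starting from s:
# Reduce(f, c(2, 3), s) is the same as f(f(s, 2), 3)
g <- function(x) Reduce(f, x, s)

g(c(2, 3))          # "s**g"
combn(2:3, 1, g)    # "s*og" "sm*g"  (each single interior position)
combn(2:3, 2, g)    # "s**g"         (both interior positions)
```

The lapply() over 1:(nchar(s)-2) in the original code simply runs combn() for every combination size and unlist() joins the results.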



Solution 1

See also this SO post for related techniques: Create all combinations of letter substitution in string

EDIT

Building on the OP's edit and comment, wrap that code in a function and apply it to each word with lapply():

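test2 <- c('smog', 'sting')

repfun2 <- function(s){
  # mask position y of string x with "*"
  f <- function(x,y) {substr(x,y,y) <- "*"; x}
  # mask every position in the vector x, starting from the word s
  g <- function(x) Reduce(f,x,s)
  # for k = 1, ..., nchar(s) - 2, mask every k-combination of the interior positions
  out <- unlist(lapply(1:(nchar(s)-2),function(x) combn(2:(nchar(s)-1),x,g)))
  return(out)
}
lapply(test2, FUN = repfun2)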

Output:

> lapply(test2, FUN = repfun2)
[[1]]
[1] "s*og" "sm*g" "s**g"

[[2]]
[1] "s*ing" "st*ng" "sti*g" "s**ng" "s*i*g" "st**g" "s***g"
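repfun2 works for the two example words, but short words can trip it up. For a three-letter word such as 'cat', 2:(nchar(s)-1) is the single number 2, and combn() treats a single positive integer n as 1:n, so the first character gets masked as well. For a two-letter word, 1:(nchar(s)-2) is 1:0, which counts down instead of being empty. The sketch below, using a hypothetical name mask_variations, avoids both cases by running combn() over indices and mapping them back to interior positions. unlist() on the lapply() result then gives one flat vector, as in the desired output.

```r
# Hypothetical helper: all masked variations of one word, keeping the
# first and last characters unchanged.
mask_variations <- function(word) {
  inner <- seq_len(max(nchar(word) - 2, 0)) + 1   # interior positions
  if (length(inner) == 0) return(character(0))    # nothing to mask
  # mask the given positions of word (modifies a local copy only)
  mask <- function(pos) {
    for (p in pos) substr(word, p, p) <- "*"
    word
  }
  # combn() over indices 1..length(inner), mapped back to positions,
  # so combn() never receives a lone position such as 2
  unlist(lapply(seq_along(inner), function(k) {
    combn(length(inner), k, function(idx) mask(inner[idx]))
  }))
}

test2 <- c('smog', 'sting', 'cat', 'ab')
res <- lapply(test2, mask_variations)   # one vector per word
unlist(res)                             # one flat vector for all words
```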

Previous answer for random replacement

I understand you want to randomly replace characters in a vector of strings. If that is correct, here is an idea:

test2 <- c('smog', 'sting')

repfun <- function(.string) {
  n_char <- nchar(.string)
  # random selection of n characters that will be replaced in the string
  repchar <- sample(1:n_char, size = sample(1:n_char, size = 1))
  # replacing the characters in the string
  for(i in seq_along(repchar)) substring(.string, repchar[i], repchar[i]) <- "*"
  return(.string)
}
lapply(test2, FUN = repfun)

Some outputs:

> lapply(test2, FUN = repfun)
[[1]]
[1] "*mog"

[[2]]
[1] "s*ing"

> lapply(test2, FUN = repfun)
[[1]]
[1] "s*o*"

[[2]]
[1] "s*i*g"

The basic idea is:

  1. Determine the number of characters in a string,
  2. Randomly sample positions based on its length,
  3. Replace the characters at the sampled positions with "*",
  4. Use lapply() to apply the function to each string in a vector.
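Because repfun samples from 1:n_char, it can also mask the first or last character (as in "*mog" and "s*o*" above), which the question rules out. A minimal sketch of a variant that samples only interior positions follows; repfun_inner is a hypothetical name. It indexes into the interior positions with sample.int() instead of calling sample() on them directly, since sample() given a single number n samples from 1:n.

```r
# Hypothetical variant of repfun: randomly mask interior characters only,
# keeping the first and last characters unchanged.
repfun_inner <- function(.string) {
  inner <- seq_len(max(nchar(.string) - 2, 0)) + 1  # interior positions
  if (length(inner) == 0) return(.string)           # nothing to mask
  # choose how many interior positions to mask, then which ones
  k <- sample.int(length(inner), size = 1)
  repchar <- inner[sample.int(length(inner), size = k)]
  for (p in repchar) substring(.string, p, p) <- "*"
  .string
}

set.seed(1)
lapply(c('smog', 'sting'), repfun_inner)
```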

You could probably improve it by removing the for loop if needed; see some ideas here and here.
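One way to drop the for loop, sketched under the assumption that the same random choice as repfun is wanted: split the string into single characters with strsplit(), overwrite the sampled positions in one vectorised assignment, and paste the characters back together. repfun_vec is a hypothetical name; sample.int(n, size) draws from 1:n just as sample(1:n_char, ...) does in repfun.

```r
# Hypothetical loop-free version of repfun.
repfun_vec <- function(.string) {
  chars <- strsplit(.string, "")[[1]]   # one element per character
  n_char <- length(chars)
  # same idea as repfun: how many positions, then which positions
  repchar <- sample.int(n_char, size = sample.int(n_char, size = 1))
  chars[repchar] <- "*"                 # vectorised replacement
  paste(chars, collapse = "")
}

set.seed(1)
vapply(c('smog', 'sting'), repfun_vec, character(1))
```

vapply() with character(1) returns a named character vector instead of a list, which can be easier to work with than the lapply() output.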

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
