Count distinct letters in a string? [duplicate]
How to count distinct characters in a string?
Simulated data
library(tibble)
d <- tibble(word = c("aaa", "abc", "abcde"))

How can I create a new variable that counts the number of distinct letters in each string? For the data above, the expected result is:
first row = 1
second row = 3
third row = 5
P.S. Tidyverse solutions are especially welcome!
Solution 1:[1]
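In base R, split each string into single characters with strsplit() and count the unique ones with length(unique(...)):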
sapply(strsplit(d$word, ''), function(x) length(unique(x)))
#[1] 1 3 5
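A minimal sketch of the same base R idea, wrapped in a hypothetical helper `count_distinct_chars()` (not part of the original answer). It uses `vapply()` instead of `sapply()` so the result is always an integer vector with one element per input string, and stores the counts as a new column. `stringsAsFactors = FALSE` is set so the column stays character on R versions before 4.0, where `strsplit()` would otherwise reject a factor.

```r
# Hypothetical helper: counts the distinct characters in each string.
count_distinct_chars <- function(words) {
  # strsplit(words, "") splits each string into single characters
  chars <- strsplit(words, "")
  # vapply() guarantees exactly one integer result per string
  vapply(chars, function(x) length(unique(x)), integer(1))
}

d <- data.frame(word = c("aaa", "abc", "abcde"), stringsAsFactors = FALSE)
d$unique_n <- count_distinct_chars(d$word)
print(d)
```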
The same logic can be written with the tidyverse:
library(tidyverse)
d %>%
  mutate(unique_n = map_dbl(str_split(word, ''), n_distinct))
# word unique_n
# <chr> <dbl>
#1 aaa 1
#2 abc 3
#3 abcde 5
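Both answers count distinct *characters*, so "Aa" counts as 2 and spaces or punctuation are included. If "letters" should instead ignore case and skip non-letters (an assumption that goes beyond the question's all-lowercase example), a minimal tidyverse sketch could lower-case each string and extract only alphabetic characters before counting. The extra test string "Aa b!" and the column names `letters_only` and `unique_letters` are illustrative only.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# "Aa b!" is an illustrative extra row, not part of the original data
d <- tibble(word = c("aaa", "abc", "abcde", "Aa b!"))

result <- d %>%
  mutate(
    # lower-case first so "A" and "a" count as the same letter,
    # then keep only alphabetic characters
    letters_only = str_extract_all(str_to_lower(word), "[[:alpha:]]"),
    # n_distinct() counts unique values; map_int() returns an integer column
    unique_letters = map_int(letters_only, n_distinct)
  ) %>%
  select(-letters_only)

print(result)
```

Using `map_int()` rather than `map_dbl()` keeps the counts as integers; either works for this purpose.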
Solution 2:[2]
Here is a regex-based approach:
x <- "abcabcabc"
output <- gsub("([a-z])(?=.*\\1)", "", x, perl=TRUE) # "abc"
nchar(output)
#[1] 3
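The idea is to strip out every duplicate character: the lookahead (?=.*\\1) matches a letter only if the same letter occurs again later in the string, so gsub() deletes all but the last occurrence of each letter. What remains contains only unique characters, and nchar() returns their count. Note that the [a-z] class only matches lowercase ASCII letters.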
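Because `gsub()` accepts a character vector, the regex approach can be applied to the question's whole column in one call. This sketch swaps `[a-z]` for `.` (an assumption, so that uppercase letters, digits and other characters are also de-duplicated); `perl = TRUE` is still required for the lookahead.

```r
d <- data.frame(word = c("aaa", "abc", "abcde"), stringsAsFactors = FALSE)

# "(.)(?=.*\\1)" matches any character that occurs again later in the string;
# removing those matches keeps only the last copy of each character.
deduped <- gsub("(.)(?=.*\\1)", "", d$word, perl = TRUE)

# The length of each de-duplicated string is its number of distinct characters
d$unique_n <- nchar(deduped)
print(d)
```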
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ronak Shah |
| Solution 2 | Tim Biegeleisen |
