Count distinct letters in a string? [duplicate]

How to count distinct characters in a string?

Simulated data

library(tibble)
d = tibble(word = c("aaa", "abc", "abcde"))


How can I create a new variable that counts the number of distinct letters in each string? For the data above, it should give the following:

first row = 1
second row = 3
third row = 5

PS! Tidyverse solutions are especially welcome!



Solution 1:[1]

In base R,

sapply(strsplit(d$word, ''), function(x) length(unique(x)))
#[1] 1 3 5

The same logic can be written in tidyverse -

library(tidyverse)

d %>%
  mutate(unique_n = map_dbl(str_split(word, ''), n_distinct))

#  word  unique_n
#  <chr>    <dbl>
#1 aaa          1
#2 abc          3
#3 abcde        5
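
If the count is needed in more than one place, the base R logic above could be wrapped in a small helper. This is only a sketch: the name count_distinct_chars and the ignore_case argument are made up for illustration and do not come from any package.

```r
# Hypothetical helper (the name and the ignore_case argument are illustrative only).
count_distinct_chars <- function(x, ignore_case = FALSE) {
  if (ignore_case) x <- tolower(x)  # treat "A" and "a" as the same letter
  # Split each string into single characters, then count the unique ones.
  vapply(strsplit(x, ""), function(ch) length(unique(ch)), integer(1))
}

count_distinct_chars(c("aaa", "abc", "abcde"))
#[1] 1 3 5
count_distinct_chars("Aa", ignore_case = TRUE)
#[1] 1
```

Because the helper is vectorised over its input, it should also work directly inside mutate(), e.g. mutate(unique_n = count_distinct_chars(word)), giving an integer column rather than a double.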

Solution 2:[2]

Here is a regex-based approach:

x <- "abcabcabc"
output <- gsub("([a-z])(?=.*\\1)", "", x, perl=TRUE)  # "abc"
nchar(output)
#[1] 3

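The idea is to delete every character that appears again later in the string (the lookahead (?=.*\\1) checks for a later copy), so the result keeps exactly one occurrence of each distinct character. Its length is then the number of distinct characters. Note that [a-z] matches only lowercase ASCII letters, so uppercase letters, digits and other characters are never removed and every one of their occurrences is counted.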
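
Since gsub() and nchar() are both vectorised, the same idea can be applied to the whole column from the question at once. The sketch below assumes it is acceptable to swap [a-z] for (.), so that any character (not just lowercase letters) is de-duplicated. The variable names words and deduped are illustrative.

```r
# The strings from the question, as a plain character vector.
words <- c("aaa", "abc", "abcde")

# Remove every character that occurs again later in the same string.
# (.) captures any single character; (?=.*\\1) requires a later copy of it.
deduped <- gsub("(.)(?=.*\\1)", "", words, perl = TRUE)

deduped
#[1] "a"     "abc"   "abcde"
nchar(deduped)
#[1] 1 3 5
```

In a tidyverse pipeline the same expression could be used inside mutate(), with word in place of words.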

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ronak Shah
Solution 2 Tim Biegeleisen