List of unique characters of a dataset
I have a dataset in a dataframe and I want to see the total number of characters and the list of unique characters.
For the total number of characters, I have implemented the following code, which seems to be working well:
df["Preprocessed_Text"].str.len().sum()
Could you please let me know how to get a list of the unique characters (not including the space)?
Solution 1:[1]
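# Collect every character except the space, as the question asks
unichars = [ch for ch in ''.join(df["Preprocessed_Text"]) if ch != " "]
# Deduplicate, keeping the order of first appearance
print(sorted(set(unichars), key=unichars.index))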
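Sorting by `unichars.index` rescans the list once per distinct character, which can get slow on a large dataset. Here is a minimal sketch of a single-pass alternative that also skips the space. The helper name `unique_chars` and the sample strings are made up for illustration; it assumes every entry is a string (no NaN values), and it should work on a pandas Series as well as a plain list, since it only iterates over the values.

```python
def unique_chars(texts, exclude=" "):
    """Return the distinct characters in `texts`, in order of first appearance.

    `texts` is any iterable of strings; characters found in `exclude` are skipped.
    """
    # dict keys keep insertion order, so this deduplicates in one pass
    seen = dict.fromkeys(ch for text in texts for ch in text if ch not in exclude)
    return list(seen)


# Hypothetical sample data standing in for df["Preprocessed_Text"]
sample = ["hello world", "held low"]
chars = unique_chars(sample)
print(chars)  # ['h', 'e', 'l', 'o', 'w', 'r', 'd']
```

With the dataframe from the question, the call would presumably be `unique_chars(df["Preprocessed_Text"])`.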
Solution 2:[2]
# Column name matches the one used in the question; the space is filtered out
unique = list({letter for letter in ''.join(df["Preprocessed_Text"].values) if letter != " "})
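Note that a `set` has no defined order, so the list above can come out in a different order from run to run. If a reproducible order or per-character counts would help, something along these lines might work. It is only a sketch: `char_counts` and the sample strings are hypothetical, and it assumes every entry is a string.

```python
from collections import Counter


def char_counts(texts, exclude=" "):
    """Count how often each character occurs in `texts`, skipping those in `exclude`."""
    counts = Counter()
    for text in texts:
        counts.update(ch for ch in text if ch not in exclude)
    return counts


# Hypothetical sample data standing in for df["Preprocessed_Text"]
sample = ["hello world", "held low"]
counts = char_counts(sample)

unique_sorted = sorted(counts)   # distinct characters in a stable, sorted order
total = sum(counts.values())     # total number of characters, spaces excluded

print(unique_sorted)
print(total)
```

Unlike `df["Preprocessed_Text"].str.len().sum()`, the total here leaves out the spaces, so the two numbers will differ whenever the text contains any.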
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | user12936462 |
