Parse string to keep only Emoji 1.0 emojis
I am writing a backend for an older system that only supports displaying Emoji 1.0 emojis. Other emojis display as grey squares or can even cause crashes. The Unicode specification listing these emojis can be found here. The system this backend serves comes from a third party, so I cannot hope to change it.
I want to sanitise user input by stripping from a string every emoji that is not in the Emoji 1.0 specification linked above and would therefore not display correctly.
I am not sure how to filter by emoji other than with a lookup table or a huge regex that lists the Unicode code points of every Emoji 1.0 emoji. Additionally, some emojis are composed of multiple code points, which further complicates the problem.
I am not fussy about which technique is used, whether it involves a regex, some sort of library, or a trie data structure.
Example:
before: 🧌😀🫘😻
after: 😀😻
The troll and beans emojis were removed because they are not part of Emoji 1.0, but the smiley face and cat heart eyes remain because they are in Emoji 1.0.
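For illustration, the lookup-table idea could be sketched as below. This is only a minimal sketch, not a complete solution: the names `build_emoji_filter`, `ALL_EMOJI`, and `EMOJI_1_0` are hypothetical, and the two sets are tiny placeholders that would in practice need to be generated from the Unicode emoji data files (one set for every emoji the input might contain, one for the Emoji 1.0 subset). The family sequence is treated as unsupported here purely to show how multi-code-point emojis are handled.

```python
import re

def build_emoji_filter(all_emoji, allowed):
    """Return a function that drops every known emoji not in `allowed`."""
    # Try longer sequences first so a multi-code-point emoji is matched
    # as one unit instead of being split into its component emojis.
    alternatives = sorted(all_emoji, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in alternatives))
    allowed = frozenset(allowed)

    def strip_unsupported(text):
        # Keep a matched emoji only if it is in the allowed set;
        # text that is not a known emoji is left untouched.
        return pattern.sub(
            lambda m: m.group(0) if m.group(0) in allowed else "", text
        )

    return strip_unsupported

# Placeholder data (assumption): real sets would be loaded from the
# Unicode emoji data files rather than written by hand.
FAMILY = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # ZWJ sequence
ALL_EMOJI = {"🧌", "😀", "🫘", "😻", "\U0001F468", FAMILY}
EMOJI_1_0 = {"😀", "😻", "\U0001F468"}

strip_unsupported = build_emoji_filter(ALL_EMOJI, EMOJI_1_0)
print(strip_unsupported("🧌😀🫘😻"))  # 😀😻
```

Sorting the alternatives longest-first matters because Python's `re` tries alternatives in order: without it, the family sequence could be matched as a lone man emoji, leaving stray joiners and component emojis behind.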
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
