In Scala, how to remove emoji symbols but keep words in other languages

I have a string like this: bat★☆😂 😆 ⛱󰀄󰀄 ✨🚣‍♂️⛷🏂❤️🤍🪵֎۩ᴥ★ Lôa Créole♥ I need to replace every emoji symbol with an empty string while keeping accented letters such as ô and é. From searching online, I found this approach:

regexp_replace(df("word"), """[^ 'a-zA-Z0-9,.?!]""","")

But this pattern also removes ô and é. Could you please help me exclude ô and é from the replacement, so that only the emoji symbols are removed?

scala>     val df = Seq(
     |       (8, "bat★☆😂 😆 ⛱󰀄󰀄 ✨🚣♂⛷🏂❤🤍🪵֎۩ᴥ★ Lôa Créole♥"),
     |       (64, "bb")
     |     ).toDF("number", "word")
df: org.apache.spark.sql.DataFrame = [number: int, word: string]

scala> df.select($"number", $"word", regexp_replace(df("word"), """[^ 'a-zA-Z0-9,.?!]""","").alias("word_revised")).show(false)
+------+------------------------------------------------+---------------+
|number|word                                            |word_revised   |
+------+------------------------------------------------+---------------+
|8     |bat★☆😂 😆 ⛱󰀄󰀄 ✨🚣‍♂️⛷🏂❤️🤍🪵֎۩ᴥ★ Lôa Créole♥|bat    La Crole|
|64    |bb                                              |bb             |
+------+------------------------------------------------+---------------+
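One possible direction, offered as a sketch rather than a confirmed answer: the pattern above keeps only ASCII letters, which is why ô and é disappear. Spark's regexp_replace uses Java regular expressions, so the Unicode classes \p{L} (letters in any script) and \p{N} (digits) should be usable in place of a-zA-Z0-9. The example below uses plain Scala replaceAll so it runs without Spark; the object name KeepLettersDemo and the helper stripSymbols are made up for illustration, and the sample string is a shortened version of the one in the question, written with Unicode escapes and assuming the accented letters are precomposed (single code points).

```scala
object KeepLettersDemo {
  // Keep letters (\p{L}) and digits (\p{N}) from any script, plus the
  // space and punctuation allowed by the original pattern; match the rest.
  val NonTextPattern: String = """[^\p{L}\p{N} ',.?!]"""

  // Hypothetical helper: removes every character matched by NonTextPattern.
  def stripSymbols(s: String): String = s.replaceAll(NonTextPattern, "")

  def main(args: Array[String]): Unit = {
    // "bat" + black star + white star + face-with-tears emoji,
    // then " Lôa Créole" + heart suit
    val input = "bat\u2605\u2606\uD83D\uDE02 L\u00f4a Cr\u00e9ole\u2665"
    println(stripSymbols(input)) // prints "bat Lôa Créole"
  }
}
```

With the DataFrame from the question, the same pattern would be passed as regexp_replace(df("word"), """[^\p{L}\p{N} ',.?!]""", ""). Two caveats: any decorative character that Unicode classifies as a letter would survive (ᴥ appears to be one), and accents stored as separate combining marks would be stripped unless \p{M} is added to the class.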



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
