Delete a word from a string which contains hashtags

I have already done a lot of "filtering" with regexp to remove unwanted characters from a string. This is what I am using:

var regexpHashtag = new RegExp(/(?:^|\s)(?:#)([a-zA-Z\d]+)/g)
var regexpUrl = new RegExp(/(?:https?|ftp):\/\/[\n\S]+/g)
var regexpEmoji = new RegExp(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g)
var regexpQuotes = new RegExp(/['"]+/g)

tweetText = tweetText.replace(regexpHashtag, '')
tweetText = tweetText.replace(regexpUrl, '')
tweetText = tweetText.replace(regexpEmoji, '')
tweetText = tweetText.replace(regexpQuotes, '')

However, there are still cases where a hashtag persists. For example, before filtering:

Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO)  📸☀️☀️☀️#Setnja #Ilidza #Malaaleja

after:

Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO)  ️️️#Setnja

The word "#Setnja" is what is causing my problem. Is it because there are emoji symbols directly before it? The other hashtags, "#Ilidza #Malaaleja", are removed as expected. How can I improve my regexp to delete this word as well? Thanks.



Solution 1:[1]

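Your hashtag pattern only matches a hashtag at the start of the string or directly after whitespace, but "#Setnja" directly follows the emoji, so it is skipped. Since a hashtag may be preceded by any character, remove the whitespace boundary check on the left-hand side: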

var regexpHashtag = new RegExp(/#[a-zA-Z\d]+/g)
var regexpUrl = new RegExp(/(?:https?|ftp):\/\/[\n\S]+/g)
var regexpEmoji = new RegExp(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g)
var regexpQuotes = new RegExp(/['"]+/g)

var tweetText = "Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO)  📸☀️☀️☀️#Setnja #Ilidza #Malaaleja";
tweetText = tweetText.replace(regexpHashtag, '')
tweetText = tweetText.replace(regexpUrl, '')
tweetText = tweetText.replace(regexpEmoji, '')
tweetText = tweetText.replace(regexpQuotes, '')

console.log(tweetText);
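
To compare the two hashtag patterns side by side, here is a minimal sketch that runs both against the sample tweet. The helper name `cleanTweet` and the variable names are made up for illustration and are not part of the original answer. The sketch also strips the variation selector U+FE0F, which is an addition on my part: none of the ranges in the emoji pattern cover it, and it appears to be what is left in front of "#Setnja" in the question's output.

```javascript
// Sample tweet from the question. The emoji are written as escapes so the
// snippet survives copy/paste: camera (U+1F4F8), then sun (U+2600) followed
// by the variation selector (U+FE0F), three times.
var sample = 'Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO)  ' +
  '\uD83D\uDCF8\u2600\uFE0F\u2600\uFE0F\u2600\uFE0F#Setnja #Ilidza #Malaaleja';

var strictHashtag = /(?:^|\s)(?:#)([a-zA-Z\d]+)/g; // pattern from the question
var relaxedHashtag = /#[a-zA-Z\d]+/g;              // pattern from the answer
var emoji = /([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g;
var variationSelector = /\uFE0F/g; // assumption: not in the original code

// Hypothetical helper: remove hashtags with the given pattern, then remove
// emoji and leftover variation selectors, then trim surrounding whitespace.
function cleanTweet(text, hashtagPattern) {
  return text
    .replace(hashtagPattern, '')
    .replace(emoji, '')
    .replace(variationSelector, '')
    .trim();
}

console.log(cleanTweet(sample, strictHashtag));  // still ends with "(FOTO)  #Setnja"
console.log(cleanTweet(sample, relaxedHashtag)); // ends with "(FOTO)"
```

Note that the character class `[a-zA-Z\d]` is ASCII-only, so a tag such as "#Ilidža" would only be removed up to the "ž". Whether that matters depends on the hashtags you expect to see.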

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Biegeleisen