grep any content (text, numbers, emojis) between characters (quotes, etc.) or after a character (e.g. hashtag) [duplicate]

Based on this question: Bash sed - find hashtags in string, which has no solution for this case (when the text contains special characters).

This question is well-researched and is not a duplicate of this unrelated question, as the referenced one does not cover all the topics asked here (support for special characters and numbers; grepping both between characters and after/before a character).

echo "Text and #hashtag" | grep -o '#[[:alpha:]]\+*' | tr -d '"' works, returning #hashtag; that case is still covered by the question mentioned above.

For this new question, based on my own needs (which may be useful to you too), here is my version, which parses the text between double quotes instead of after a hashtag:

echo '#first = "Yes"' | grep -o '"[[:alpha:]]\+*"' | tr -d '"' works, returning Yes.

However, when the quoted text contains an emoji or other characters such as > and / (example: echo '#first = "✅ Yes"' | grep -o '"[[:alpha:]]\+*"' | tr -d '"'), the output is empty.

It has to support any kind of character (emojis, HTML tags, numbers).

This should work not only for parsing text between characters, but also after a character (such as any #hashtag text) or before one.



Solution 1:[1]

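The way to extract text between double quotes is to match an opening double quote, then as many characters as possible that are not a double quote, then the closing double quote. [[:alpha:]] matches only letters, so a space, digit, emoji, > or / ends the match before the closing quote is reached.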

grep -o '"[^"]*"' | tr -d '"'
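As a quick check, here is that pipeline applied to the input from the question that previously produced empty output, plus an HTML-like line. This is a minimal sketch assuming a grep that supports -o (GNU or BSD); the function name between_quotes is hypothetical and only wraps the pipeline for reuse.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): print every run of
# text found between a pair of double quotes, one match per line, quotes removed.
between_quotes() {
  # '"[^"]*"' = opening quote, any characters except a quote, closing quote.
  # Spaces, digits, emojis, > and / are all "not a quote", so they are kept.
  grep -o '"[^"]*"' | tr -d '"'
}

echo '#first = "✅ Yes"' | between_quotes              # prints: ✅ Yes
echo '<a href="/path/to">link</a>' | between_quotes   # prints: /path/to
```

Because [^"] excludes only the delimiter, the same pattern works unchanged for any content, which is exactly what [[:alpha:]] could not do.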

Some test cases:

grep -o '"[^"]*"' <<\___here | tr -d '"'
there is "text" between "double quotes"
just one "?" here, "test me!"
any unpaired double quote " will not match 
___here

The second one of these will fail with the current code in your own answer.
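The question also asks about text after or before a character. The same idea (match anything except the character that ends the field) can be adapted; the following is only a sketch under assumptions that are not in the answer above: a hashtag is taken to end at the first whitespace character, and "before" is illustrated as the text preceding the first = on a line. The function names hashtags and before_equals are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each #hashtag, assuming a hashtag is '#' followed
# by any run of non-whitespace characters (letters, digits, emojis, ...).
# Note: trailing punctuation such as a comma would be included in the match.
hashtags() {
  grep -o '#[^[:space:]]*'
}

# Hypothetical helper: print the text before the first '=' on each line that
# contains one, with the whitespace just before the '=' removed.
before_equals() {
  grep -o '^[^=]*=' | sed 's/[[:space:]]*=$//'
}

echo 'Text and #hashtag #tag2 #✅done' | hashtags
# prints (one per line): #hashtag  #tag2  #✅done
echo '#first = "✅ Yes"' | before_equals
# prints: #first
```

Lines without an = produce no output from before_equals, because the pattern requires the = to be present.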

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1