Identify the correct hashtag indexes in tweet messages
I need to identify the correct hashtag indexes in Twitter messages, which can contain various languages, emojis, etc.
I can't find a solution that returns the positions shown in the example below.
```go
package main

import (
	"regexp"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestA(t *testing.T) {
	text := "🇷🇺 [URGENT] Les forces de dissuasion #nucleaire de la #Russie"
	re := regexp.MustCompile(`#([_A-Za-z0-9]+)`)
	pos := re.FindAllStringIndex(text, -1)

	// FindAllStringIndex returns:
	//   [0] = [43, 53]
	//   [1] = [60, 67]

	// These are the expected positions (require.Equal takes expected first, then actual).
	require.Equal(t, 37, pos[0][0])
	require.Equal(t, 47, pos[0][1])
	require.Equal(t, 54, pos[1][0])
	require.Equal(t, 61, pos[1][1])
}
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
