Identify the correct hashtag indexes in tweet messages

I need to identify the correct hashtag indexes in Twitter messages (various languages, emojis, etc.).

I can't find a solution that returns the positions shown in the example below.

import (
    "regexp"
    "testing"

    "github.com/stretchr/testify/require"
)

func TestA(t *testing.T) {
    text := "🇷🇺 [URGENT] Les forces de dissuasion #nucleaire de la #Russie"

    var re = regexp.MustCompile(`#([_A-Za-z0-9]+)`)

    pos := re.FindAllStringIndex(text, -1)

    // FindAllStringIndex returns byte offsets:
    // [0][43,53]
    // [1][60,67]

    // These are the expected positions.
    // (require.Equal takes the expected value first, then the actual one.)

    require.Equal(t, 37, pos[0][0])
    require.Equal(t, 47, pos[0][1])

    require.Equal(t, 54, pos[1][0])
    require.Equal(t, 61, pos[1][1])
}
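
The gap between the two sets of numbers is consistent with bytes versus code points: the flag emoji is 2 code points but 8 bytes in UTF-8, and 43 - 37 = 6. Assuming the expected positions are counted in Unicode code points (runes), one possible approach is to convert each byte offset by counting the runes that precede it. The sketch below is only an illustration under that assumption; `runeIndexes` and `hashtagRe` are hypothetical names, not part of any library.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// hashtagRe is the pattern from the question.
var hashtagRe = regexp.MustCompile(`#([_A-Za-z0-9]+)`)

// runeIndexes is a hypothetical helper: it runs FindAllStringIndex and
// converts each [start, end) byte offset into a rune (code point) offset.
func runeIndexes(text string, re *regexp.Regexp) [][]int {
	matches := re.FindAllStringIndex(text, -1)
	out := make([][]int, 0, len(matches))

	// runes is the number of runes seen so far; prev is the byte offset
	// up to which they have been counted. Matches are returned in order
	// and do not overlap, so the text is scanned only once.
	runes, prev := 0, 0
	for _, m := range matches {
		runes += utf8.RuneCountInString(text[prev:m[0]])
		start := runes
		runes += utf8.RuneCountInString(text[m[0]:m[1]])
		prev = m[1]
		out = append(out, []int{start, runes})
	}
	return out
}

func main() {
	text := "🇷🇺 [URGENT] Les forces de dissuasion #nucleaire de la #Russie"
	fmt.Println(runeIndexes(text, hashtagRe)) // [[37 47] [54 61]]
}
```

If the consumer of these indexes counts in some other unit (for example UTF-16 code units), the conversion would need to change accordingly.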


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
