Match words with hyphens and apostrophes

I have the following regex for matching words:

\w+(?:'|\-\w+)?

For the following string:

' 's yea' don't -yeah no- ice-cream '

it gives the following matches:

s yea' don't yeah no ice-cream

However, I would like the following matches:

's yea' don't yeah no ice-cream

This is because a word can start or end with an apostrophe, but not with a hyphen. Note that a ' on its own should not be matched.



Solution 1:[1]

Your \w+(?:'|\-\w+)? can only start matching at a word character (\w), so for any "word" that starts with ' the leading apostrophe is left out of the match, which does not meet the requirements.
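To see this concretely, here is a minimal Python sketch that runs the original pattern on the sample string from the question. The choice of Python's re module is an assumption; the question does not name a regex engine.

```python
import re

# Original pattern from the question: it has to begin at a word character.
ORIGINAL = re.compile(r"\w+(?:'|\-\w+)?")

text = "' 's yea' don't -yeah no- ice-cream '"

matches = ORIGINAL.findall(text)
print(matches)
# ['s', "yea'", "don'", 't', 'yeah', 'no', 'ice-cream']
```

The leading apostrophe of 's is dropped because the match cannot begin before the s. In Python the apostrophe branch also ends the match right after don', so the t comes out as a separate match.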

In general, you can match words with and without hyphens with

\w+(?:-\w+)*
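As a quick illustration of this general pattern, here is a small Python sketch. The sample string is made up for the example, and Python's re module is an assumed engine.

```python
import re

# One or more word characters, followed by any number of "-word" groups.
HYPHENATED = re.compile(r"\w+(?:-\w+)*")

sample = "ice-cream state-of-the-art no- -yeah"  # hypothetical sample text

hyphenated_matches = HYPHENATED.findall(sample)
print(hyphenated_matches)
# ['ice-cream', 'state-of-the-art', 'no', 'yeah']
```

A hyphen is only consumed when word characters follow it and a word character precedes it, so leading and trailing hyphens stay outside the match.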

In the current scenario, you may put \w and ' into a character class, allow an optional ' on either side, and use

'?\w[\w']*(?:-\w+)*'?

See the regex demo

If a "word" can contain at most one hyphen, replace the * at the end of the (?:-\w+)* group with the ? quantifier.

Breakdown:

  • '? - optional apostrophe
  • \w - a word character
  • [\w']* - zero or more word characters or apostrophes
  • (?:-\w+)* - zero or more sequences of:
    • - - a hyphen
    • \w+ - one or more word characters
  • '? - optional apostrophe
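The pattern broken down above can be checked with a short Python sketch. Python's re module is an assumed engine here, and the names WORD and SINGLE_HYPHEN_WORD are only illustrative. The second pattern is the single-hyphen variant, with * replaced by ? on the hyphen group.

```python
import re

# Optional ', a word character, more word characters or apostrophes,
# any number of "-word" groups, then an optional trailing '.
WORD = re.compile(r"'?\w[\w']*(?:-\w+)*'?")

# Variant allowing at most one hyphenated part per match.
SINGLE_HYPHEN_WORD = re.compile(r"'?\w[\w']*(?:-\w+)?'?")

text = "' 's yea' don't -yeah no- ice-cream '"

words = WORD.findall(text)
print(words)
# ["'s", "yea'", "don't", 'yeah', 'no', 'ice-cream']

single = SINGLE_HYPHEN_WORD.findall("state-of-the-art")
print(single)
# ['state-of', 'the-art']
```

A lone ' is never matched, because the pattern requires at least one word character (\w) after the optional leading apostrophe.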

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1