Regular expression doesn't match string when using MAWK

I have defined a regular expression which matches a list of words separated by one or more spaces where one of the words is followed by an asterisk. The strange thing is that for a list with only one word the expression doesn't match when I use mawk but it matches when I use gawk and nawk:

$ echo 'a*' | mawk '/([a-z]+ *)*[a-z]+ *[*]( *[a-z]+)*/'
$ echo 'a*' | gawk '/([a-z]+ *)*[a-z]+ *[*]( *[a-z]+)*/'
a*
$ echo 'a*' | nawk '/([a-z]+ *)*[a-z]+ *[*]( *[a-z]+)*/'
a*

If the word with the asterisk is followed by one or more words, the regular expression also matches with mawk:

$ echo 'a* b' | mawk '/([a-z]+ *)*[a-z]+ *[*]( *[a-z]+)*/'
a* b
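
To make the comparison easier to reproduce, the checks above can be wrapped in a small loop. This is only a sketch: the helper names `try_match` and `compare_awks` and the extra sample inputs are made up for illustration, and any implementation that is not installed is skipped.

```shell
#!/bin/sh
# Pattern from the question. It contains no "/" or backslash, so it can be
# pasted between the slashes of an awk regex literal unchanged.
re='([a-z]+ *)*[a-z]+ *[*]( *[a-z]+)*'

# try_match IMPL INPUT: exit status 0 if IMPL matches INPUT against the pattern.
try_match() {
    printf '%s\n' "$2" |
        "$1" "/$re/ { found = 1 } END { exit (found ? 0 : 1) }"
}

# Print one row per (implementation, input) pair, skipping awks not on PATH.
compare_awks() {
    for impl in "$@"; do
        command -v "$impl" >/dev/null 2>&1 || continue
        for input in 'a*' 'a *' 'a* b' 'b a*'; do
            if try_match "$impl" "$input"; then
                printf '%s\t%s\tmatch\n' "$impl" "$input"
            else
                printf '%s\t%s\tno match\n' "$impl" "$input"
            fi
        done
    done
}

compare_awks mawk gawk nawk
```

Going by the runs shown above, the `mawk` row for `a*` would be expected to read `no match`, while `gawk` and `nawk` report `match` for the same input.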

Any clues?

In Debian 11, mawk is the default implementation of AWK.

$ mawk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647


Solution 1:[1]

no idea how to fix mawk-1 itself, but if you want a regex that works around its shortcomings:

echo 'a*' | mawk '/([a-z]+ *)*[a-z]+ *[*](( *[a-z]+)*)?/'

a*

but since everything before and after the asterisked word is merely "zero or more", and you print the whole line anyway, why not just

echo 'a*' | mawk '/[a-z]+([ ]+)?[*]/' 

a*
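
The shortening works because the pattern is not anchored: both optional groups can match zero times, so a line is selected exactly when it contains a lowercase word, optional spaces, and an asterisk somewhere. A rough way to check that on a few lines is sketched below. The `AWK` variable, the function names, and the sample lines are assumptions for illustration, and the core is written as `[a-z]+ *[*]`, which accepts the same strings as the `([ ]+)?` form above.

```shell
#!/bin/sh
# AWK is an assumed override so the same script can be pointed at mawk, gawk, ...
AWK=${AWK:-awk}

# Sample lines: the first four contain "word, optional spaces, asterisk".
samples() {
    printf '%s\n' 'a*' 'a *' 'a* b' 'x y* z' 'a b' '*' '* a'
}

# The core of the pattern, without the optional groups on either side.
short_filter() {
    "$AWK" '/[a-z]+ *[*]/'
}

samples | short_filter
```

This should print only the first four sample lines. Running it with `AWK=mawk` and `AWK=gawk` gives a quick way to see whether both agree on the short form.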

or even more minimalistically,

echo 'a*' | mawk '/[a-z] +?\*/'  

a*

if you want tighter criteria around it, then maybe

echo 'a*' | mawk '/([a-z]+ +?)+[*]( +?[a-z]+)+?/'

a*
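
If the goal is to validate the whole line rather than just find the asterisked word, one alternative that avoids nested repetition entirely is to let awk split the line into fields and test each field with a simple anchored pattern. This is a sketch under assumptions, not a drop-in replacement: `is_starred_list` and `AWK` are made-up names, it requires exactly one starred word, it ignores leading and trailing blanks, and it treats tabs as separators because it relies on awk's default field splitting. Unlike the original regex, it also rejects `a*b`, where no space follows the asterisk.

```shell
#!/bin/sh
AWK=${AWK:-awk}   # assumed override, e.g. AWK=mawk

# is_starred_list LINE: exit 0 if LINE is lowercase words separated by blanks,
# with exactly one word followed by an asterisk.
is_starred_list() {
    printf '%s\n' "$1" | "$AWK" '
    {
        gsub(/ +[*]/, "*")      # "a *" -> "a*": glue the asterisk to its word
        starred = 0
        ok = (NF > 0)           # an empty line is not a valid list
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[a-z]+[*]$/)
                starred++       # a word carrying the asterisk
            else if ($i !~ /^[a-z]+$/)
                ok = 0          # anything else must be a plain word
        }
        exit ((ok && starred == 1) ? 0 : 1)
    }'
}

for line in 'a*' 'a b *' 'a b'; do
    if is_starred_list "$line"; then
        echo "ok:  $line"
    else
        echo "bad: $line"
    fi
done
```

The two anchored patterns here contain no grouped repetition, which is the construct the mawk run in the question appears to stumble over.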

writing [...]+? instead of [...]* is sometimes friendlier to regex engines. note that +? is not a lazy quantifier here: as the zero-space matches above show, mawk reads it as an optional "one or more", which accepts the same strings as *.

most modern regex engines shouldn't have any issues with [...]*, but this is one of those cases where the slightly less intuitive syntax gives the engine meaningful help

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 RARE Kpop Manifesto