Only get alphanumeric characters in capture group using sed
Input:
x.y={aaa b .c}
Note that the content within {} is only an example; in reality it could be any value.
Problem: I would like to keep only the alphanumeric characters within the {}.
So it would become:
x.y={aaabc}
Trial 0
$ echo 'x.y={aaa b .c}' | sed 's/[^[:alnum:]]\+//g'
xyaaabc
This is great, but I'd like to only modify the part within {}. So I thought this may need capture groups, hence I went ahead and tried these:
Trial 1
$ echo 'x.y={aaa b .c}' | sed -E 's/x.y=\{(.*)\}/x.y={\1}/'
x.y={aaa b .c}
Here I have captured the content I want to modify (aaa b .c) correctly, but I need a way to somehow do s/[^[:alnum:]]\+//g only on \1.
Instead, I tried capturing all alphanumeric characters only (to \1) like this:
Trial 2
$ echo 'x.y={aaa b .c}' | sed -E 's/x.y=\{([[:alnum:]]+)\}/x.y={\1}/'
x.y={aaa b .c}
Of course, it doesn't work, because the pattern expects only alnum's followed immediately by a literal }. I didn't tell it to ignore the non-alnum's. I.e., this part:
s/x.y=\{([[:alnum:]]+)\}/x.y={\1}/
^^^^^^^^^^^^^^^^^^
It literally matches: an open brace, some alnum's, and a closing brace -- which is not what I want. I'd like it to match everything, but only capture the alnum's.
Example of input/output:
x.y={aaa b .c} blah
blah
x.y={1 2 3 def} blah
blah
to
x.y={aaabc} blah
blah
x.y={123def} blah
blah
I searched the web before finally giving up and posting the question, but I didn't find anything helpful, as I didn't see anyone with a problem similar to mine. I would appreciate some help with this, as I'd love to have a better understanding of variables in regex/sed. Thanks!
Solution 1:[1]
With sed (tested on GNU sed, syntax may vary for other implementations):
$ sed -E ':a s/(\{[[:alnum:]]*)[^[:alnum:]]+([^}]*})/\1\2/; ta' ip.txt
x.y={aaabc} blah
blah
x.y={123def} blah
blah
- :a marks that location as label a (used to jump using ta as long as the substitution succeeds)
- (\{[[:alnum:]]*) matches { followed by zero or more alnum characters
- [^[:alnum:]]+ matches one or more non-alnum characters
- ([^}]*}) matches till the next } character
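To see why the loop is needed, here is a small sketch that applies the same substitution one pass at a time and prints each intermediate result. The helper name strip_once is hypothetical and exists only for this illustration; the regex is the one from the command above, and sed -E is assumed to be available.

```shell
# strip_once (hypothetical helper): run the substitution a single time,
# without the :a label and ta branch, so each pass can be observed.
strip_once() {
    sed -E 's/(\{[[:alnum:]]*)[^[:alnum:]]+([^}]*})/\1\2/'
}

line='x.y={aaa b .c}'
while :; do
    next=$(printf '%s\n' "$line" | strip_once)
    # Stop once a pass changes nothing (this is when ta would not jump).
    [ "$next" = "$line" ] && break
    line=$next
    printf '%s\n' "$line"
done
# prints:
# x.y={aaab .c}
# x.y={aaabc}
```

Each pass removes only the first run of non-alnum characters after the leading alnum's, which is why a single s command is not enough. One caveat that seems worth checking on your own data: } is itself a non-alnum character, so on a line with several {...} groups the middle part can match across a closing brace. Using [^}[:alnum:]]+ for the middle part appears to avoid that.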
If perl is okay:
$ perl -pe 's/\{\K[^}]+(?=\})/$&=~s|[^a-z\d]+||gir/e' ip.txt
x.y={aaabc} blah
blah
x.y={123def} blah
blah
- \{\K[^}]+(?=\}) matches the sequence from { to } (assuming } cannot occur in between)
- \{\K and (?=\}) are used to keep the braces from being part of the matched portion
- e flag allows you to use Perl code in the replacement portion, in this case another substitute command
- $&=~s|[^a-z\d]+||gir here, $& refers to the entire matched portion, the gi flags are used for global/case-insensitive matching and the r flag is used to return the value of this substitution instead of modifying $&
- [^a-z\d]+ matches non-alphanumeric characters (assuming ASCII, you can also use [^[:alnum:]]+)
  - use \W+ if you want to preserve underscores as well
For both solutions, you can add x\.y= prefix if needed to narrow the scope of matching.
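As a rough illustration of narrowing the scope, the sketch below adds the x\.y= prefix to the sed version. The sample input and the other.z= line are made up for this example; the commands are passed with separate -e options, which is assumed to be a little more portable than putting the label and the substitution on one line.

```shell
# Made-up sample: only x.y= assignments should be cleaned,
# the other.z= line must stay untouched.
input='x.y={aaa b .c} blah
other.z={keep this as is}
x.y={1 2 3 def} blah'

# Same loop as the sed solution, with the literal x.y= prefix
# placed inside the first capture group.
result=$(printf '%s\n' "$input" |
    sed -E -e ':a' \
           -e 's/(x\.y=\{[[:alnum:]]*)[^[:alnum:]]+([^}]*})/\1\2/' \
           -e 'ta')

printf '%s\n' "$result"
# prints:
# x.y={aaabc} blah
# other.z={keep this as is}
# x.y={123def} blah
```

Because the prefix is part of \1, it is written back unchanged on every pass, and lines without it never match.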
Solution 2:[2]
Here is another gnu-awk solution using FPAT:
s='x.y={aaa b .c}'
awk -v OFS= -v FPAT='{[^}]+}|[^{}]+' '
{
for (i=1; i<=NF; ++i)
if ($i ~ /^{/) $i = "{" gensub(/[^[:alnum:]]+/, "", "g", $i) "}"
} 1' <<< "$s"
x.y={aaabc}
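FPAT and gensub are gawk extensions. If gawk is not available, the same idea can be sketched with the standard match, substr and gsub functions. This is only an illustrative variant, not part of the original answer, and it assumes an awk that supports POSIX character classes such as [[:alnum:]].

```shell
s='x.y={aaa b .c} blah'

out=$(printf '%s\n' "$s" | awk '
{
    done = ""      # part of the line already processed
    rest = $0      # part of the line still to scan
    # Repeatedly locate the next {...} group in the remainder.
    while (match(rest, /[{][^}]*[}]/)) {
        inner = substr(rest, RSTART + 1, RLENGTH - 2)   # text between the braces
        gsub(/[^[:alnum:]]+/, "", inner)                # keep alphanumerics only
        done = done substr(rest, 1, RSTART - 1) "{" inner "}"
        rest = substr(rest, RSTART + RLENGTH)
    }
    print done rest
}')

printf '%s\n' "$out"
# prints: x.y={aaabc} blah
```

Text outside the braces is copied through as is, so this handles several {...} groups on one line independently.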
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | anubhava |
