Replacing first occurrence line after first matched line
Let's assume the following XML file:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...
I need to replace the first </addresses> following the first <addresses xmlns="namespace"> with </namespace:addresses>, so that the file becomes:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</namespace:addresses>
some other text
<addresses>
<something else/>
</addresses>
...
I am aware of this similar thread, but none of the following solutions changes anything:
sed -e '/<addresses xmlns="namespace">/!b' -e ':a' -e "s/<\/namespace:addresses>/<\/addresses>/;t trail" -e 'n;ba' -e ':trail' -e 'n;btrail' file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/./ s/<\/namespace:addresses>/<\/addresses>/" file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/<\/namespace:addresses>/ s/<\/namespace:addresses>/<\/addresses>/" file.xml
For instance:
sed -e "/<addresses xmlns=\"namespace\">/,/./ s/<\/namespace:addresses>/<\/addresses>/" file.xml
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...
Maybe this issue is linked to the sed I'm using: 4.7-1ubuntu1 on impish/21.10 or even 4.8-1.
Any suggestion? I'm open to any other tool (perl/awk), the simpler, the better.
Solution 1:[1]
It is much easier with perl than with sed:
perl -0777 -i -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' file
See the online demo. Details:
<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1> - the regex pattern, matching:
  < - a < char
  (addresses) - Group 1 ($1): addresses
  \s+ - one or more whitespace characters
  xmlns="namespace"> - a fixed string
  [^<]*(?:<(?!/\1>)[^<]*)* - a much faster alternative to (?s:.)*?; basically, it matches any text up to a </addresses> string
  \K - match reset operator that omits all text matched so far from the current match memory buffer
  </\1> - (this is what is finally consumed and will be replaced): </ + Group 1 value (so as not to repeat addresses) + >
</namespace:$1> - the replacement is </namespace: + Group 1 value + >.
It replaces only the first occurrence because -0777 slurps the whole file into a single multiline string and the substitution has no g flag.
Note the difference in perl between the \1 backreference used inside the pattern and the $1 capture variable used in the replacement.
See the online demo:
s='some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...'
perl -0777 -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' <<< "$s"
Output:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</namespace:addresses>
some other text
<addresses>
<something else/>
</addresses>
...
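Since the question also mentions awk, here is a minimal line-oriented sketch of the same idea. It assumes the opening and closing tags sit on separate lines, as in the sample file, and the function name fix_first_ns_close is made up for this example. It is an alternative technique, not the perl approach above.

```shell
# Hypothetical helper: reads text on stdin, writes the result to stdout.
fix_first_ns_close() {
  awk '
    # Remember that the first namespaced opening tag has been seen.
    !seen && /<addresses xmlns="namespace">/ { seen = 1 }
    # Rewrite only the first closing tag after it, then stop substituting.
    seen && !done && sub(/<\/addresses>/, "</namespace:addresses>") { done = 1 }
    # Print every line, changed or not.
    { print }
  '
}

# Shortened version of the sample from the question.
sample='<addresses>
<something/>
</addresses>
<addresses xmlns="namespace">
<could be anything/>
</addresses>
<addresses>
<something else/>
</addresses>'

result=$(printf '%s\n' "$sample" | fix_first_ns_close)
printf '%s\n' "$result"
```

On a real file this would be called as fix_first_ns_close < file.xml > fixed.xml. Unlike the perl version it never looks across line boundaries, so it stays simple but would not handle a block written entirely on one line in the same way.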
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
