How to add a space between Devanagari and English text in a bash script?

I have a text file like this:

#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को

The expected output is:

#green और
< सेमीकोलन >
action एक्शन mysql
admin2 को

This is what I have tried so far:

sed 's/[अ-ह].*/ &/g' testfile

But the output I get is:

#green और
< सेमीकोलन>
action एक्शनmysql
admin2 को

Is there any way to get the expected output using awk or sed?



Solution 1:[1]

You can use Perl here:

perl -i -CSD -Mutf8 -pe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' filename

Here is a demo in a bash script:

#!/bin/bash
s='#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को'
perl -CSD -Mutf8 -pe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' <<< "$s"

Output:

#green और 
< सेमीकोलन >
action एक्शन mysql
admin2 को 

The regex matches either of two zero-width positions:

  • (?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}]) - a position between a Devanagari letter from the [अ-ह] range or a combining mark (\p{M}) on the left, and a char that is neither a Devanagari letter nor a combining mark on the right
  • | - or
  • (?<=[^अ-ह\p{M}])(?=[अ-ह]) - a position between a char that is neither a Devanagari letter nor a combining mark on the left, and a Devanagari letter from the [अ-ह] range on the right.

Solution 2:[2]

The .* matches the entire remainder of the line, so there is nothing left for the g flag to match a second time. Assuming the character class is correct (sorry, I'm unfamiliar with Devanagari), you could use

sed 's/[अ-ह]\+/ & /g' testfile

though you'll probably end up with some extra spaces you'll want to remove.

sed 's/[अ-ह]\+/ & /g;
    s/^ //;s/ $//;s/  / /g' testfile
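A range like [अ-ह] (the one from the question) covers only U+0905 to U+0939, the independent vowels and consonants. The vowel signs and the virama (्) lie outside it, so a word such as एक्शन is cut into pieces by [अ-ह]\+. The sketch below widens the class to the whole Devanagari block, U+0900 (ऀ) to U+097F (ॿ), and applies the same kind of cleanup. It assumes a UTF-8 locale and a sed that accepts -E (GNU and BSD sed both do); devanagari_spaces_sed is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: devanagari_spaces_sed is a hypothetical helper name.
# The bracket range runs from U+0900 (ऀ) to U+097F (ॿ), the whole
# Devanagari block, so vowel signs and the virama are included.
devanagari_spaces_sed() {
  # 1. Surround every run of Devanagari characters with spaces.
  # 2. Collapse runs of spaces into one (this also squeezes any
  #    multiple spaces that were already in the input).
  # 3. Drop a space left at the start or end of the line.
  sed -E 's/[ऀ-ॿ]+/ & /g; s/ +/ /g; s/^ //; s/ $//' "$@"
}

# Usage: pass file names as arguments, or pipe text in on stdin.
printf '%s\n' '#greenऔर' '<सेमीकोलन>' 'actionएक्शनmysql' 'admin2को' | devanagari_spaces_sed
```

Collapsing before trimming keeps a line that already ended in a space from keeping a leftover one. For the sample lines it prints the expected output from the question.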

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Wiktor Stribiżew
Solution 2: tripleee