How to add space between Devanagari and English in bash script?
I have a text file like this:
#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को
The expected output is:
#green और
< सेमीकोलन >
action एक्शन mysql
admin2 को
This is what I have tried so far:
sed 's/[अ-ह].*/ &/g' testfile
but the output I get is:
#green और
< सेमीकोलन>
action एक्शनmysql
admin2 को
Is there any way to get the expected output using awk or sed?
Solution 1:[1]
You can use Perl here:
perl -i -CSD -Mutf8 -pe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' filename
Demo script:
#!/bin/bash
s='#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को'
perl -CSD -Mutf8 -pe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' <<< "$s"
Output:
#green और
< सेमीकोलन >
action एक्शन mysql
admin2 को
The regex matches either of two zero-width positions:
(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}]) - a location between a Devanagari letter from the [अ-ह] range or a diacritic mark (\p{M}) and a char that is neither a Devanagari letter nor a diacritic mark
| - or
(?<=[^अ-ह\p{M}])(?=[अ-ह]) - a location between a char that is neither a Devanagari letter nor a diacritic mark and a Devanagari letter.
Solution 2:[2]
The .* matches the entire remainder of the line, and renders the g flag useless. Assuming the character class is correct (sorry, I'm unfamiliar with Devanagari) you could use
sed 's/[अ-ह]\+/ & /g' testfile
though you'll probably end up with some extra spaces you'll want to remove.
sed 's/[अ-ह]\+/ & /g;s/^ //;s/ $//;s/  */ /g' testfile
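The point about .* can be seen without any Devanagari at all. In the sketch below, digits stand in for the Devanagari run (an assumption made purely so the example behaves the same in any locale), and the function names greedy and per_run are hypothetical. The first command mirrors the original attempt; the second matches one run at a time, pads it on both sides, then trims and squeezes the extra spaces.

```shell
#!/bin/bash
# ASCII stand-in: a run of digits plays the role of a Devanagari word.

# Original approach: ".*" swallows the rest of the line, so the g flag
# never gets a second match to work on.
greedy() { sed 's/[0-9].*/ &/g'; }

# Per-run approach: pad every run of digits, then strip a leading space,
# strip a trailing space, and squeeze repeated spaces into one.
per_run() { sed 's/[0-9][0-9]*/ & /g; s/^ //; s/ $//; s/  */ /g'; }

printf '%s\n' 'abc123def456' | greedy    # prints: abc 123def456
printf '%s\n' 'abc123def456' | per_run   # prints: abc 123 def 456
```

[0-9][0-9]* is used in place of [0-9]\+ only because \+ is a GNU extension; with GNU sed the two behave the same here.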
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
| Solution 2 | tripleee |
