Using bash script to remove from sentence words longer than [x] characters
I have a sentence (array) and I would like to remove from it all words longer than 8 characters.
Example sentence:
var="one two three four giberish-giberish five giberish-giberish six"
I would like to get:
var="one two three four five six"
So far I'm using this:
echo "$var" | tr ' ' '\n' | awk 'length($1) <= 8 { print $1 }' | tr '\n' ' '
The solution above works, but as you can see, I replace spaces with newlines, filter the words, and then replace the newlines with spaces again. I'm pretty sure there must be a better, more "elegant" solution that does not swap spaces and newlines.
Solution 1:[1]
Using sed
$ sed 's/\<[a-z-]\{9,\}\> //g' file
var="one two three four five six"
Solution 2:[2]
Here is one way to do it:
arr=(one two three four giberish-giberish five giberish-giberish six)
for var in "${arr[@]}"; do (( ${#var} > 8 )) || echo -n "$var "; done
echo # for that newline in the end
And another:
awk '{
  for (i = 1; i <= NF; i++) {
    if (length($i) <= 8) printf "%s ", $i
  }
  print ""  # for that newline in the end
}'
And a third!
awk -v RS='[[:space:]]+' 'length <= 8 { v=v" "$0 }; END{print substr(v, 2)}'
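The last one prints a clean, single-space-delimited string with no leading or trailing whitespace. Note that a regular expression as RS is an extension (supported by gawk and mawk, for example) and is not guaranteed by POSIX awk.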
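The same "no trailing space" result can also be had without awk. The following is only a sketch: `filter_words` is a hypothetical helper name, not something from the answers above, and it assumes the default IFS so that `"${kept[*]}"` joins the words with single spaces.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answers): print the words that
# are at most max_len characters long, joined by single spaces.
filter_words() {
    local max_len=$1; shift
    local kept=() w
    for w in "$@"; do
        (( ${#w} <= max_len )) && kept+=("$w")
    done
    # "${kept[*]}" joins on the first character of IFS (a space by default),
    # so there is no leading or trailing separator to trim.
    echo "${kept[*]}"
}

arr=(one two three four giberish-giberish five giberish-giberish six)
result=$(filter_words 8 "${arr[@]}")
echo "$result"
```

Collecting into an array and joining once at the end avoids the trailing space that `echo -n "$var "` leaves behind.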
Solution 3:[3]
In pure Bash, you can filter the words shorter than some chosen length into a new array:
#!/bin/bash
var="one two three four giberish-giberish five giberish-giberish six"
new_arr=()
for w in $var; do # no quotes on purpose to split string
    [[ ${#w} -lt 6 ]] && new_arr+=( "$w" )
done
declare -p new_arr
# declare -a new_arr=([0]="one" [1]="two" [2]="three" [3]="four" [4]="five" [5]="six")
Or if the source is already an array:
old_arr=(one two three four giberish-giberish five giberish-giberish six)
new_arr=()
for w in "${old_arr[@]}"; do
    [[ ${#w} -lt 6 ]] && new_arr+=( "$w" )
done
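One caveat with splitting a string through an unquoted `$var`: the shell also performs pathname expansion on the result, so a word such as `*` would be replaced by file names. A possible variation, sketched here with an illustrative input that is not from the question, splits with `read -r -a` instead and then joins the kept words back into a string. The threshold of 8 and the array name `words` are assumptions for the example.

```shell
#!/bin/bash
# Illustrative input: the literal * would be glob-expanded by an unquoted $var
var="one two * giberish-giberish six"

# Split on whitespace into an array without pathname expansion
read -r -a words <<< "$var"

new_arr=()
for w in "${words[@]}"; do
    [[ ${#w} -le 8 ]] && new_arr+=( "$w" )
done

# Join the kept words back into a single space-separated string
var="${new_arr[*]}"
echo "$var"
```

Note that `read` handles a single line only, so this fits the one-line sentence in the question rather than multi-line input.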
You may want to delete the words in old_arr as you loop over it. If you know that each $w is unique, you can do:
old_arr=(one two three four giberish-giberish five giberish-giberish six)
for w in "${old_arr[@]}"; do
    [[ ${#w} -ge 6 ]] && old_arr=("${old_arr[@]/$w}")
done
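But this has two issues: 1) the expansion removes the matched text from every element, so any other element that merely contains $w (for example, as a prefix) is altered too, and 2) the deleted words stay in the array as empty strings, so the existing indices remain: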
$ declare -p old_arr
declare -a old_arr=([0]="one" [1]="two" [2]="three" [3]="four" [4]="" [5]="five" [6]="" [7]="six")
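The first issue is easy to miss, so here is a small sketch of it. The input array is made up for illustration: it contains `giberish` on its own and as the prefix of `giberish-more`.

```shell
#!/bin/bash
# Illustrative data (not from the question): one long word is a prefix of another
old_arr=(one giberish two giberish-more)

for w in "${old_arr[@]}"; do
    # ${old_arr[@]/$w} strips the first match of $w from EVERY element
    [[ ${#w} -ge 6 ]] && old_arr=("${old_arr[@]/$w}")
done

declare -p old_arr
# "giberish" became an empty string, and "giberish-more" was cut down to
# "-more", which is short enough to survive as a bogus word.
```

Because the pattern substitution is applied element by element, it cannot tell a whole-word match from a partial one.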
You could also unset the offending item by keeping a separate index:
old_arr=(one two three four giberish-giberish five giberish-giberish six)
idx=0
for w in "${old_arr[@]}"; do
    [[ ${#w} -ge 6 ]] && unset 'old_arr[idx]'
    (( idx++ ))
done
But then you end up with non-contiguous array indexes (the remaining words keep their original indexes):
$ declare -p old_arr
declare -a old_arr=([0]="one" [1]="two" [2]="three" [3]="four" [5]="five" [7]="six")
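If the gaps are unwanted, the array can be re-packed afterwards by reassigning it from its own expansion. The sketch below also iterates over the indexes directly with `"${!old_arr[@]}"`, which is a variation on the loop above rather than the answer's exact code, and removes the need for a separate counter.

```shell
#!/bin/bash
old_arr=(one two three four giberish-giberish five giberish-giberish six)

# Loop over the existing indexes and unset every word of 6 or more characters
for idx in "${!old_arr[@]}"; do
    (( ${#old_arr[idx]} >= 6 )) && unset 'old_arr[idx]'
done

# Re-pack: the expansion lists only the set elements, and the assignment
# numbers them again from 0
old_arr=("${old_arr[@]}")
declare -p old_arr
```

Re-packing gives up the original index positions, so it only makes sense when those positions carry no meaning.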
It is usually better to filter into a new array unless you want to keep the existing indexes.
Solution 4:[4]
This might work for you (GNU sed):
<<<"$var" sed -E 'y/ /\n/;s/..{8}.*\n//mg;y/\n/ /'
Translate spaces to newlines.
Remove all lines that are more than 8 characters long.
Translate newlines to spaces.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | HatLess |
| Solution 2 | |
| Solution 3 | |
| Solution 4 | potong |
