Using a bash script to remove words longer than [x] characters from a sentence

I have a sentence (array) and I would like to remove from it all words longer than 8 characters.

Example sentence:

var="one two three four giberish-giberish five giberish-giberish six"

I would like to get:

var="one two three four five six"

So far I'm using this:

echo "$var" | tr ' ' '\n' | awk 'length($1) <= 8 { print $1 }' | tr '\n' ' '

The solution above works, but as you can see I'm replacing spaces with newlines, filtering the words, and then replacing the newlines with spaces again. I'm pretty sure there must be a better, more "elegant" solution that doesn't swap spaces and newlines.



Solution 1:[1]

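Using sed (GNU sed, for the \< and \> word-boundary anchors), assuming the assignment is stored in file. The pattern requires a trailing space, so a long word at the very end of the line is left in place.

$ sed 's/\<[a-z-]\{9,\}\> //g' file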
var="one two three four five six"

Solution 2:[2]

Here is one way to do it:

arr=(one two three four giberish-giberish five giberish-giberish six)
for var in "${arr[@]}"; do (( ${#var} > 8 )) || echo -n "$var "; done
echo # for that newline in the end

And another:

awk '{
  for (i = 1; i <= NF; i++) { if (length($i) <= 8) printf "%s ", $i }
  print ""  # for that newline in the end
}'

And a third!

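awk -v RS='[[:space:]]+' 'length <= 8 { v=v" "$0 }; END{print substr(v, 2)}'

The last one prints a "perfect" single-space delimited string with no extra leading or trailing whitespace. Note that a regular expression as RS is an extension (supported by gawk and mawk), not part of POSIX awk.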
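If the length limit needs to vary, the awk approach can be wrapped in a small function. The following is only a sketch: the function name filter_words and its argument order are made up for illustration, and it assumes bash (for the here-string) with any POSIX awk.

```shell
#!/bin/bash
# Hypothetical helper (not from the answers above):
# keep only the words of at most $1 characters from the string $2.
filter_words() {
    local max=$1 sentence=$2
    awk -v max="$max" '{
        out = ""
        for (i = 1; i <= NF; i++)
            if (length($i) <= max)
                # add a separating space only between kept words
                out = out (out == "" ? "" : " ") $i
        print out
    }' <<< "$sentence"
}

var="one two three four giberish-giberish five giberish-giberish six"
result=$(filter_words 8 "$var")
echo "$result"   # one two three four five six
```

Building the output string inside awk avoids the trailing space that printf "%s " leaves behind.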

Solution 3:[3]

In pure Bash, you can filter the words shorter than some chosen length into a new array:

#!/bin/bash

var="one two three four giberish-giberish five giberish-giberish six" 

new_arr=()
for w in $var; do  # no quotes on purpose to split string
    [[ ${#w} -lt 6 ]] && new_arr+=( "$w" )
done    

declare -p new_arr
# declare -a new_arr=([0]="one" [1]="two" [2]="three" [3]="four" [4]="five" [5]="six")
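Since the question asks for a string rather than an array, the filtered array can be joined back together. A minimal sketch, assuming the limit of 8 characters from the question (the variable max is introduced here for illustration) and the default IFS:

```shell
#!/bin/bash
var="one two three four giberish-giberish five giberish-giberish six"
max=8   # assumed limit, taken from the question

new_arr=()
for w in $var; do            # unquoted on purpose: split on whitespace
    (( ${#w} <= max )) && new_arr+=( "$w" )
done

# "${new_arr[*]}" joins the elements with the first character of IFS,
# which is a space by default.
var="${new_arr[*]}"
echo "$var"   # one two three four five six
```

Because $var is expanded unquoted, words containing glob characters such as * would also be subject to filename expansion; running set -f beforehand is one way to avoid that.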

Or if the source is already an array:

old_arr=(one two three four giberish-giberish five giberish-giberish six)
new_arr=()
for w in "${old_arr[@]}"; do  # quoted so each element stays intact
    [[ ${#w} -lt 6 ]] && new_arr+=( "$w" )
done 

You may want to delete the words in old_arr as you loop over it. If you know that each $w is unique, you can do:

old_arr=(one two three four giberish-giberish five giberish-giberish six)
for w in "${old_arr[@]}"; do
    [[ ${#w} -ge 6 ]] && old_arr=("${old_arr[@]/$w}")
done 

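But this has two issues: 1) the substitution removes the matching text from every element that contains it, so words that share a prefix with a deleted word are altered as well, and 2) the emptied elements stay in the array at their existing indices: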

$ declare -p old_arr
declare -a old_arr=([0]="one" [1]="two" [2]="three" [3]="four" [4]="" [5]="five" [6]="" [7]="six")
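The first issue is easier to see with data where one long word is a prefix of another. The array below is hypothetical and chosen only to trigger the problem:

```shell
#!/bin/bash
# Hypothetical data: "longword" is also a prefix of "longwords".
arr=(ab longword longwords cd)
for w in "${arr[@]}"; do
    # ${arr[@]/$w} strips the first match of $w from EVERY element
    [[ ${#w} -ge 6 ]] && arr=("${arr[@]/$w}")
done
declare -p arr
# declare -a arr=([0]="ab" [1]="" [2]="s" [3]="cd")
```

Removing "longword" also cuts it out of "longwords", leaving a stray "s" that the later iteration for "longwords" can no longer match.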

You could also unset the offending item by keeping a separate index:

old_arr=(one two three four giberish-giberish five giberish-giberish six)
idx=0
for w in "${old_arr[@]}"; do
    [[ ${#w} -ge 6 ]] && unset 'old_arr[idx]'
    (( idx++ ))
done 

But then you end up with non-contiguous array indexes (although the remaining words keep their original index):

$ declare -p old_arr
declare -a old_arr=([0]="one" [1]="two" [2]="three" [3]="four" [5]="five" [7]="six")
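If contiguous indexes are wanted after all, the sparse array can be renumbered by re-assigning its own expansion. The sketch below also loops over the indices directly instead of keeping a separate counter; it is one possible variant, not part of the original answer:

```shell
#!/bin/bash
old_arr=(one two three four giberish-giberish five giberish-giberish six)

# "${!old_arr[@]}" expands to the indices, so no separate counter is needed.
# The list is expanded once, before the loop starts, so unsetting is safe.
for i in "${!old_arr[@]}"; do
    (( ${#old_arr[i]} >= 6 )) && unset 'old_arr[i]'
done

# Re-assigning the expansion renumbers the remaining elements from 0.
old_arr=("${old_arr[@]}")
declare -p old_arr
# declare -a old_arr=([0]="one" [1]="two" [2]="three" [3]="four" [4]="five" [5]="six")
```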

It is usually better to filter into a new array unless you want to keep the existing indexes.

Solution 4:[4]

This might work for you (GNU sed):

<<<"$var" sed -E 'y/ /\n/;s/..{8}.*\n//mg;y/\n/ /'

Translate spaces to newlines.

Remove all lines that are more than 8 characters long.

Translate newlines to spaces.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 HatLess
Solution 2
Solution 3
Solution 4 potong