Remove extra whitespace between string parts in POSIX shell

Example:

  1. Input string: 'a    string   with    "some   extra space"'
  2. Desired output: 'a string with "some   extra space"'

How do I get from 1 to 2 without relying on Bash-specific features? sed, awk and the other standard utilities are available, even python3 if need be. Problems so far:

  • string delimiters might be flipped and/or nested (double quotes inside single quotes or vice versa); anything between a matching pair of quotes should be treated as one string and left untouched
  • piping through tr -s ' ' also squeezes the repeated spaces inside the quoted string
  • xargs strips the string delimiters themselves (such as the " in the example)
  • an unquoted echo $VAR word-splits the value and ignores string boundaries entirely
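To make the tr bullet concrete, here is a small Python model of that attempt. `re.sub(r' +', ' ', s)` is used as an assumed stand-in for `tr -s ' '`: for runs of plain spaces the two squeeze the same way, and neither knows where a quoted string begins or ends.

```python
import re

# The question's example input (runs of plain spaces, one double-quoted string).
s = 'a    string   with    "some   extra space"'

# Model of `tr -s ' '`: squeeze every run of spaces, quoted or not.
squeezed = re.sub(r' +', ' ', s)
print(squeezed)  # a string with "some extra space"  (quoted spacing lost)
```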


Solution 1:[1]

Going the Python route:

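#!/usr/bin/env python3
import re
import shlex
import sys

# Split the argument the way a POSIX shell would: quotes group words and are consumed.
words = shlex.split(sys.argv[1])
# Rejoin with single spaces, re-quoting (with ") any word that contains whitespace.
print(' '.join(f'"{word}"' if re.search(r'\s', word) else word for word in words))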

...or wrapped in a function for use from a POSIX shell:

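# Collapse whitespace outside quotes; words containing whitespace are re-emitted in double quotes.
remove_unquoted_spaces() {
  python3 -c '
import re, shlex, sys
words = shlex.split(sys.argv[1])  # with -c, sys.argv[1] is the first argument after the code
print(" ".join([ f"\"{word}\"" if re.search(r"\s", word) else word for word in words ]))
' "$@"
}

remove_unquoted_spaces 'a    string   with    "some   extra space"'
# prints: a string with "some   extra space"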
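If the output only has to be valid shell input, and it does not matter which quote character appears in it, Python 3.8+ offers `shlex.join`, which re-quotes each word with `shlex.quote`. This is a variant on the answer's code rather than part of it, and the helper name `squeeze_with_shlex` is purely illustrative. The trade-off: `shlex.quote` uses single quotes, so the question's `"some   extra space"` comes back as `'some   extra space'`; in exchange, a word that itself contains double quotes is re-quoted correctly instead of being wrapped in a second, conflicting pair of double quotes.

```python
import shlex


def squeeze_with_shlex(text: str) -> str:
    """Hypothetical helper: split like a POSIX shell, rejoin with safe quoting."""
    words = shlex.split(text)  # quotes are consumed here; runs of whitespace vanish
    return shlex.join(words)   # each word is quoted only if it needs it


print(squeeze_with_shlex('a    string   with    "some   extra space"'))
# a string with 'some   extra space'
```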

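The `shlex`-based answer normalizes quoting: every word containing whitespace is rewritten with double quotes, so a single-quoted string comes back double-quoted, and a string that itself contains double quotes can come back mis-quoted. If the original quote characters must survive verbatim, as the question's first bullet asks, one alternative is to scan the text once and squeeze whitespace only while outside quotes. This sketch is not from the answer, and the function name `squeeze_unquoted_whitespace` is hypothetical. It assumes `'` and `"` are the only delimiters, that a quote of the other kind inside a string is a literal character (as in POSIX quoting), and that backslash escapes are not used.

```python
def squeeze_unquoted_whitespace(text: str) -> str:
    """Collapse whitespace runs outside quotes; keep quoted text unchanged.

    Assumes ' and " delimit strings, a quote of the other kind inside a
    string is literal, and backslash escapes are not used.
    """
    out = []
    quote = None           # the quote char we are currently inside, if any
    pending_space = False  # saw whitespace outside quotes, not yet emitted
    for ch in text:
        if quote is None and ch.isspace():
            pending_space = True
            continue
        if pending_space:
            if out:        # drop leading whitespace entirely
                out.append(' ')
            pending_space = False
        out.append(ch)
        if quote is None and ch in '\'"':
            quote = ch     # opening delimiter
        elif ch == quote:
            quote = None   # matching closing delimiter
    return ''.join(out)    # trailing whitespace is never emitted


print(squeeze_unquoted_whitespace('a    string   with    "some   extra space"'))
# a string with "some   extra space"
```

Unlike `shlex.split`, an unterminated quote is not an error here; the rest of the text is simply copied unchanged.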
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Charles Duffy