How to properly pass filenames with spaces with $* from xargs to sed via sh?

Disclaimer: this happens on macOS (Big Sur); more info about the context below.

I have to write (and almost have) a script which will replace image URLs in big text (XML) files with their Base64-encoded values.

The script should run the same way with single filenames or patterns, or both, e.g.:

./replace-encode single.xml
./replace-encode pattern*.xml
./replace-encode single.xml pattern*.xml
./replace-encode folder/*.xml

Note: it should properly handle files\ with\ spaces.xml
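As an aside, the shell expands a pattern like pattern*.xml before the script even runs, so the script receives one argument per matching file, with any spaces preserved, as long as it handles its arguments with proper quoting. A minimal illustration (the throwaway directory and filenames are made up for the demo):

```shell
# The glob is expanded by the calling shell: each matching file becomes
# one separate argument, even when its name contains spaces.
d=$(mktemp -d)                      # throwaway directory for the demo
touch "$d/a b.xml" "$d/c.xml"
( cd "$d" && printf '%s\n' *.xml )  # one line per argument
# prints:
# a b.xml
# c.xml
```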

So I ended up with this script:

#!/bin/bash

#needed for `ls` command
IFS=$'\n'

ls -1 $* | xargs -I % sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' % | xargs -tI % sh -c 'sed -i "" "s@%@`curl -s % | base64`@" $0' "$*"

What it does: ls all the files and pipe the list to xargs, which searches each file for URLs surrounded by anchor tags (hence the > and < in the search expression; I also had to use sed because grep is limited on macOS), then pipe again to a small sh script which runs the sed search and replace, where the replacement is the big Base64 string.

This works perfectly fine... but only for fileswithoutspaces.xml

I tried to play with $0 vs $1, $* vs $@, with or without quotes, but to no avail.

I don't exactly understand how variable substitution (is that what it's called? I'm not a native English speaker, and above all not a script writer at all, just a Java dev all day long...) works between xargs, sh, or even bash when the arguments are filenames.
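For what it's worth, the part that usually trips people up is how sh -c assigns the arguments that follow the command string: the first one becomes $0, the rest become $1, $2, and so on, and only quoted expansions keep a filename with spaces as a single word. A minimal sketch (the argument values are arbitrary):

```shell
# After the -c command string, the first argument fills $0, the next $1, ...
sh -c 'echo "zero=$0 one=$1"' first "second arg"
# prints: zero=first one=second arg

# A bare `for a` loops over the positional parameters; each quoted
# argument stays one word, spaces and all.
sh -c 'for a; do echo "arg: $a"; done' _ "with space.xml" "plain.xml"
# prints:
# arg: with space.xml
# arg: plain.xml
```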

The xargs -t is there to let me see how the substitution works, and that's how I noticed that using a pattern worked, but only if I keep the quotes around the last $*; otherwise only the first file is searched and replaced. The output looks like:

user@host % ./replace-encode pattern*.xml
sh -c sed -i "" "s@https://www.some.com/public/123456.jpg@`curl -s https://www.some.com/public/123456.jpg | base64`@" $0 pattern_123.xml
pattern_456.xml

Both pattern_123.xml and pattern_456.xml are handled here; with $* instead of "$*" at the end of the command, only pattern_123.xml is handled.

So is there a simple way to "fix" this?

Thank you.

Note: macOS commands have some limitations (I know), but as this script is intended for non-technical users, I can't ask them to install (or have the IT team install on their behalf) alternate GNU versions such as pcregrep or ggrep, like I've read suggested many times...

Also: I don't intend to change from xargs to for loops or similar because 1) I don't have the time, and 2) I might want to optimize the second step, where some URLs might be duplicated.



Solution 1:[1]

I finally ended up with this single-line script:

sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@" | xargs -I% sh -c 'sed -i "" "s@%@`curl -s % | base64`@" "$@"' _ "$@"

which does properly support filenames with or without spaces.

Solution 2:[2]

There's no reason for your software to use ls or xargs, and certainly not $*.

./replace-encode single.xml
./replace-encode pattern*.xml
./replace-encode single.xml pattern*.xml
./replace-encode folder/*.xml

...will all work fine with:

#!/usr/bin/env bash
while IFS= read -r line; do
  replacement=$(curl -s "$line" | base64)
  in="$line" out="$replacement" perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' "$@"
done < <(sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@" | sort | uniq)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: maxxyme
Solution 2: