Multiple-process curl command for URLs, outputting to individual files
I am attempting to curl multiple URLs in a bash command. Eventually I will be curling a large number of URLs, so I am using xargs to run multiple processes and speed things up. My file consists of some number of URLs:
https://someurl.com
https://someotherurl.com
My issue comes when attempting to output the results to separate files named after the URLs I curl. The bash command I have is:
xargs -P 5 -n 1 -I% curl -k -L % -o % < urls.txt
When I run this, I get 'Failed to create file https://someotherurl.com'.
Solution 1:[1]
You cannot create a file with / in the filename. You could do it this way:
#!/bin/bash
while IFS= read -r line
do
    echo "LINE: $line"
    if [[ "$line" != "" ]]
    then
        filename="${line#https://}"
        echo "FILENAME: $filename"
        # your curl command here, using $filename, for example:
        curl -k -L -o "$filename" "$line"
    fi
done < urls.txt
- It ignores empty lines.
- Variable substitution is used to remove the https:// part of each URL; this is what allows the file to be created (see the one-line demo after this list).
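For illustration, here is that substitution on its own; bash's ${var#pattern} expansion removes the shortest matching prefix:

line="https://someurl.com"
echo "${line#https://}"   # prints: someurl.com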
Note: if your URLs contain sub-directories, they must be removed as well. For example, say you want to fetch https://www.exemple.com/some/sub/dir. The script I suggested here would try to create a file named "www.exemple.com/some/sub/dir". In this case, you could replace each / with _ using tr.
The script would become:
#!/bin/bash
while IFS= read -r line
do
    echo "LINE: $line"
    if [[ "$line" != "" ]]
    then
        # replace every / with _ so the name contains no directory separators
        filename=$(echo "$line" | tr '/' '_')
        filename2=${filename#https:__}
        echo "FILENAME: $filename2"
        # your curl command here, using $filename2, for example:
        curl -k -L -o "$filename2" "$line"
    fi
done < urls.txt
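If you still want the parallel downloads from the question's original command, a minimal sketch folding this tr sanitization into xargs could look like the following. The -P 5 and the -k -L flags come from the question; the rest is an assumed arrangement to illustrate the idea, not part of the original answer:

# Each sh invocation receives one URL as $1; tr builds the flattened filename.
# Blank lines in urls.txt are skipped by xargs automatically.
xargs -P 5 -n 1 sh -c 'curl -k -L -o "$(printf %s "$1" | tr / _)" "$1"' sh < urls.txt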
Solution 2:[2]
Because your question is ambiguous, I would assume:
- You have a file urls.txt that contains URLs separated by LF.
- You want to download all URLs with curl and use each URL as its filename.
Unfortunately, that's not possible, because a URL contains characters that are invalid in filenames, such as the slash /. For this case, I would suggest encoding each URL in the URL-safe Base64 alphabet (RFC 3548) before using it as a filename.
After applying this requirement, your script would become something like:
# Generate 100 test URLs
seq 100 | xargs -I@ echo 'https://example.com?@' > urls.txt
# Download every URL in parallel; each output filename is the URL-safe Base64 encoding of its URL
xargs -P0 -L1 sh -c 'curl -SskL0 -o "$(printf %s "$1" | uuencode -m /dev/stdout | sed "1d;\$d" | tr +/ -_)" "$1"' sh < urls.txt
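As a side note, if uuencode is not installed, the same URL-safe names can be produced with GNU coreutils' base64 (an assumed substitution, not part of the original answer), which also makes it easy to recover the original URL from a saved filename:

# Encode: -w0 disables line wrapping; tr maps +/ to the URL-safe -_ alphabet
printf %s 'https://example.com?1' | base64 -w0 | tr '+/' '-_'

# Decode a saved filename ($f is a hypothetical variable holding one such name)
printf %s "$f" | tr '-_' '+/' | base64 -d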
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nic3500 |
| Solution 2 | Weihang Jian |